diff --git a/Python/matplotlab/README.md b/Python/matplotlab/README.md
new file mode 100644
index 00000000..59ddf1e7
--- /dev/null
+++ b/Python/matplotlab/README.md
@@ -0,0 +1,335 @@
---
sidebarDepth: 3
sidebar: auto
---

# Tutorials

This page contains in-depth guides for using Matplotlib.
It is split into beginner, intermediate, and advanced sections, as well as sections covering specific topics.

For shorter examples, see our [examples gallery](/gallery/index.html).
You can also find [external resources](/resources/index.html) and the [FAQ](/faq/index.html) in our [user guide](https://matplotlib.org/contents.html).

## Introductory

These tutorials cover the basics of creating visualizations with Matplotlib, along with some best practices for using the package effectively.

## Intermediate

These tutorials cover some of the more complex classes and functions in Matplotlib. They are useful for specific customizations and complex visualizations.

## Advanced

These tutorials cover advanced topics for experienced Matplotlib users and developers.

## Colors

Matplotlib supports visualizing information with a wide array of colors and colormaps. These tutorials cover the basics of how these colormaps look, how to create your own, and how to customize colormaps for your use case.

For more information, see the [examples gallery](/gallery/index.html).

## Text

Matplotlib has extensive text support, including support for mathematical expressions, TrueType support for raster and vector outputs, newline-separated text with arbitrary rotations, and Unicode support.
These tutorials cover the basics of working with text in Matplotlib.

## Toolkits

These tutorials cover toolkits designed to extend Matplotlib's functionality for specific goals.
 \ No newline at end of file
diff --git a/Python/matplotlab/advanced/path_tutorial.md b/Python/matplotlab/advanced/path_tutorial.md
new file mode 100644
index 00000000..1d2c463c
--- /dev/null
+++ b/Python/matplotlab/advanced/path_tutorial.md
@@ -0,0 +1,265 @@
---
sidebarDepth: 3
sidebar: auto
---

# Path Tutorial

Defining paths in your Matplotlib visualization.

The object underlying all of the ``matplotlib.patch`` objects is the [``Path``](https://matplotlib.org/api/path_api.html#matplotlib.path.Path), which supports the standard set of moveto, lineto, and curveto commands to draw simple and compound outlines consisting of line segments and splines. A ``Path`` is instantiated with an (N, 2) array of (x, y) vertices and an N-length array of path codes.
For example, to draw the unit rectangle from (0, 0) to (1, 1), we could use this code:

``` python
import matplotlib.pyplot as plt
from matplotlib.path import Path
import matplotlib.patches as patches

verts = [
    (0., 0.),  # left, bottom
    (0., 1.),  # left, top
    (1., 1.),  # right, top
    (1., 0.),  # right, bottom
    (0., 0.),  # ignored
]

codes = [
    Path.MOVETO,
    Path.LINETO,
    Path.LINETO,
    Path.LINETO,
    Path.CLOSEPOLY,
]

path = Path(verts, codes)

fig, ax = plt.subplots()
patch = patches.PathPatch(path, facecolor='orange', lw=2)
ax.add_patch(patch)
ax.set_xlim(-2, 2)
ax.set_ylim(-2, 2)
plt.show()
```

![sphx_glr_path_tutorial_001](https://matplotlib.org/_images/sphx_glr_path_tutorial_001.png)

The following path codes are recognized:

| Code | Vertices | Description |
| --- | --- | --- |
| STOP | 1 (ignored) | A marker for the end of the entire path (currently not required and ignored). |
| MOVETO | 1 | Pick up the pen and move to the given vertex. |
| LINETO | 1 | Draw a line from the current position to the given vertex. |
| CURVE3 | 2 (1 control point, 1 end point) | Draw a quadratic Bézier curve from the current position, with the given control point, to the given end point. |
| CURVE4 | 3 (2 control points, 1 end point) | Draw a cubic Bézier curve from the current position, with the given control points, to the given end point. |
| CLOSEPOLY | 1 (point itself is ignored) | Draw a line segment to the start point of the current polyline. |

## Bézier example

Some of the path components require multiple vertices to specify them: for example CURVE3 is a [Bézier](https://en.wikipedia.org/wiki/B%C3%A9zier_curve) curve with one control point and one end point, and CURVE4 has three vertices for the two control points and the end point.
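A minimal CURVE3 sketch, with vertex values chosen only for illustration, pairs one MOVETO vertex with two CURVE3 vertices:

``` python
import matplotlib.pyplot as plt
from matplotlib.path import Path
import matplotlib.patches as patches

# One quadratic Bezier segment: the two CURVE3 vertices are the
# control point and the end point, respectively.
verts = [(0., 0.), (0.5, 1.), (1., 0.)]
codes = [Path.MOVETO, Path.CURVE3, Path.CURVE3]
quad = Path(verts, codes)

fig, ax = plt.subplots()
ax.add_patch(patches.PathPatch(quad, facecolor='none', lw=2))
ax.set_xlim(-0.1, 1.1)
ax.set_ylim(-0.1, 1.1)
plt.show()
```

Note that the quadratic segment is written as two consecutive CURVE3 codes, one per vertex it consumes, matching the vertex counts in the table.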
The example below shows a CURVE4 Bézier spline -- the Bézier curve will be contained in the convex hull of the start point, the two control points, and the end point:

``` python
verts = [
    (0., 0.),   # P0
    (0.2, 1.),  # P1
    (1., 0.8),  # P2
    (0.8, 0.),  # P3
]

codes = [
    Path.MOVETO,
    Path.CURVE4,
    Path.CURVE4,
    Path.CURVE4,
]

path = Path(verts, codes)

fig, ax = plt.subplots()
patch = patches.PathPatch(path, facecolor='none', lw=2)
ax.add_patch(patch)

xs, ys = zip(*verts)
ax.plot(xs, ys, 'x--', lw=2, color='black', ms=10)

ax.text(-0.05, -0.05, 'P0')
ax.text(0.15, 1.05, 'P1')
ax.text(1.05, 0.85, 'P2')
ax.text(0.85, -0.05, 'P3')

ax.set_xlim(-0.1, 1.1)
ax.set_ylim(-0.1, 1.1)
plt.show()
```

![sphx_glr_path_tutorial_002](https://matplotlib.org/_images/sphx_glr_path_tutorial_002.png)

## Compound paths

All of the simple patch primitives in Matplotlib (Rectangle, Circle, Polygon, etc.) are implemented with simple paths. Plotting functions like [``hist()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.hist.html#matplotlib.axes.Axes.hist) and [``bar()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.bar.html#matplotlib.axes.Axes.bar), which create a number of primitives, e.g., a bunch of Rectangles, can usually be implemented more efficiently using a compound path. The reason ``bar`` creates a list of rectangles and not a compound path is largely historical: the [``Path``](https://matplotlib.org/api/path_api.html#matplotlib.path.Path) code is comparatively new and ``bar`` predates it. While we could change it now, it would break old code, so here we will cover how to create compound paths, replicating the functionality of ``bar``, in case you need to do so in your own code for efficiency reasons, e.g., if you are creating an animated bar plot.
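Before building the vertex arrays by hand, note that ``Path`` can also stitch existing paths together via ``Path.make_compound_path``; a brief sketch, where the shifted second rectangle is purely illustrative:

``` python
from matplotlib.path import Path

# Path.unit_rectangle() is a built-in helper: the rectangle from (0, 0) to (1, 1).
r1 = Path.unit_rectangle()

# A second, illustrative "bar": the same rectangle shifted right by 2 units.
r2 = Path(r1.vertices + [2, 0], r1.codes)

# Concatenate both outlines into a single compound path.
compound = Path.make_compound_path(r1, r2)
```

This is convenient for a handful of known shapes; the hand-built vertex arrays below are the fully general approach.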
We will make the histogram chart by creating a series of rectangles for each histogram bar: the rectangle width is the bin width and the rectangle height is the number of data points in that bin. First we'll create some random normally distributed data and compute the histogram. Because numpy returns the bin edges rather than the centers, the length of ``bins`` is 1 greater than the length of ``n`` in the example below:

``` python
import numpy as np

# histogram our data with numpy
data = np.random.randn(1000)
n, bins = np.histogram(data, 100)
```

We'll now extract the corners of the rectangles. Each of the ``left``, ``bottom``, etc., arrays below has length ``len(n)``, where ``n`` is the array of counts for each histogram bar:

``` python
# get the corners of the rectangles for the histogram
left = np.array(bins[:-1])
right = np.array(bins[1:])
bottom = np.zeros(len(left))
top = bottom + n
```

Now we have to construct our compound path, which will consist of a series of ``MOVETO``, ``LINETO`` and ``CLOSEPOLY`` codes for each rectangle. For each rectangle, we need 5 vertices: 1 for the ``MOVETO``, 3 for the ``LINETO``, and 1 for the ``CLOSEPOLY``.
As indicated in the +table above, the vertex for the closepoly is ignored but we still need +it to keep the codes aligned with the vertices: + +``` python +nverts = nrects*(1+3+1) +verts = np.zeros((nverts, 2)) +codes = np.ones(nverts, int) * path.Path.LINETO +codes[0::5] = path.Path.MOVETO +codes[4::5] = path.Path.CLOSEPOLY +verts[0::5,0] = left +verts[0::5,1] = bottom +verts[1::5,0] = left +verts[1::5,1] = top +verts[2::5,0] = right +verts[2::5,1] = top +verts[3::5,0] = right +verts[3::5,1] = bottom +``` + +All that remains is to create the path, attach it to a +``PathPatch``, and add it to our axes: + +``` python +barpath = path.Path(verts, codes) +patch = patches.PathPatch(barpath, facecolor='green', + edgecolor='yellow', alpha=0.5) +ax.add_patch(patch) +``` + +``` python +import numpy as np +import matplotlib.patches as patches +import matplotlib.path as path + +fig, ax = plt.subplots() +# Fixing random state for reproducibility +np.random.seed(19680801) + +# histogram our data with numpy +data = np.random.randn(1000) +n, bins = np.histogram(data, 100) + +# get the corners of the rectangles for the histogram +left = np.array(bins[:-1]) +right = np.array(bins[1:]) +bottom = np.zeros(len(left)) +top = bottom + n +nrects = len(left) + +nverts = nrects*(1+3+1) +verts = np.zeros((nverts, 2)) +codes = np.ones(nverts, int) * path.Path.LINETO +codes[0::5] = path.Path.MOVETO +codes[4::5] = path.Path.CLOSEPOLY +verts[0::5, 0] = left +verts[0::5, 1] = bottom +verts[1::5, 0] = left +verts[1::5, 1] = top +verts[2::5, 0] = right +verts[2::5, 1] = top +verts[3::5, 0] = right +verts[3::5, 1] = bottom + +barpath = path.Path(verts, codes) +patch = patches.PathPatch(barpath, facecolor='green', + edgecolor='yellow', alpha=0.5) +ax.add_patch(patch) + +ax.set_xlim(left[0], right[-1]) +ax.set_ylim(bottom.min(), top.max()) + +plt.show() +``` + +![sphx_glr_path_tutorial_003](https://matplotlib.org/_images/sphx_glr_path_tutorial_003.png) + +## Download + +- [Download Python source 
code: path_tutorial.py](https://matplotlib.org/_downloads/ec90dd07bc241d860eb972db796c96bc/path_tutorial.py)
- [Download Jupyter notebook: path_tutorial.ipynb](https://matplotlib.org/_downloads/da8cacf827800cc7398495a527da865d/path_tutorial.ipynb)
 \ No newline at end of file
diff --git a/Python/matplotlab/advanced/patheffects_guide.md b/Python/matplotlab/advanced/patheffects_guide.md
new file mode 100644
index 00000000..82068387
--- /dev/null
+++ b/Python/matplotlab/advanced/patheffects_guide.md
@@ -0,0 +1,122 @@
---
sidebarDepth: 3
sidebar: auto
---

# Path effects guide

Defining paths that objects follow on a canvas.

Matplotlib's [``patheffects``](#module-matplotlib.patheffects) module provides functionality to apply multiple draw stages to any Artist which can be rendered via a [``Path``](https://matplotlib.org/api/path_api.html#matplotlib.path.Path).

Artists which can have a path effect applied to them include [``Patch``](https://matplotlib.org/api/_as_gen/matplotlib.patches.Patch.html#matplotlib.patches.Patch), [``Line2D``](https://matplotlib.org/api/_as_gen/matplotlib.lines.Line2D.html#matplotlib.lines.Line2D), [``Collection``](https://matplotlib.org/api/collections_api.html#matplotlib.collections.Collection) and even [``Text``](https://matplotlib.org/api/text_api.html#matplotlib.text.Text). Each artist's path effects can be controlled via the [``set_path_effects``](https://matplotlib.org/api/_as_gen/matplotlib.artist.Artist.set_path_effects.html#matplotlib.artist.Artist.set_path_effects) method, which takes an iterable of [``AbstractPathEffect``](https://matplotlib.org/api/patheffects_api.html#matplotlib.patheffects.AbstractPathEffect) instances.
The simplest path effect is the [``Normal``](https://matplotlib.org/api/patheffects_api.html#matplotlib.patheffects.Normal) effect, which simply draws the artist without any effect:

``` python
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects

fig = plt.figure(figsize=(5, 1.5))
text = fig.text(0.5, 0.5, 'Hello path effects world!\nThis is the normal '
                          'path effect.\nPretty dull, huh?',
                ha='center', va='center', size=20)
text.set_path_effects([path_effects.Normal()])
plt.show()
```

![sphx_glr_patheffects_guide_001](https://matplotlib.org/_images/sphx_glr_patheffects_guide_001.png)

Whilst the plot doesn't look any different to what you would expect without any path effects, the drawing of the text has now been changed to use the path effects framework, opening up the possibilities for more interesting examples.

## Adding a shadow

A far more interesting path effect than [``Normal``](https://matplotlib.org/api/patheffects_api.html#matplotlib.patheffects.Normal) is the drop shadow, which we can apply to any of our path-based artists. The classes [``SimplePatchShadow``](https://matplotlib.org/api/patheffects_api.html#matplotlib.patheffects.SimplePatchShadow) and [``SimpleLineShadow``](https://matplotlib.org/api/patheffects_api.html#matplotlib.patheffects.SimpleLineShadow) do precisely this by drawing either a filled patch or a line patch below the original artist:

``` python
import matplotlib.patheffects as path_effects

text = plt.text(0.5, 0.5, 'Hello path effects world!',
                path_effects=[path_effects.withSimplePatchShadow()])

plt.plot([0, 3, 2, 5], linewidth=5, color='blue',
         path_effects=[path_effects.SimpleLineShadow(),
                       path_effects.Normal()])
plt.show()
```

![sphx_glr_patheffects_guide_002](https://matplotlib.org/_images/sphx_glr_patheffects_guide_002.png)

Notice the two approaches to setting the path effects in this example.
The first uses the ``with*`` classes to include the desired functionality automatically followed by the "normal" effect, whereas the second explicitly lists the two path effects to draw.

## Making an artist stand out

One nice way of making artists visually stand out is to draw an outline in a bold color below the actual artist. The [``Stroke``](https://matplotlib.org/api/patheffects_api.html#matplotlib.patheffects.Stroke) path effect makes this a relatively simple task:

``` python
fig = plt.figure(figsize=(7, 1))
text = fig.text(0.5, 0.5, 'This text stands out because of\n'
                          'its black border.', color='white',
                ha='center', va='center', size=30)
text.set_path_effects([path_effects.Stroke(linewidth=3, foreground='black'),
                       path_effects.Normal()])
plt.show()
```

![sphx_glr_patheffects_guide_003](https://matplotlib.org/_images/sphx_glr_patheffects_guide_003.png)

It is important to note that this effect only works because we have drawn the text path twice; once with a thick black line, and then once with the original text path on top.

You may have noticed that the keywords to [``Stroke``](https://matplotlib.org/api/patheffects_api.html#matplotlib.patheffects.Stroke), [``SimplePatchShadow``](https://matplotlib.org/api/patheffects_api.html#matplotlib.patheffects.SimplePatchShadow), and [``SimpleLineShadow``](https://matplotlib.org/api/patheffects_api.html#matplotlib.patheffects.SimpleLineShadow) are not the usual Artist keywords (such as ``facecolor`` and ``edgecolor``). This is because with these path effects we are operating at a lower level of Matplotlib. In fact, the keywords which are accepted are those for a [``matplotlib.backend_bases.GraphicsContextBase``](https://matplotlib.org/api/backend_bases_api.html#matplotlib.backend_bases.GraphicsContextBase) instance, which was designed to make it easy to create new backends, not to provide a user-facing styling interface.
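For instance, ``SimplePatchShadow`` accepts graphics-context-style parameters such as ``offset`` (in points), ``shadow_rgbFace``, and ``alpha``; a small sketch, with the color and sizes chosen only for illustration:

``` python
import matplotlib.pyplot as plt
import matplotlib.patheffects as path_effects

fig = plt.figure(figsize=(5, 1))
text = fig.text(0.5, 0.5, 'Tuned shadow', ha='center', va='center', size=24)
text.set_path_effects([
    # offset is in points; shadow_rgbFace colors the shadow itself
    path_effects.SimplePatchShadow(offset=(3, -3), shadow_rgbFace='red',
                                   alpha=0.6),
    path_effects.Normal(),  # draw the original text on top
])
plt.show()
```

Stacking ``Normal()`` after the shadow draws the original text on top, just as in the examples above.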
## Greater control of the path effect artist

As already mentioned, some of the path effects operate at a lower level than most users will be used to, meaning that setting keywords such as ``facecolor`` and ``edgecolor`` raises an ``AttributeError``. Luckily, there is a generic [``PathPatchEffect``](https://matplotlib.org/api/patheffects_api.html#matplotlib.patheffects.PathPatchEffect) path effect which creates a [``PathPatch``](https://matplotlib.org/api/_as_gen/matplotlib.patches.PathPatch.html#matplotlib.patches.PathPatch) from the original path. The keywords to this effect are identical to those of [``PathPatch``](https://matplotlib.org/api/_as_gen/matplotlib.patches.PathPatch.html#matplotlib.patches.PathPatch):

``` python
fig = plt.figure(figsize=(8, 1))
t = fig.text(0.02, 0.5, 'Hatch shadow', fontsize=75, weight=1000, va='center')
t.set_path_effects([path_effects.PathPatchEffect(offset=(4, -4), hatch='xxxx',
                                                 facecolor='gray'),
                    path_effects.PathPatchEffect(edgecolor='white', linewidth=1.1,
                                                 facecolor='black')])
plt.show()
```

![sphx_glr_patheffects_guide_004](https://matplotlib.org/_images/sphx_glr_patheffects_guide_004.png)

## Download

- [Download Python source code: patheffects_guide.py](https://matplotlib.org/_downloads/b0857128f7eceadab81240baf9185710/patheffects_guide.py)
- [Download Jupyter notebook: patheffects_guide.ipynb](https://matplotlib.org/_downloads/d678b58ce777643e611577a5aafc6f8d/patheffects_guide.ipynb)
 \ No newline at end of file
diff --git a/Python/matplotlab/advanced/transforms_tutorial.md b/Python/matplotlab/advanced/transforms_tutorial.md
new file mode 100644
index 00000000..2eaeff51
--- /dev/null
+++ b/Python/matplotlab/advanced/transforms_tutorial.md
@@ -0,0 +1,615 @@
---
sidebarDepth: 3
sidebar: auto
---

# Transformations Tutorial

Like any graphics package, Matplotlib is built on top of a transformation framework to easily move between coordinate systems: the userland ``data`` coordinate
system, the ``axes`` coordinate system, the ``figure`` coordinate system, and the ``display`` coordinate system. In 95% of your plotting, you won't need to think about this, as it happens under the hood, but as you push the limits of custom figure generation, it helps to have an understanding of these objects so you can reuse the existing transformations Matplotlib makes available to you, or create your own (see [``matplotlib.transforms``](https://matplotlib.org/api/transformations.html#module-matplotlib.transforms)). The table below summarizes some useful coordinate systems, the transformation object you should use to work in that coordinate system, and the description of that system. In the ``Transformation Object`` column, ``ax`` is an [``Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes) instance, and ``fig`` is a [``Figure``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure) instance.

| Coordinates | Transformation object | Description |
| --- | --- | --- |
| "data" | ``ax.transData`` | The coordinate system for the data, controlled by xlim and ylim. |
| "axes" | ``ax.transAxes`` | The coordinate system of the [``Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes); (0, 0) is bottom left of the axes, and (1, 1) is top right of the axes. |
| "figure" | ``fig.transFigure`` | The coordinate system of the [``Figure``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure); (0, 0) is bottom left of the figure, and (1, 1) is top right of the figure. |
| "figure-inches" | ``fig.dpi_scale_trans`` | The coordinate system of the Figure in inches; (0, 0) is bottom left of the figure, and (width, height) is top right of the figure in inches. |
| "display" | ``None``, or ``IdentityTransform()`` | The pixel coordinate system of the display window; (0, 0) is bottom left of the window, and (width, height) is top right of the display window in pixels. |
| "xaxis", "yaxis" | ``ax.get_xaxis_transform()``, ``ax.get_yaxis_transform()`` | Blended coordinate systems; use data coordinates on one axis and axes coordinates on the other. |

All of the transformation objects in the table above take inputs in their coordinate system, and transform the input to the ``display`` coordinate system. That is why the ``display`` coordinate system has ``None`` for the ``Transformation object`` column -- it already is in display coordinates. The transformations also know how to invert themselves, to go from ``display`` back to the native coordinate system. This is particularly useful when processing events from the user interface, which typically occur in display space, and you want to know where the mouse click or key-press occurred in your data coordinate system.

Note that specifying objects in ``display`` coordinates will change their location if the ``dpi`` of the figure changes. This can cause confusion when printing or changing screen resolution, because the object can change location and size. Therefore it is most common for artists placed in an axes or figure to have their transform set to something *other* than the [``IdentityTransform()``](https://matplotlib.org/api/transformations.html#matplotlib.transforms.IdentityTransform); the default when an artist is placed on an axes using ``add_artist`` is for the transform to be ``ax.transData``.

## Data coordinates

Let's start with the most commonly used coordinate system, the ``data`` coordinate system.
Whenever you add data to the axes, Matplotlib updates the data limits, which are most commonly adjusted with the [``set_xlim()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_xlim.html#matplotlib.axes.Axes.set_xlim) and [``set_ylim()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_ylim.html#matplotlib.axes.Axes.set_ylim) methods. For example, in the figure below, the data limits stretch from 0 to 10 on the x-axis, and -1 to 1 on the y-axis.

``` python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

x = np.arange(0, 10, 0.005)
y = np.exp(-x/2.) * np.sin(2*np.pi*x)

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlim(0, 10)
ax.set_ylim(-1, 1)

plt.show()
```

![sphx_glr_transforms_tutorial_001](https://matplotlib.org/_images/sphx_glr_transforms_tutorial_001.png)

You can use the ``ax.transData`` instance to transform from your ``data`` to your ``display`` coordinate system, either for a single point or a sequence of points, as shown below:

``` python
In [14]: type(ax.transData)
Out[14]: <class 'matplotlib.transforms.CompositeGenericTransform'>

In [15]: ax.transData.transform((5, 0))
Out[15]: array([ 335.175,  247.   ])

In [16]: ax.transData.transform([(5, 0), (1, 2)])
Out[16]:
array([[ 335.175,  247.   ],
       [ 132.435,  642.2  ]])
```

You can use the [``inverted()``](https://matplotlib.org/api/transformations.html#matplotlib.transforms.Transform.inverted) method to create a transform which will take you from display to data coordinates:

``` python
In [41]: inv = ax.transData.inverted()

In [42]: type(inv)
Out[42]: <class 'matplotlib.transforms.CompositeGenericTransform'>

In [43]: inv.transform((335.175, 247.))
Out[43]: array([ 5.,  0.])
```

If you are typing along with this tutorial, the exact values of the display coordinates may differ if you have a different window size or dpi setting. Likewise, in the figure below, the display-labeled points are probably not the same as in the ipython session because the documentation figure size defaults are different.
+ +``` python +x = np.arange(0, 10, 0.005) +y = np.exp(-x/2.) * np.sin(2*np.pi*x) + +fig, ax = plt.subplots() +ax.plot(x, y) +ax.set_xlim(0, 10) +ax.set_ylim(-1, 1) + +xdata, ydata = 5, 0 +xdisplay, ydisplay = ax.transData.transform_point((xdata, ydata)) + +bbox = dict(boxstyle="round", fc="0.8") +arrowprops = dict( + arrowstyle="->", + connectionstyle="angle,angleA=0,angleB=90,rad=10") + +offset = 72 +ax.annotate('data = (%.1f, %.1f)' % (xdata, ydata), + (xdata, ydata), xytext=(-2*offset, offset), textcoords='offset points', + bbox=bbox, arrowprops=arrowprops) + +disp = ax.annotate('display = (%.1f, %.1f)' % (xdisplay, ydisplay), + (xdisplay, ydisplay), xytext=(0.5*offset, -offset), + xycoords='figure pixels', + textcoords='offset points', + bbox=bbox, arrowprops=arrowprops) + +plt.show() +``` + +![sphx_glr_transforms_tutorial_002](https://matplotlib.org/_images/sphx_glr_transforms_tutorial_002.png) + +::: tip Note + +If you run the source code in the example above in a GUI backend, +you may also find that the two arrows for the ``data`` and ``display`` +annotations do not point to exactly the same point. This is because +the display point was computed before the figure was displayed, and +the GUI backend may slightly resize the figure when it is created. +The effect is more pronounced if you resize the figure yourself. +This is one good reason why you rarely want to work in display +space, but you can connect to the ``'on_draw'`` +[``Event``](https://matplotlib.orgapi/backend_bases_api.html#matplotlib.backend_bases.Event) to update figure +coordinates on figure draws; see [Event handling and picking](https://matplotlib.orgusers/event_handling.html#event-handling-tutorial). + +::: + +When you change the x or y limits of your axes, the data limits are +updated so the transformation yields a new display point. Note that +when we just change the ylim, only the y-display coordinate is +altered, and when we change the xlim too, both are altered. 
More on this later when we talk about the [``Bbox``](https://matplotlib.org/api/transformations.html#matplotlib.transforms.Bbox).

``` python
In [54]: ax.transData.transform((5, 0))
Out[54]: array([ 335.175,  247.   ])

In [55]: ax.set_ylim(-1, 2)
Out[55]: (-1, 2)

In [56]: ax.transData.transform((5, 0))
Out[56]: array([ 335.175     ,  181.13333333])

In [57]: ax.set_xlim(10, 20)
Out[57]: (10, 20)

In [58]: ax.transData.transform((5, 0))
Out[58]: array([-171.675     ,  181.13333333])
```

## Axes coordinates

After the ``data`` coordinate system, ``axes`` is probably the second most useful coordinate system. Here the point (0, 0) is the bottom left of your axes or subplot, (0.5, 0.5) is the center, and (1.0, 1.0) is the top right. You can also refer to points outside the range, so (-0.1, 1.1) is to the left of and above your axes. This coordinate system is extremely useful when placing text in your axes, because you often want a text bubble in a fixed location, e.g., the upper left of the axes pane, and have that location remain fixed when you pan or zoom. Here is a simple example that creates four panels and labels them 'A', 'B', 'C', and 'D' as you often see in journals.

``` python
fig = plt.figure()
for i, label in enumerate(('A', 'B', 'C', 'D')):
    ax = fig.add_subplot(2, 2, i+1)
    ax.text(0.05, 0.95, label, transform=ax.transAxes,
            fontsize=16, fontweight='bold', va='top')

plt.show()
```

![sphx_glr_transforms_tutorial_003](https://matplotlib.org/_images/sphx_glr_transforms_tutorial_003.png)

You can also make lines or patches in the axes coordinate system, but this is less useful in my experience than using ``ax.transAxes`` for placing text.
Nonetheless, here is a silly example which plots some +random dots in ``data`` space, and overlays a semi-transparent +[``Circle``](https://matplotlib.orgapi/_as_gen/matplotlib.patches.Circle.html#matplotlib.patches.Circle) centered in the middle of the axes +with a radius one quarter of the axes -- if your axes does not +preserve aspect ratio (see [``set_aspect()``](https://matplotlib.orgapi/_as_gen/matplotlib.axes.Axes.set_aspect.html#matplotlib.axes.Axes.set_aspect)), +this will look like an ellipse. Use the pan/zoom tool to move around, +or manually change the data xlim and ylim, and you will see the data +move, but the circle will remain fixed because it is not in ``data`` +coordinates and will always remain at the center of the axes. + +``` python +fig, ax = plt.subplots() +x, y = 10*np.random.rand(2, 1000) +ax.plot(x, y, 'go', alpha=0.2) # plot some data in data coordinates + +circ = mpatches.Circle((0.5, 0.5), 0.25, transform=ax.transAxes, + facecolor='blue', alpha=0.75) +ax.add_patch(circ) +plt.show() +``` + +![sphx_glr_transforms_tutorial_004](https://matplotlib.org/_images/sphx_glr_transforms_tutorial_004.png) + +## Blended transformations + +Drawing in ``blended`` coordinate spaces which mix ``axes`` with ``data`` +coordinates is extremely useful, for example to create a horizontal +span which highlights some region of the y-data but spans across the +x-axis regardless of the data limits, pan or zoom level, etc. 
In fact these blended lines and spans are so useful that we have built-in functions to make them easy to plot (see [``axhline()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.axhline.html#matplotlib.axes.Axes.axhline), [``axvline()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.axvline.html#matplotlib.axes.Axes.axvline), [``axhspan()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.axhspan.html#matplotlib.axes.Axes.axhspan), [``axvspan()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.axvspan.html#matplotlib.axes.Axes.axvspan)), but for didactic purposes we will implement the horizontal span here using a blended transformation. This trick only works for separable transformations, like you see in normal Cartesian coordinate systems, but not for inseparable transformations like the [``PolarTransform``](https://matplotlib.org/api/projections_api.html#matplotlib.projections.polar.PolarAxes.PolarTransform).

``` python
import matplotlib.transforms as transforms

fig, ax = plt.subplots()
x = np.random.randn(1000)

ax.hist(x, 30)
ax.set_title(r'$\sigma=1 \/ \dots \/ \sigma=2$', fontsize=16)

# the x coords of this transformation are data, and the
# y coords are axes
trans = transforms.blended_transform_factory(
    ax.transData, ax.transAxes)

# highlight the 1..2 stddev region with a span.
# We want x to be in data coordinates and y to
# span from 0..1 in axes coords
rect = mpatches.Rectangle((1, 0), width=1, height=1,
                          transform=trans, color='yellow',
                          alpha=0.5)

ax.add_patch(rect)

plt.show()
```

![sphx_glr_transforms_tutorial_005](https://matplotlib.org/_images/sphx_glr_transforms_tutorial_005.png)

::: tip Note

The blended transformation where x is in data coordinates and y in axes coordinates is so useful that we have helper methods to return the versions Matplotlib uses internally for drawing ticks, tick labels, etc.
+The methods are [``matplotlib.axes.Axes.get_xaxis_transform()``](https://matplotlib.orgapi/_as_gen/matplotlib.axes.Axes.get_xaxis_transform.html#matplotlib.axes.Axes.get_xaxis_transform) and +[``matplotlib.axes.Axes.get_yaxis_transform()``](https://matplotlib.orgapi/_as_gen/matplotlib.axes.Axes.get_yaxis_transform.html#matplotlib.axes.Axes.get_yaxis_transform). So in the example +above, the call to +[``blended_transform_factory()``](https://matplotlib.orgapi/transformations.html#matplotlib.transforms.blended_transform_factory) can be +replaced by ``get_xaxis_transform``: + +``` python +trans = ax.get_xaxis_transform() +``` + +::: + +## Plotting in physical units + +Sometimes we want an object to be a certain physical size on the plot. +Here we draw the same circle as above, but in physical units. If done +interactively, you can see that changing the size of the figure does +not change the offset of the circle from the lower-left corner, +does not change its size, and the circle remains a circle regardless of +the aspect ratio of the axes. + +``` python +fig, ax = plt.subplots(figsize=(5, 4)) +x, y = 10*np.random.rand(2, 1000) +ax.plot(x, y*10., 'go', alpha=0.2) # plot some data in data coordinates +# add a circle in fixed-units +circ = mpatches.Circle((2.5, 2), 1.0, transform=fig.dpi_scale_trans, + facecolor='blue', alpha=0.75) +ax.add_patch(circ) +plt.show() +``` + +![sphx_glr_transforms_tutorial_006](https://matplotlib.org/_images/sphx_glr_transforms_tutorial_006.png) + +If we change the figure size, the circle does not change its absolute +position and is cropped. 
+ +``` python +fig, ax = plt.subplots(figsize=(7, 2)) +x, y = 10*np.random.rand(2, 1000) +ax.plot(x, y*10., 'go', alpha=0.2) # plot some data in data coordinates +# add a circle in fixed-units +circ = mpatches.Circle((2.5, 2), 1.0, transform=fig.dpi_scale_trans, + facecolor='blue', alpha=0.75) +ax.add_patch(circ) +plt.show() +``` + +![sphx_glr_transforms_tutorial_007](https://matplotlib.org/_images/sphx_glr_transforms_tutorial_007.png) + +Another use is putting a patch with a set physical dimension around a +data point on the axes. Here we add together two transforms. The +first sets the scaling of how large the ellipse should be and the second +sets its position. The ellipse is then placed at the origin, and then +we use the helper transform [``ScaledTranslation``](https://matplotlib.orgapi/transformations.html#matplotlib.transforms.ScaledTranslation) +to move it +to the right place in the ``ax.transData`` coordinate system. +This helper is instantiated with: + +``` python +trans = ScaledTranslation(xt, yt, scale_trans) +``` + +where ``xt`` and ``yt`` are the translation offsets, and ``scale_trans`` is +a transformation which scales ``xt`` and ``yt`` at transformation time +before applying the offsets. + +Note the use of the plus operator on the transforms below. +This code says: first apply the scale transformation ``fig.dpi_scale_trans`` +to make the ellipse the proper size, but still centered at (0, 0), +and then translate the data to ``xdata[0]`` and ``ydata[0]`` in data space. + +In interactive use, the ellipse stays the same size even if the +axes limits are changed via zoom. + +``` python +fig, ax = plt.subplots() +xdata, ydata = (0.2, 0.7), (0.5, 0.5) +ax.plot(xdata, ydata, "o") +ax.set_xlim((0, 1)) + +trans = (fig.dpi_scale_trans + + transforms.ScaledTranslation(xdata[0], ydata[0], ax.transData)) + +# plot an ellipse around the point that is 150 x 130 points in diameter... 
+circle = mpatches.Ellipse((0, 0), 150/72, 130/72, angle=40, + fill=None, transform=trans) +ax.add_patch(circle) +plt.show() +``` + +![sphx_glr_transforms_tutorial_008](https://matplotlib.org/_images/sphx_glr_transforms_tutorial_008.png) + +::: tip Note + +The order of transformation matters. Here the ellipse +is given the right dimensions in display space *first* and then moved +in data space to the correct spot. +If we had done the ``ScaledTranslation`` first, then +``xdata[0]`` and ``ydata[0]`` would +first be transformed to ``display`` coordinates (``[ 358.4  475.2]`` on +a 200-dpi monitor) and then those coordinates +would be scaled by ``fig.dpi_scale_trans`` pushing the center of +the ellipse well off the screen (i.e. ``[ 71680.  95040.]``). + +::: + +## Using offset transforms to create a shadow effect + +Another use of [``ScaledTranslation``](https://matplotlib.orgapi/transformations.html#matplotlib.transforms.ScaledTranslation) is to create +a new transformation that is +offset from another transformation, e.g., to place one object shifted a +bit relative to another object. Typically you want the shift to be in +some physical dimension, like points or inches rather than in data +coordinates, so that the shift effect is constant at different zoom +levels and dpi settings. + +One use for an offset is to create a shadow effect, where you draw one +object identical to the first just to the right of it, and just below +it, adjusting the zorder to make sure the shadow is drawn first and +then the object it is shadowing above it. + +Here we apply the transforms in the *opposite* order to the use of +[``ScaledTranslation``](https://matplotlib.orgapi/transformations.html#matplotlib.transforms.ScaledTranslation) above. The plot is +first made in data units (``ax.transData``) and then shifted by +``dx`` and ``dy`` points using ``fig.dpi_scale_trans``. 
(In typography, a [point](https://en.wikipedia.org/wiki/Point_%28typography%29) is
1/72 inches, and by specifying your offsets in points, your figure
will look the same regardless of the dpi resolution it is saved in.)

``` python
fig, ax = plt.subplots()

# make a simple sine wave
x = np.arange(0., 2., 0.01)
y = np.sin(2*np.pi*x)
line, = ax.plot(x, y, lw=3, color='blue')

# shift the object over 2 points, and down 2 points
dx, dy = 2/72., -2/72.
offset = transforms.ScaledTranslation(dx, dy, fig.dpi_scale_trans)
shadow_transform = ax.transData + offset

# now plot the same data with our offset transform;
# use the zorder to make sure we are below the line
ax.plot(x, y, lw=3, color='gray',
        transform=shadow_transform,
        zorder=0.5*line.get_zorder())

ax.set_title('creating a shadow effect with an offset transform')
plt.show()
```

![sphx_glr_transforms_tutorial_009](https://matplotlib.org/_images/sphx_glr_transforms_tutorial_009.png)

::: tip Note

The dpi and inches offset is a
common-enough use case that we have a special helper function to
create it in [``matplotlib.transforms.offset_copy()``](https://matplotlib.org/api/transformations.html#matplotlib.transforms.offset_copy), which returns
a new transform with an added offset. So above we could have done:

``` python
shadow_transform = transforms.offset_copy(ax.transData, fig=fig,
                                          x=dx, y=dy, units='inches')
```

:::

## The transformation pipeline

The ``ax.transData`` transform we have been working with in this
tutorial is a composite of three different transformations that
comprise the transformation pipeline from ``data`` -> ``display``
coordinates.
Michael Droettboom implemented the transformations +framework, taking care to provide a clean API that segregated the +nonlinear projections and scales that happen in polar and logarithmic +plots, from the linear affine transformations that happen when you pan +and zoom. There is an efficiency here, because you can pan and zoom +in your axes which affects the affine transformation, but you may not +need to compute the potentially expensive nonlinear scales or +projections on simple navigation events. It is also possible to +multiply affine transformation matrices together, and then apply them +to coordinates in one step. This is not true of all possible +transformations. + +Here is how the ``ax.transData`` instance is defined in the basic +separable axis [``Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes) class: + +``` python +self.transData = self.transScale + (self.transLimits + self.transAxes) +``` + +We've been introduced to the ``transAxes`` instance above in +[Axes coordinates](#axes-coords), which maps the (0, 0), (1, 1) corners of the +axes or subplot bounding box to ``display`` space, so let's look at +these other two pieces. + +``self.transLimits`` is the transformation that takes you from +``data`` to ``axes`` coordinates; i.e., it maps your view xlim and ylim +to the unit space of the axes (and ``transAxes`` then takes that unit +space to display space). 
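In plain numpy, that data-to-axes mapping is just an affine rescale of the view limits onto the unit square. The sketch below is an illustrative stand-in, not Matplotlib's actual implementation; the function name ``trans_limits`` is made up for this example:

``` python
import numpy as np

def trans_limits(points, xlim, ylim):
    """Rough stand-in for ax.transLimits: map points from data
    coordinates (given the view limits) onto the unit square of
    axes coordinates. Illustrative only."""
    (xmin, xmax), (ymin, ymax) = xlim, ylim
    pts = np.asarray(points, dtype=float)
    return np.column_stack(((pts[:, 0] - xmin) / (xmax - xmin),
                            (pts[:, 1] - ymin) / (ymax - ymin)))

# with xlim=(0, 10) and ylim=(-1, 1):
print(trans_limits([(0, -1), (10, 1), (5, 0)], (0, 10), (-1, 1)))
# the corners map to (0, 0) and (1, 1), and the center to (0.5, 0.5)
```

The interactive session that follows shows Matplotlib computing exactly these values.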
We can see this in action here

``` python
In [80]: ax = subplot(111)

In [81]: ax.set_xlim(0, 10)
Out[81]: (0, 10)

In [82]: ax.set_ylim(-1, 1)
Out[82]: (-1, 1)

In [84]: ax.transLimits.transform((0, -1))
Out[84]: array([ 0., 0.])

In [85]: ax.transLimits.transform((10, -1))
Out[85]: array([ 1., 0.])

In [86]: ax.transLimits.transform((10, 1))
Out[86]: array([ 1., 1.])

In [87]: ax.transLimits.transform((5, 0))
Out[87]: array([ 0.5, 0.5])
```

and we can use the inverse of this transformation to go from the unit
``axes`` coordinates back to ``data`` coordinates:

``` python
In [89]: inv = ax.transLimits.inverted()

In [90]: inv.transform((0.25, 0.25))
Out[90]: array([ 2.5, -0.5])
```

The final piece is the ``self.transScale`` attribute, which is
responsible for the optional non-linear scaling of the data, e.g., for
logarithmic axes. When an Axes is initially set up, this is just set to
the identity transform, since the basic Matplotlib axes has a linear
scale, but when you call a logarithmic scaling function like
[``semilogx()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.semilogx.html#matplotlib.axes.Axes.semilogx) or explicitly set the scale to
logarithmic with [``set_xscale()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_xscale.html#matplotlib.axes.Axes.set_xscale), then the
``ax.transScale`` attribute is set to handle the nonlinear projection.
The scale transforms are properties of the respective ``xaxis`` and
``yaxis`` [``Axis``](https://matplotlib.org/api/axis_api.html#matplotlib.axis.Axis) instances. For example, when
you call ``ax.set_xscale('log')``, the xaxis updates its scale to a
[``matplotlib.scale.LogScale``](https://matplotlib.org/api/scale_api.html#matplotlib.scale.LogScale) instance.

For non-separable axes like the PolarAxes, there is one more piece to
consider, the projection transformation.
The ``transData`` for a
[``matplotlib.projections.polar.PolarAxes``](https://matplotlib.org/api/projections_api.html#matplotlib.projections.polar.PolarAxes) is similar to that for
the typical separable matplotlib Axes, with one additional piece,
``transProjection``:

``` python
self.transData = self.transScale + self.transProjection + \
    (self.transProjectionAffine + self.transAxes)
```

``transProjection`` handles the projection from the data's native space,
e.g., latitude and longitude for map data, or radius and theta for polar
data, to a separable Cartesian coordinate system. There are several
projection examples in the ``matplotlib.projections`` package, and the
best way to learn more is to open the source for those packages and
see how to make your own, since Matplotlib supports extensible axes
and projections. Michael Droettboom has provided a nice tutorial
example of creating a Hammer projection axes; see
[Custom projection](https://matplotlib.org/gallery/misc/custom_projection.html).

**Total running time of the script:** ( 0 minutes 1.328 seconds)

## Download

- [Download Python source code: transforms_tutorial.py](https://matplotlib.org/_downloads/1d1cf62db33a4554c487470c01670fe5/transforms_tutorial.py)
- [Download Jupyter notebook: transforms_tutorial.ipynb](https://matplotlib.org/_downloads/b6ea9be45c260fbed02d8e2d9b2e4549/transforms_tutorial.ipynb)
diff --git a/Python/matplotlab/colors/colorbar_only.md b/Python/matplotlab/colors/colorbar_only.md
new file mode 100644
index 00000000..0c6d54dc
--- /dev/null
+++ b/Python/matplotlab/colors/colorbar_only.md
@@ -0,0 +1,123 @@
---
sidebarDepth: 3
sidebar: auto
---

# Customized Colorbars Tutorial

This tutorial shows how to build colorbars without an attached plot.
+ +## Customized Colorbars + +[``ColorbarBase``](https://matplotlib.orgapi/colorbar_api.html#matplotlib.colorbar.ColorbarBase) puts a colorbar in a specified axes, +and can make a colorbar for a given colormap; it does not need a mappable +object like an image. In this tutorial we will explore what can be done with +standalone colorbar. + +### Basic continuous colorbar + +Set the colormap and norm to correspond to the data for which the colorbar +will be used. Then create the colorbar by calling +[``ColorbarBase``](https://matplotlib.orgapi/colorbar_api.html#matplotlib.colorbar.ColorbarBase) and specify axis, colormap, norm +and orientation as parameters. Here we create a basic continuous colorbar +with ticks and labels. For more information see the +[``colorbar``](https://matplotlib.orgapi/colorbar_api.html#module-matplotlib.colorbar) API. + +``` python +import matplotlib.pyplot as plt +import matplotlib as mpl + +fig, ax = plt.subplots(figsize=(6, 1)) +fig.subplots_adjust(bottom=0.5) + +cmap = mpl.cm.cool +norm = mpl.colors.Normalize(vmin=5, vmax=10) + +cb1 = mpl.colorbar.ColorbarBase(ax, cmap=cmap, + norm=norm, + orientation='horizontal') +cb1.set_label('Some Units') +fig.show() +``` + +![sphx_glr_colorbar_only_001](https://matplotlib.org/_images/sphx_glr_colorbar_only_001.png) + +### Discrete intervals colorbar + +The second example illustrates the use of a +[``ListedColormap``](https://matplotlib.orgapi/_as_gen/matplotlib.colors.ListedColormap.html#matplotlib.colors.ListedColormap) which generates a colormap from a +set of listed colors, ``colors.BoundaryNorm()`` which generates a colormap +index based on discrete intervals and extended ends to show the "over" and +"under" value colors. Over and under are used to display data outside of the +normalized [0,1] range. Here we pass colors as gray shades as a string +encoding a float in the 0-1 range. 
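As an aside, the interval-to-index lookup that ``BoundaryNorm`` performs can be sketched with ``np.searchsorted``. This is a simplified stand-in that ignores clipping and the over/under handling; the function name ``boundary_index`` and the sample bounds are made up for this illustration:

``` python
import numpy as np

def boundary_index(values, bounds):
    """Map each value to the index of the interval
    [bounds[i], bounds[i+1]) it falls into -- a simplified sketch
    of the lookup BoundaryNorm computes."""
    return np.searchsorted(bounds, values, side='right') - 1

bounds = [1, 2, 4, 7, 8]
print(boundary_index([1.5, 3, 5, 7.5], bounds))
# values in [1,2) pick color 0, [2,4) color 1, [4,7) color 2, [7,8) color 3
```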
+ +If a [``ListedColormap``](https://matplotlib.orgapi/_as_gen/matplotlib.colors.ListedColormap.html#matplotlib.colors.ListedColormap) is used, the length of the +bounds array must be one greater than the length of the color list. The +bounds must be monotonically increasing. + +This time we pass some more arguments in addition to previous arguments to +[``ColorbarBase``](https://matplotlib.orgapi/colorbar_api.html#matplotlib.colorbar.ColorbarBase). For the out-of-range values to +display on the colorbar, we have to use the *extend* keyword argument. To use +*extend*, you must specify two extra boundaries. Finally spacing argument +ensures that intervals are shown on colorbar proportionally. + +``` python +fig, ax = plt.subplots(figsize=(6, 1)) +fig.subplots_adjust(bottom=0.5) + +cmap = mpl.colors.ListedColormap(['red', 'green', 'blue', 'cyan']) +cmap.set_over('0.25') +cmap.set_under('0.75') + +bounds = [1, 2, 4, 7, 8] +norm = mpl.colors.BoundaryNorm(bounds, cmap.N) +cb2 = mpl.colorbar.ColorbarBase(ax, cmap=cmap, + norm=norm, + boundaries=[0] + bounds + [13], + extend='both', + ticks=bounds, + spacing='proportional', + orientation='horizontal') +cb2.set_label('Discrete intervals, some other units') +fig.show() +``` + +![sphx_glr_colorbar_only_002](https://matplotlib.org/_images/sphx_glr_colorbar_only_002.png) + +### Colorbar with custom extension lengths + +Here we illustrate the use of custom length colorbar extensions, used on a +colorbar with discrete intervals. To make the length of each extension the +same as the length of the interior colors, use ``extendfrac='auto'``. 
+ +``` python +fig, ax = plt.subplots(figsize=(6, 1)) +fig.subplots_adjust(bottom=0.5) + +cmap = mpl.colors.ListedColormap(['royalblue', 'cyan', + 'yellow', 'orange']) +cmap.set_over('red') +cmap.set_under('blue') + +bounds = [-1.0, -0.5, 0.0, 0.5, 1.0] +norm = mpl.colors.BoundaryNorm(bounds, cmap.N) +cb3 = mpl.colorbar.ColorbarBase(ax, cmap=cmap, + norm=norm, + boundaries=[-10] + bounds + [10], + extend='both', + extendfrac='auto', + ticks=bounds, + spacing='uniform', + orientation='horizontal') +cb3.set_label('Custom extension lengths, some other units') +fig.show() +``` + +![sphx_glr_colorbar_only_003](https://matplotlib.org/_images/sphx_glr_colorbar_only_003.png) + +## Download + +- [Download Python source code: colorbar_only.py](https://matplotlib.org/_downloads/23690f47313380b801750e3adc4c317e/colorbar_only.py) +- [Download Jupyter notebook: colorbar_only.ipynb](https://matplotlib.org/_downloads/4d3eb6ad2b03a5eb988f576ea050f104/colorbar_only.ipynb) + \ No newline at end of file diff --git a/Python/matplotlab/colors/colormap-manipulation.md b/Python/matplotlab/colors/colormap-manipulation.md new file mode 100644 index 00000000..4f89f884 --- /dev/null +++ b/Python/matplotlab/colors/colormap-manipulation.md @@ -0,0 +1,311 @@ +--- +sidebarDepth: 3 +sidebar: auto +--- + +# Creating Colormaps in Matplotlib + +Matplotlib has a number of built-in colormaps accessible via +[``matplotlib.cm.get_cmap``](https://matplotlib.orgapi/cm_api.html#matplotlib.cm.get_cmap). There are also external libraries like +[palettable](https://jiffyclub.github.io/palettable/) that have many extra colormaps. + +However, we often want to create or manipulate colormaps in Matplotlib. +This can be done using the class [``ListedColormap``](https://matplotlib.orgapi/_as_gen/matplotlib.colors.ListedColormap.html#matplotlib.colors.ListedColormap) and a Nx4 numpy array of +values between 0 and 1 to represent the RGBA values of the colormap. 
There +is also a [``LinearSegmentedColormap``](https://matplotlib.orgapi/_as_gen/matplotlib.colors.LinearSegmentedColormap.html#matplotlib.colors.LinearSegmentedColormap) class that allows colormaps to be +specified with a few anchor points defining segments, and linearly +interpolating between the anchor points. + +## Getting colormaps and accessing their values + +First, getting a named colormap, most of which are listed in +[Choosing Colormaps in Matplotlib](colormaps.html) requires the use of +[``matplotlib.cm.get_cmap``](https://matplotlib.orgapi/cm_api.html#matplotlib.cm.get_cmap), which returns a +[``matplotlib.colors.ListedColormap``](https://matplotlib.orgapi/_as_gen/matplotlib.colors.ListedColormap.html#matplotlib.colors.ListedColormap) object. The second argument gives +the size of the list of colors used to define the colormap, and below we +use a modest value of 12 so there are not a lot of values to look at. + +``` python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib import cm +from matplotlib.colors import ListedColormap, LinearSegmentedColormap + +viridis = cm.get_cmap('viridis', 12) +print(viridis) +``` + +Out: + +``` + +``` + +The object ``viridis`` is a callable, that when passed a float between +0 and 1 returns an RGBA value from the colormap: + +``` python +print(viridis(0.56)) +``` + +Out: + +``` +(0.119512, 0.607464, 0.540218, 1.0) +``` + +The list of colors that comprise the colormap can be directly accessed using +the ``colors`` property, +or it can be accessed indirectly by calling ``viridis`` with an array +of values matching the length of the colormap. Note that the returned list +is in the form of an RGBA Nx4 array, where N is the length of the colormap. + +``` python +print('viridis.colors', viridis.colors) +print('viridis(range(12))', viridis(range(12))) +print('viridis(np.linspace(0, 1, 12))', viridis(np.linspace(0, 1, 12))) +``` + +Out: + +``` +viridis.colors [[0.267004 0.004874 0.329415 1. 
] + [0.283072 0.130895 0.449241 1. ] + [0.262138 0.242286 0.520837 1. ] + [0.220057 0.343307 0.549413 1. ] + [0.177423 0.437527 0.557565 1. ] + [0.143343 0.522773 0.556295 1. ] + [0.119512 0.607464 0.540218 1. ] + [0.166383 0.690856 0.496502 1. ] + [0.319809 0.770914 0.411152 1. ] + [0.525776 0.833491 0.288127 1. ] + [0.762373 0.876424 0.137064 1. ] + [0.993248 0.906157 0.143936 1. ]] +viridis(range(12)) [[0.267004 0.004874 0.329415 1. ] + [0.283072 0.130895 0.449241 1. ] + [0.262138 0.242286 0.520837 1. ] + [0.220057 0.343307 0.549413 1. ] + [0.177423 0.437527 0.557565 1. ] + [0.143343 0.522773 0.556295 1. ] + [0.119512 0.607464 0.540218 1. ] + [0.166383 0.690856 0.496502 1. ] + [0.319809 0.770914 0.411152 1. ] + [0.525776 0.833491 0.288127 1. ] + [0.762373 0.876424 0.137064 1. ] + [0.993248 0.906157 0.143936 1. ]] +viridis(np.linspace(0, 1, 12)) [[0.267004 0.004874 0.329415 1. ] + [0.283072 0.130895 0.449241 1. ] + [0.262138 0.242286 0.520837 1. ] + [0.220057 0.343307 0.549413 1. ] + [0.177423 0.437527 0.557565 1. ] + [0.143343 0.522773 0.556295 1. ] + [0.119512 0.607464 0.540218 1. ] + [0.166383 0.690856 0.496502 1. ] + [0.319809 0.770914 0.411152 1. ] + [0.525776 0.833491 0.288127 1. ] + [0.762373 0.876424 0.137064 1. ] + [0.993248 0.906157 0.143936 1. ]] +``` + +The colormap is a lookup table, so "oversampling" the colormap returns +nearest-neighbor interpolation (note the repeated colors in the list below) + +``` python +print('viridis(np.linspace(0, 1, 15))', viridis(np.linspace(0, 1, 15))) +``` + +Out: + +``` +viridis(np.linspace(0, 1, 15)) [[0.267004 0.004874 0.329415 1. ] + [0.267004 0.004874 0.329415 1. ] + [0.283072 0.130895 0.449241 1. ] + [0.262138 0.242286 0.520837 1. ] + [0.220057 0.343307 0.549413 1. ] + [0.177423 0.437527 0.557565 1. ] + [0.143343 0.522773 0.556295 1. ] + [0.119512 0.607464 0.540218 1. ] + [0.119512 0.607464 0.540218 1. ] + [0.166383 0.690856 0.496502 1. ] + [0.319809 0.770914 0.411152 1. ] + [0.525776 0.833491 0.288127 1. 
]
 [0.762373 0.876424 0.137064 1. ]
 [0.993248 0.906157 0.143936 1. ]
 [0.993248 0.906157 0.143936 1. ]]
```

## Creating listed colormaps

This is essentially the inverse operation of the above, where we supply a
Nx4 numpy array with all values between 0 and 1
to [``ListedColormap``](https://matplotlib.org/api/_as_gen/matplotlib.colors.ListedColormap.html#matplotlib.colors.ListedColormap) to make a new colormap. This means that
any numpy operations that we can do on a Nx4 array make carpentry of
new colormaps from existing colormaps quite straightforward.

Suppose we want to make the first 25 entries of a 256-length "viridis"
colormap pink for some reason:

``` python
viridis = cm.get_cmap('viridis', 256)
newcolors = viridis(np.linspace(0, 1, 256))
pink = np.array([248/256, 24/256, 148/256, 1])
newcolors[:25, :] = pink
newcmp = ListedColormap(newcolors)


def plot_examples(cms):
    """
    helper function to plot two colormaps
    """
    np.random.seed(19680801)
    data = np.random.randn(30, 30)

    fig, axs = plt.subplots(1, 2, figsize=(6, 3), constrained_layout=True)
    for [ax, cmap] in zip(axs, cms):
        psm = ax.pcolormesh(data, cmap=cmap, rasterized=True, vmin=-4, vmax=4)
        fig.colorbar(psm, ax=ax)
    plt.show()

plot_examples([viridis, newcmp])
```

![sphx_glr_colormap-manipulation_001](https://matplotlib.org/_images/sphx_glr_colormap-manipulation_001.png)

We can easily reduce the dynamic range of a colormap; here we choose the
middle 0.5 of the colormap. However, we need to interpolate from a larger
colormap, otherwise the new colormap will have repeated values.
+ +``` python +viridisBig = cm.get_cmap('viridis', 512) +newcmp = ListedColormap(viridisBig(np.linspace(0.25, 0.75, 256))) +plot_examples([viridis, newcmp]) +``` + +![sphx_glr_colormap-manipulation_002](https://matplotlib.org/_images/sphx_glr_colormap-manipulation_002.png) + +and we can easily concatenate two colormaps: + +``` python +top = cm.get_cmap('Oranges_r', 128) +bottom = cm.get_cmap('Blues', 128) + +newcolors = np.vstack((top(np.linspace(0, 1, 128)), + bottom(np.linspace(0, 1, 128)))) +newcmp = ListedColormap(newcolors, name='OrangeBlue') +plot_examples([viridis, newcmp]) +``` + +![sphx_glr_colormap-manipulation_003](https://matplotlib.org/_images/sphx_glr_colormap-manipulation_003.png) + +Of course we need not start from a named colormap, we just need to create +the Nx4 array to pass to [``ListedColormap``](https://matplotlib.orgapi/_as_gen/matplotlib.colors.ListedColormap.html#matplotlib.colors.ListedColormap). Here we create a +brown colormap that goes to white.... + +``` python +N = 256 +vals = np.ones((N, 4)) +vals[:, 0] = np.linspace(90/256, 1, N) +vals[:, 1] = np.linspace(39/256, 1, N) +vals[:, 2] = np.linspace(41/256, 1, N) +newcmp = ListedColormap(vals) +plot_examples([viridis, newcmp]) +``` + +![sphx_glr_colormap-manipulation_004](https://matplotlib.org/_images/sphx_glr_colormap-manipulation_004.png) + +## Creating linear segmented colormaps + +[``LinearSegmentedColormap``](https://matplotlib.orgapi/_as_gen/matplotlib.colors.LinearSegmentedColormap.html#matplotlib.colors.LinearSegmentedColormap) class specifies colormaps using anchor points +between which RGB(A) values are interpolated. + +The format to specify these colormaps allows discontinuities at the anchor +points. Each anchor point is specified as a row in a matrix of the +form ``[x[i] yleft[i] yright[i]]``, where ``x[i]`` is the anchor, and +``yleft[i]`` and ``yright[i]`` are the values of the color on either +side of the anchor point. 
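The per-channel lookup this format implies can be sketched in plain numpy. This is illustrative only; the real ``LinearSegmentedColormap`` also handles sampling into N entries, clipping, and gamma, and the function name ``channel_lookup`` is invented for this example:

``` python
import numpy as np

def channel_lookup(v, rows):
    """Evaluate one color channel at v in [0, 1], where rows is a list
    of [x, yleft, yright] anchor rows. Between anchors i and i+1 we
    interpolate from yright[i] to yleft[i+1]."""
    rows = np.asarray(rows, dtype=float)
    x, yleft, yright = rows[:, 0], rows[:, 1], rows[:, 2]
    i = np.searchsorted(x, v, side='right') - 1
    if i >= len(x) - 1:          # at or beyond the last anchor
        return yleft[-1]
    t = (v - x[i]) / (x[i + 1] - x[i])
    return (1 - t) * yright[i] + t * yleft[i + 1]

# a continuous channel (yleft == yright at every anchor):
red = [[0.0, 0.0, 0.0], [0.5, 1.0, 1.0], [1.0, 1.0, 1.0]]
print(channel_lookup(0.25, red))   # halfway from 0.0 to 1.0 -> 0.5
```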
+ +If there are no discontinuities, then ``yleft[i]=yright[i]``: + +``` python +cdict = {'red': [[0.0, 0.0, 0.0], + [0.5, 1.0, 1.0], + [1.0, 1.0, 1.0]], + 'green': [[0.0, 0.0, 0.0], + [0.25, 0.0, 0.0], + [0.75, 1.0, 1.0], + [1.0, 1.0, 1.0]], + 'blue': [[0.0, 0.0, 0.0], + [0.5, 0.0, 0.0], + [1.0, 1.0, 1.0]]} + + +def plot_linearmap(cdict): + newcmp = LinearSegmentedColormap('testCmap', segmentdata=cdict, N=256) + rgba = newcmp(np.linspace(0, 1, 256)) + fig, ax = plt.subplots(figsize=(4, 3), constrained_layout=True) + col = ['r', 'g', 'b'] + for xx in [0.25, 0.5, 0.75]: + ax.axvline(xx, color='0.7', linestyle='--') + for i in range(3): + ax.plot(np.arange(256)/256, rgba[:, i], color=col[i]) + ax.set_xlabel('index') + ax.set_ylabel('RGB') + plt.show() + +plot_linearmap(cdict) +``` + +![sphx_glr_colormap-manipulation_005](https://matplotlib.org/_images/sphx_glr_colormap-manipulation_005.png) + +In order to make a discontinuity at an anchor point, the third column is +different than the second. The matrix for each of "red", "green", "blue", +and optionally "alpha" is set up as: + +``` python +cdict['red'] = [... + [x[i] yleft[i] yright[i]], + [x[i+1] yleft[i+1] yright[i+1]], + ...] +``` + +and for values passed to the colormap between ``x[i]`` and ``x[i+1]``, +the interpolation is between ``yright[i]`` and ``yleft[i+1]``. + +In the example below there is a discontinuity in red at 0.5. The +interpolation between 0 and 0.5 goes from 0.3 to 1, and between 0.5 and 1 +it goes from 0.9 to 1. Note that red[0, 1], and red[2, 2] are both +superfluous to the interpolation because red[0, 1] is the value to the +left of 0, and red[2, 2] is the value to the right of 1.0. 
+ +``` python +cdict['red'] = [[0.0, 0.0, 0.3], + [0.5, 1.0, 0.9], + [1.0, 1.0, 1.0]] +plot_linearmap(cdict) +``` + +![sphx_glr_colormap-manipulation_006](https://matplotlib.org/_images/sphx_glr_colormap-manipulation_006.png) + +### References + +The use of the following functions, methods, classes and modules is shown +in this example: + +``` python +import matplotlib +matplotlib.axes.Axes.pcolormesh +matplotlib.figure.Figure.colorbar +matplotlib.colors +matplotlib.colors.LinearSegmentedColormap +matplotlib.colors.ListedColormap +matplotlib.cm +matplotlib.cm.get_cmap +``` + +**Total running time of the script:** ( 0 minutes 2.220 seconds) + +## Download + +- [Download Python source code: colormap-manipulation.py](https://matplotlib.org/_downloads/f55e73a6ac8441fd68270d3c6f2a7c7c/colormap-manipulation.py) +- [Download Jupyter notebook: colormap-manipulation.ipynb](https://matplotlib.org/_downloads/fd9acfdbb45f341d3bb04199f0868a38/colormap-manipulation.ipynb) + \ No newline at end of file diff --git a/Python/matplotlab/colors/colormapnorms.md b/Python/matplotlab/colors/colormapnorms.md new file mode 100644 index 00000000..eb913782 --- /dev/null +++ b/Python/matplotlab/colors/colormapnorms.md @@ -0,0 +1,281 @@ +--- +sidebarDepth: 3 +sidebar: auto +--- + +# Colormap Normalization + +Objects that use colormaps by default linearly map the colors in the +colormap from data values *vmin* to *vmax*. For example: + +``` python +pcm = ax.pcolormesh(x, y, Z, vmin=-1., vmax=1., cmap='RdBu_r') +``` + +will map the data in *Z* linearly from -1 to +1, so *Z=0* will +give a color at the center of the colormap *RdBu_r* (white in this +case). + +Matplotlib does this mapping in two steps, with a normalization from +the input data to [0, 1] occurring first, and then mapping onto the +indices in the colormap. Normalizations are classes defined in the +[``matplotlib.colors()``](https://matplotlib.orgapi/colors_api.html#module-matplotlib.colors) module. 
The default, linear normalization +is [``matplotlib.colors.Normalize()``](https://matplotlib.orgapi/_as_gen/matplotlib.colors.Normalize.html#matplotlib.colors.Normalize). + +Artists that map data to color pass the arguments *vmin* and *vmax* to +construct a [``matplotlib.colors.Normalize()``](https://matplotlib.orgapi/_as_gen/matplotlib.colors.Normalize.html#matplotlib.colors.Normalize) instance, then call it: + +``` python +In [1]: import matplotlib as mpl + +In [2]: norm = mpl.colors.Normalize(vmin=-1.,vmax=1.) + +In [3]: norm(0.) +Out[3]: 0.5 +``` + +However, there are sometimes cases where it is useful to map data to +colormaps in a non-linear fashion. + +## Logarithmic + +One of the most common transformations is to plot data by taking its logarithm +(to the base-10). This transformation is useful to display changes across +disparate scales. Using [``colors.LogNorm``](https://matplotlib.orgapi/_as_gen/matplotlib.colors.LogNorm.html#matplotlib.colors.LogNorm) normalizes the data via +\(log_{10}\). In the example below, there are two bumps, one much smaller +than the other. Using [``colors.LogNorm``](https://matplotlib.orgapi/_as_gen/matplotlib.colors.LogNorm.html#matplotlib.colors.LogNorm), the shape and location of each bump +can clearly be seen: + +``` python +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.colors as colors +import matplotlib.cbook as cbook + +N = 100 +X, Y = np.mgrid[-3:3:complex(0, N), -2:2:complex(0, N)] + +# A low hump with a spike coming out of the top right. Needs to have +# z/colour axis on a log scale so we see both hump and spike. linear +# scale only shows the spike. 
Z1 = np.exp(-(X)**2 - (Y)**2)
Z2 = np.exp(-(X * 10)**2 - (Y * 10)**2)
Z = Z1 + 50 * Z2

fig, ax = plt.subplots(2, 1)

pcm = ax[0].pcolor(X, Y, Z,
                   norm=colors.LogNorm(vmin=Z.min(), vmax=Z.max()),
                   cmap='PuBu_r')
fig.colorbar(pcm, ax=ax[0], extend='max')

pcm = ax[1].pcolor(X, Y, Z, cmap='PuBu_r')
fig.colorbar(pcm, ax=ax[1], extend='max')
plt.show()
```

![sphx_glr_colormapnorms_001](https://matplotlib.org/_images/sphx_glr_colormapnorms_001.png)

## Symmetric logarithmic

Similarly, it sometimes happens that there is data that is positive
and negative, but we would still like a logarithmic scaling applied to
both. In this case, the negative numbers are also scaled
logarithmically, and mapped to smaller numbers; e.g., if ``vmin=-vmax``,
then the negative numbers are mapped from 0 to 0.5 and the
positive from 0.5 to 1.

Since the logarithm of values close to zero tends toward infinity, a
small range around zero needs to be mapped linearly. The parameter
*linthresh* allows the user to specify the size of this range
(-*linthresh*, *linthresh*). The size of this range in the colormap is
set by *linscale*. When *linscale* == 1.0 (the default), the space used
for the positive and negative halves of the linear range will be equal
to one decade in the logarithmic range.
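The idea behind this scaling can be sketched in plain numpy. Note this is a simplified transform, not ``SymLogNorm``'s exact math: the real norm also honors *linscale* and rescales its output to [0, 1], and the function name ``symlog`` is invented here:

``` python
import numpy as np

def symlog(x, linthresh=0.03):
    """Simplified symmetric-log transform: approximately linear inside
    +/-linthresh, logarithmic outside, and odd in x (a sketch of the
    idea, not SymLogNorm's exact formula)."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log10(1.0 + np.abs(x) / linthresh)

# symmetric inputs give symmetric outputs, and zero stays at zero
print(symlog([-1.0, -0.01, 0.0, 0.01, 1.0]))
```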
``` python
N = 100
X, Y = np.mgrid[-3:3:complex(0, N), -2:2:complex(0, N)]
Z1 = np.exp(-X**2 - Y**2)
Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2)
Z = (Z1 - Z2) * 2

fig, ax = plt.subplots(2, 1)

pcm = ax[0].pcolormesh(X, Y, Z,
                       norm=colors.SymLogNorm(linthresh=0.03, linscale=0.03,
                                              vmin=-1.0, vmax=1.0),
                       cmap='RdBu_r')
fig.colorbar(pcm, ax=ax[0], extend='both')

pcm = ax[1].pcolormesh(X, Y, Z, cmap='RdBu_r', vmin=-np.max(Z))
fig.colorbar(pcm, ax=ax[1], extend='both')
plt.show()
```

![sphx_glr_colormapnorms_002](https://matplotlib.org/_images/sphx_glr_colormapnorms_002.png)

## Power-law

Sometimes it is useful to remap the colors onto a power-law
relationship (i.e. \(y=x^{\gamma}\), where \(\gamma\) is the
power). For this we use ``colors.PowerNorm()``. It takes as an
argument *gamma* (*gamma* == 1.0 will just yield the default linear
normalization):

::: tip Note

There should probably be a good reason for plotting the data using
this type of transformation. Technical viewers are used to linear
and logarithmic axes and data transformations. Power laws are less
common, and viewers should explicitly be made aware that they have
been used.

:::

``` python
N = 100
X, Y = np.mgrid[0:3:complex(0, N), 0:2:complex(0, N)]
Z1 = (1 + np.sin(Y * 10.)) * X**(2.)

fig, ax = plt.subplots(2, 1)

pcm = ax[0].pcolormesh(X, Y, Z1, norm=colors.PowerNorm(gamma=0.5),
                       cmap='PuBu_r')
fig.colorbar(pcm, ax=ax[0], extend='max')

pcm = ax[1].pcolormesh(X, Y, Z1, cmap='PuBu_r')
fig.colorbar(pcm, ax=ax[1], extend='max')
plt.show()
```

![sphx_glr_colormapnorms_003](https://matplotlib.org/_images/sphx_glr_colormapnorms_003.png)

## Discrete bounds

Another normalization that comes with Matplotlib is
``colors.BoundaryNorm()``. In addition to *vmin* and *vmax*, this
takes as arguments the boundaries between which data is to be mapped. The
colors are then linearly distributed between these "bounds".
For +instance: + +``` python +In [4]: import matplotlib.colors as colors + +In [5]: bounds = np.array([-0.25, -0.125, 0, 0.5, 1]) + +In [6]: norm = colors.BoundaryNorm(boundaries=bounds, ncolors=4) + +In [7]: print(norm([-0.2,-0.15,-0.02, 0.3, 0.8, 0.99])) +[0 0 1 2 3 3] +``` + +Note unlike the other norms, this norm returns values from 0 to *ncolors*-1. + +``` python +N = 100 +X, Y = np.mgrid[-3:3:complex(0, N), -2:2:complex(0, N)] +Z1 = np.exp(-X**2 - Y**2) +Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2) +Z = (Z1 - Z2) * 2 + +fig, ax = plt.subplots(3, 1, figsize=(8, 8)) +ax = ax.flatten() +# even bounds gives a contour-like effect +bounds = np.linspace(-1, 1, 10) +norm = colors.BoundaryNorm(boundaries=bounds, ncolors=256) +pcm = ax[0].pcolormesh(X, Y, Z, + norm=norm, + cmap='RdBu_r') +fig.colorbar(pcm, ax=ax[0], extend='both', orientation='vertical') + +# uneven bounds changes the colormapping: +bounds = np.array([-0.25, -0.125, 0, 0.5, 1]) +norm = colors.BoundaryNorm(boundaries=bounds, ncolors=256) +pcm = ax[1].pcolormesh(X, Y, Z, norm=norm, cmap='RdBu_r') +fig.colorbar(pcm, ax=ax[1], extend='both', orientation='vertical') + +pcm = ax[2].pcolormesh(X, Y, Z, cmap='RdBu_r', vmin=-np.max(Z)) +fig.colorbar(pcm, ax=ax[2], extend='both', orientation='vertical') +plt.show() +``` + +![sphx_glr_colormapnorms_004](https://matplotlib.org/_images/sphx_glr_colormapnorms_004.png) + +## DivergingNorm: Different mapping on either side of a center + +Sometimes we want to have a different colormap on either side of a +conceptual center point, and we want those two colormaps to have +different linear scales. An example is a topographic map where the land +and ocean have a center at zero, but land typically has a greater +elevation range than the water has depth range, and they are often +represented by a different colormap. 

``` python
filename = cbook.get_sample_data('topobathy.npz', asfileobj=False)
with np.load(filename) as dem:
    topo = dem['topo']
    longitude = dem['longitude']
    latitude = dem['latitude']

fig, ax = plt.subplots()
# make a colormap that has land and ocean clearly delineated and of the
# same length (256 + 256)
colors_undersea = plt.cm.terrain(np.linspace(0, 0.17, 256))
colors_land = plt.cm.terrain(np.linspace(0.25, 1, 256))
all_colors = np.vstack((colors_undersea, colors_land))
terrain_map = colors.LinearSegmentedColormap.from_list('terrain_map',
                                                       all_colors)

# make the norm: Note the center is offset so that the land has more
# dynamic range:
divnorm = colors.DivergingNorm(vmin=-500., vcenter=0, vmax=4000)

pcm = ax.pcolormesh(longitude, latitude, topo, rasterized=True, norm=divnorm,
                    cmap=terrain_map,)
# Simple geographic plot, set aspect ratio because distance between lines of
# longitude depends on latitude.
ax.set_aspect(1 / np.cos(np.deg2rad(49)))
fig.colorbar(pcm, shrink=0.6)
plt.show()
```

![sphx_glr_colormapnorms_005](https://matplotlib.org/_images/sphx_glr_colormapnorms_005.png)

## Custom normalization: Manually implement two linear ranges

The [``DivergingNorm``](https://matplotlib.org/api/_as_gen/matplotlib.colors.DivergingNorm.html#matplotlib.colors.DivergingNorm) described above makes a useful example for
defining your own norm.

``` python
class MidpointNormalize(colors.Normalize):
    def __init__(self, vmin=None, vmax=None, vcenter=None, clip=False):
        self.vcenter = vcenter
        colors.Normalize.__init__(self, vmin, vmax, clip)

    def __call__(self, value, clip=None):
        # I'm ignoring masked values and all kinds of edge cases to make a
        # simple example...
        x, y = [self.vmin, self.vcenter, self.vmax], [0, 0.5, 1]
        return np.ma.masked_array(np.interp(value, x, y))


fig, ax = plt.subplots()
midnorm = MidpointNormalize(vmin=-500., vcenter=0, vmax=4000)

pcm = ax.pcolormesh(longitude, latitude, topo, rasterized=True, norm=midnorm,
                    cmap=terrain_map)
ax.set_aspect(1 / np.cos(np.deg2rad(49)))
fig.colorbar(pcm, shrink=0.6, extend='both')
plt.show()
```

![sphx_glr_colormapnorms_006](https://matplotlib.org/_images/sphx_glr_colormapnorms_006.png)

**Total running time of the script:** ( 0 minutes 1.895 seconds)

## Download

- [Download Python source code: colormapnorms.py](https://matplotlib.org/_downloads/56fa91958fd427757e621c21de870bda/colormapnorms.py)
- [Download Jupyter notebook: colormapnorms.ipynb](https://matplotlib.org/_downloads/59a7c8f3db252ae16cd43fd50d6a004c/colormapnorms.ipynb)

diff --git a/Python/matplotlab/colors/colormaps.md b/Python/matplotlab/colors/colormaps.md
new file mode 100644
index 00000000..fdbe5e36
--- /dev/null
+++ b/Python/matplotlab/colors/colormaps.md
@@ -0,0 +1,521 @@
---
sidebarDepth: 3
sidebar: auto
---

# Choosing Colormaps in Matplotlib

Matplotlib has a number of built-in colormaps accessible via
[``matplotlib.cm.get_cmap``](https://matplotlib.org/api/cm_api.html#matplotlib.cm.get_cmap). There are also external libraries like
[[palettable]](#palettable) and [[colorcet]](#colorcet) that have many extra colormaps.
Here we briefly discuss how to choose between the many options. For
help on creating your own colormaps, see
[Creating Colormaps in Matplotlib](colormap-manipulation.html).

## Overview

The idea behind choosing a good colormap is to find a good representation in 3D
colorspace for your data set.
The best colormap for any given data set depends +on many things including: + +- Whether representing form or metric data ([[Ware]](#ware)) +- Your knowledge of the data set (*e.g.*, is there a critical value +from which the other values deviate?) +- If there is an intuitive color scheme for the parameter you are plotting +- If there is a standard in the field the audience may be expecting + +For many applications, a perceptually uniform colormap is the best +choice --- one in which equal steps in data are perceived as equal +steps in the color space. Researchers have found that the human brain +perceives changes in the lightness parameter as changes in the data +much better than, for example, changes in hue. Therefore, colormaps +which have monotonically increasing lightness through the colormap +will be better interpreted by the viewer. A wonderful example of +perceptually uniform colormaps is [[colorcet]](#colorcet). + +Color can be represented in 3D space in various ways. One way to represent color +is using CIELAB. In CIELAB, color space is represented by lightness, +\(L^*\); red-green, \(a^*\); and yellow-blue, \(b^*\). The lightness +parameter \(L^*\) can then be used to learn more about how the matplotlib +colormaps will be perceived by viewers. + +An excellent starting resource for learning about human perception of colormaps +is from [[IBM]](#ibm). + +## Classes of colormaps + +Colormaps are often split into several categories based on their function (see, +*e.g.*, [[Moreland]](#moreland)): + +1. Sequential: change in lightness and often saturation of color +incrementally, often using a single hue; should be used for +representing information that has ordering. +1. Diverging: change in lightness and possibly saturation of two +different colors that meet in the middle at an unsaturated color; +should be used when the information being plotted has a critical +middle value, such as topography or when the data deviates around +zero. +1. 
Cyclic: change in lightness of two different colors that meet in +the middle and beginning/end at an unsaturated color; should be +used for values that wrap around at the endpoints, such as phase +angle, wind direction, or time of day. +1. Qualitative: often are miscellaneous colors; should be used to +represent information which does not have ordering or +relationships. + +``` python +# sphinx_gallery_thumbnail_number = 2 + +import numpy as np +import matplotlib as mpl +import matplotlib.pyplot as plt +from matplotlib import cm +from colorspacious import cspace_converter +from collections import OrderedDict + +cmaps = OrderedDict() +``` + +### Sequential + +For the Sequential plots, the lightness value increases monotonically through +the colormaps. This is good. Some of the \(L^*\) values in the colormaps +span from 0 to 100 (binary and the other grayscale), and others start around +\(L^*=20\). Those that have a smaller range of \(L^*\) will accordingly +have a smaller perceptual range. Note also that the \(L^*\) function varies +amongst the colormaps: some are approximately linear in \(L^*\) and others +are more curved. + +``` python +cmaps['Perceptually Uniform Sequential'] = [ + 'viridis', 'plasma', 'inferno', 'magma', 'cividis'] + +cmaps['Sequential'] = [ + 'Greys', 'Purples', 'Blues', 'Greens', 'Oranges', 'Reds', + 'YlOrBr', 'YlOrRd', 'OrRd', 'PuRd', 'RdPu', 'BuPu', + 'GnBu', 'PuBu', 'YlGnBu', 'PuBuGn', 'BuGn', 'YlGn'] +``` + +### Sequential2 + +Many of the \(L^*\) values from the Sequential2 plots are monotonically +increasing, but some (autumn, cool, spring, and winter) plateau or even go both +up and down in \(L^*\) space. Others (afmhot, copper, gist_heat, and hot) +have kinks in the \(L^*\) functions. Data that is being represented in a +region of the colormap that is at a plateau or kink will lead to a perception of +banding of the data in those values in the colormap (see [[mycarta-banding]](#mycarta-banding) for +an excellent example of this). 
+ +``` python +cmaps['Sequential (2)'] = [ + 'binary', 'gist_yarg', 'gist_gray', 'gray', 'bone', 'pink', + 'spring', 'summer', 'autumn', 'winter', 'cool', 'Wistia', + 'hot', 'afmhot', 'gist_heat', 'copper'] +``` + +### Diverging + +For the Diverging maps, we want to have monotonically increasing \(L^*\) +values up to a maximum, which should be close to \(L^*=100\), followed by +monotonically decreasing \(L^*\) values. We are looking for approximately +equal minimum \(L^*\) values at opposite ends of the colormap. By these +measures, BrBG and RdBu are good options. coolwarm is a good option, but it +doesn't span a wide range of \(L^*\) values (see grayscale section below). + +``` python +cmaps['Diverging'] = [ + 'PiYG', 'PRGn', 'BrBG', 'PuOr', 'RdGy', 'RdBu', + 'RdYlBu', 'RdYlGn', 'Spectral', 'coolwarm', 'bwr', 'seismic'] +``` + +### Cyclic + +For Cyclic maps, we want to start and end on the same color, and meet a +symmetric center point in the middle. \(L^*\) should change monotonically +from start to middle, and inversely from middle to end. It should be symmetric +on the increasing and decreasing side, and only differ in hue. At the ends and +middle, \(L^*\) will reverse direction, which should be smoothed in +\(L^*\) space to reduce artifacts. See [[kovesi-colormaps]](#kovesi-colormaps) for more +information on the design of cyclic maps. + +The often-used HSV colormap is included in this set of colormaps, although it +is not symmetric to a center point. Additionally, the \(L^*\) values vary +widely throughout the colormap, making it a poor choice for representing data +for viewers to see perceptually. See an extension on this idea at +[[mycarta-jet]](#mycarta-jet). + +``` python +cmaps['Cyclic'] = ['twilight', 'twilight_shifted', 'hsv'] +``` + +### Qualitative + +Qualitative colormaps are not aimed at being perceptual maps, but looking at the +lightness parameter can verify that for us. 
The \(L^*\) values move all over +the place throughout the colormap, and are clearly not monotonically increasing. +These would not be good options for use as perceptual colormaps. + +``` python +cmaps['Qualitative'] = ['Pastel1', 'Pastel2', 'Paired', 'Accent', + 'Dark2', 'Set1', 'Set2', 'Set3', + 'tab10', 'tab20', 'tab20b', 'tab20c'] +``` + +### Miscellaneous + +Some of the miscellaneous colormaps have particular uses for which +they have been created. For example, gist_earth, ocean, and terrain +all seem to be created for plotting topography (green/brown) and water +depths (blue) together. We would expect to see a divergence in these +colormaps, then, but multiple kinks may not be ideal, such as in +gist_earth and terrain. CMRmap was created to convert well to +grayscale, though it does appear to have some small kinks in +\(L^*\). cubehelix was created to vary smoothly in both lightness +and hue, but appears to have a small hump in the green hue area. + +The often-used jet colormap is included in this set of colormaps. We can see +that the \(L^*\) values vary widely throughout the colormap, making it a +poor choice for representing data for viewers to see perceptually. See an +extension on this idea at [[mycarta-jet]](#mycarta-jet). + +``` python +cmaps['Miscellaneous'] = [ + 'flag', 'prism', 'ocean', 'gist_earth', 'terrain', 'gist_stern', + 'gnuplot', 'gnuplot2', 'CMRmap', 'cubehelix', 'brg', + 'gist_rainbow', 'rainbow', 'jet', 'nipy_spectral', 'gist_ncar'] +``` + +First, we'll show the range of each colormap. Note that some seem +to change more "quickly" than others. 
+ +``` python +nrows = max(len(cmap_list) for cmap_category, cmap_list in cmaps.items()) +gradient = np.linspace(0, 1, 256) +gradient = np.vstack((gradient, gradient)) + + +def plot_color_gradients(cmap_category, cmap_list, nrows): + fig, axes = plt.subplots(nrows=nrows) + fig.subplots_adjust(top=0.95, bottom=0.01, left=0.2, right=0.99) + axes[0].set_title(cmap_category + ' colormaps', fontsize=14) + + for ax, name in zip(axes, cmap_list): + ax.imshow(gradient, aspect='auto', cmap=plt.get_cmap(name)) + pos = list(ax.get_position().bounds) + x_text = pos[0] - 0.01 + y_text = pos[1] + pos[3]/2. + fig.text(x_text, y_text, name, va='center', ha='right', fontsize=10) + + # Turn off *all* ticks & spines, not just the ones with colormaps. + for ax in axes: + ax.set_axis_off() + + +for cmap_category, cmap_list in cmaps.items(): + plot_color_gradients(cmap_category, cmap_list, nrows) + +plt.show() +``` + +- ![sphx_glr_colormaps_001](https://matplotlib.org/_images/sphx_glr_colormaps_001.png) +- ![sphx_glr_colormaps_002](https://matplotlib.org/_images/sphx_glr_colormaps_002.png) +- ![sphx_glr_colormaps_003](https://matplotlib.org/_images/sphx_glr_colormaps_003.png) +- ![sphx_glr_colormaps_004](https://matplotlib.org/_images/sphx_glr_colormaps_004.png) +- ![sphx_glr_colormaps_005](https://matplotlib.org/_images/sphx_glr_colormaps_005.png) +- ![sphx_glr_colormaps_006](https://matplotlib.org/_images/sphx_glr_colormaps_006.png) +- ![sphx_glr_colormaps_007](https://matplotlib.org/_images/sphx_glr_colormaps_007.png) + +## Lightness of matplotlib colormaps + +Here we examine the lightness values of the matplotlib colormaps. +Note that some documentation on the colormaps is available +([[list-colormaps]](#list-colormaps)). 
+ +``` python +mpl.rcParams.update({'font.size': 12}) + +# Number of colormap per subplot for particular cmap categories +_DSUBS = {'Perceptually Uniform Sequential': 5, 'Sequential': 6, + 'Sequential (2)': 6, 'Diverging': 6, 'Cyclic': 3, + 'Qualitative': 4, 'Miscellaneous': 6} + +# Spacing between the colormaps of a subplot +_DC = {'Perceptually Uniform Sequential': 1.4, 'Sequential': 0.7, + 'Sequential (2)': 1.4, 'Diverging': 1.4, 'Cyclic': 1.4, + 'Qualitative': 1.4, 'Miscellaneous': 1.4} + +# Indices to step through colormap +x = np.linspace(0.0, 1.0, 100) + +# Do plot +for cmap_category, cmap_list in cmaps.items(): + + # Do subplots so that colormaps have enough space. + # Default is 6 colormaps per subplot. + dsub = _DSUBS.get(cmap_category, 6) + nsubplots = int(np.ceil(len(cmap_list) / dsub)) + + # squeeze=False to handle similarly the case of a single subplot + fig, axes = plt.subplots(nrows=nsubplots, squeeze=False, + figsize=(7, 2.6*nsubplots)) + + for i, ax in enumerate(axes.flat): + + locs = [] # locations for text labels + + for j, cmap in enumerate(cmap_list[i*dsub:(i+1)*dsub]): + + # Get RGB values for colormap and convert the colormap in + # CAM02-UCS colorspace. lab[0, :, 0] is the lightness. + rgb = cm.get_cmap(cmap)(x)[np.newaxis, :, :3] + lab = cspace_converter("sRGB1", "CAM02-UCS")(rgb) + + # Plot colormap L values. Do separately for each category + # so each plot can be pretty. To make scatter markers change + # color along plot: + # http://stackoverflow.com/questions/8202605/ + + if cmap_category == 'Sequential': + # These colormaps all start at high lightness but we want them + # reversed to look nice in the plot, so reverse the order. 
+ y_ = lab[0, ::-1, 0] + c_ = x[::-1] + else: + y_ = lab[0, :, 0] + c_ = x + + dc = _DC.get(cmap_category, 1.4) # cmaps horizontal spacing + ax.scatter(x + j*dc, y_, c=c_, cmap=cmap, s=300, linewidths=0.0) + + # Store locations for colormap labels + if cmap_category in ('Perceptually Uniform Sequential', + 'Sequential'): + locs.append(x[-1] + j*dc) + elif cmap_category in ('Diverging', 'Qualitative', 'Cyclic', + 'Miscellaneous', 'Sequential (2)'): + locs.append(x[int(x.size/2.)] + j*dc) + + # Set up the axis limits: + # * the 1st subplot is used as a reference for the x-axis limits + # * lightness values goes from 0 to 100 (y-axis limits) + ax.set_xlim(axes[0, 0].get_xlim()) + ax.set_ylim(0.0, 100.0) + + # Set up labels for colormaps + ax.xaxis.set_ticks_position('top') + ticker = mpl.ticker.FixedLocator(locs) + ax.xaxis.set_major_locator(ticker) + formatter = mpl.ticker.FixedFormatter(cmap_list[i*dsub:(i+1)*dsub]) + ax.xaxis.set_major_formatter(formatter) + ax.xaxis.set_tick_params(rotation=50) + + ax.set_xlabel(cmap_category + ' colormaps', fontsize=14) + fig.text(0.0, 0.55, 'Lightness $L^*$', fontsize=12, + transform=fig.transFigure, rotation=90) + + fig.tight_layout(h_pad=0.0, pad=1.5) + plt.show() +``` + +- ![sphx_glr_colormaps_008](https://matplotlib.org/_images/sphx_glr_colormaps_008.png) +- ![sphx_glr_colormaps_009](https://matplotlib.org/_images/sphx_glr_colormaps_009.png) +- ![sphx_glr_colormaps_010](https://matplotlib.org/_images/sphx_glr_colormaps_010.png) +- ![sphx_glr_colormaps_011](https://matplotlib.org/_images/sphx_glr_colormaps_011.png) +- ![sphx_glr_colormaps_012](https://matplotlib.org/_images/sphx_glr_colormaps_012.png) +- ![sphx_glr_colormaps_013](https://matplotlib.org/_images/sphx_glr_colormaps_013.png) +- ![sphx_glr_colormaps_014](https://matplotlib.org/_images/sphx_glr_colormaps_014.png) + +## Grayscale conversion + +It is important to pay attention to conversion to grayscale for color +plots, since they may be printed on black and white 
printers. If not +carefully considered, your readers may end up with indecipherable +plots because the grayscale changes unpredictably through the +colormap. + +Conversion to grayscale is done in many different ways [[bw]](#bw). Some of the +better ones use a linear combination of the rgb values of a pixel, but +weighted according to how we perceive color intensity. A nonlinear method of +conversion to grayscale is to use the \(L^*\) values of the pixels. In +general, similar principles apply for this question as they do for presenting +one's information perceptually; that is, if a colormap is chosen that is +monotonically increasing in \(L^*\) values, it will print in a reasonable +manner to grayscale. + +With this in mind, we see that the Sequential colormaps have reasonable +representations in grayscale. Some of the Sequential2 colormaps have decent +enough grayscale representations, though some (autumn, spring, summer, +winter) have very little grayscale change. If a colormap like this was used +in a plot and then the plot was printed to grayscale, a lot of the +information may map to the same gray values. The Diverging colormaps mostly +vary from darker gray on the outer edges to white in the middle. Some +(PuOr and seismic) have noticeably darker gray on one side than the other +and therefore are not very symmetric. coolwarm has little range of gray scale +and would print to a more uniform plot, losing a lot of detail. Note that +overlaid, labeled contours could help differentiate between one side of the +colormap vs. the other since color cannot be used once a plot is printed to +grayscale. Many of the Qualitative and Miscellaneous colormaps, such as +Accent, hsv, and jet, change from darker to lighter and back to darker gray +throughout the colormap. This would make it impossible for a viewer to +interpret the information in a plot once it is printed in grayscale. + +``` python +mpl.rcParams.update({'font.size': 14}) + +# Indices to step through colormap. 
+x = np.linspace(0.0, 1.0, 100) + +gradient = np.linspace(0, 1, 256) +gradient = np.vstack((gradient, gradient)) + + +def plot_color_gradients(cmap_category, cmap_list): + fig, axes = plt.subplots(nrows=len(cmap_list), ncols=2) + fig.subplots_adjust(top=0.95, bottom=0.01, left=0.2, right=0.99, + wspace=0.05) + fig.suptitle(cmap_category + ' colormaps', fontsize=14, y=1.0, x=0.6) + + for ax, name in zip(axes, cmap_list): + + # Get RGB values for colormap. + rgb = cm.get_cmap(plt.get_cmap(name))(x)[np.newaxis, :, :3] + + # Get colormap in CAM02-UCS colorspace. We want the lightness. + lab = cspace_converter("sRGB1", "CAM02-UCS")(rgb) + L = lab[0, :, 0] + L = np.float32(np.vstack((L, L, L))) + + ax[0].imshow(gradient, aspect='auto', cmap=plt.get_cmap(name)) + ax[1].imshow(L, aspect='auto', cmap='binary_r', vmin=0., vmax=100.) + pos = list(ax[0].get_position().bounds) + x_text = pos[0] - 0.01 + y_text = pos[1] + pos[3]/2. + fig.text(x_text, y_text, name, va='center', ha='right', fontsize=10) + + # Turn off *all* ticks & spines, not just the ones with colormaps. 
+ for ax in axes.flat: + ax.set_axis_off() + + plt.show() + + +for cmap_category, cmap_list in cmaps.items(): + + plot_color_gradients(cmap_category, cmap_list) +``` + +- ![sphx_glr_colormaps_015](https://matplotlib.org/_images/sphx_glr_colormaps_015.png) +- ![sphx_glr_colormaps_016](https://matplotlib.org/_images/sphx_glr_colormaps_016.png) +- ![sphx_glr_colormaps_017](https://matplotlib.org/_images/sphx_glr_colormaps_017.png) +- ![sphx_glr_colormaps_018](https://matplotlib.org/_images/sphx_glr_colormaps_018.png) +- ![sphx_glr_colormaps_019](https://matplotlib.org/_images/sphx_glr_colormaps_019.png) +- ![sphx_glr_colormaps_020](https://matplotlib.org/_images/sphx_glr_colormaps_020.png) +- ![sphx_glr_colormaps_021](https://matplotlib.org/_images/sphx_glr_colormaps_021.png) + +## Color vision deficiencies + +There is a lot of information available about color blindness (*e.g.*, +[[colorblindness]](#colorblindness)). Additionally, there are tools available to convert images +to how they look for different types of color vision deficiencies. + +The most common form of color vision deficiency involves differentiating +between red and green. Thus, avoiding colormaps with both red and green will +avoid many problems in general. 
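As a rough, NumPy-only illustration of why red-green pairs are risky, one can simply collapse the red and green channels of a color and see which contrast survives. The helper ``collapse_red_green`` is a hypothetical name and a deliberately crude sketch; a real simulation (for example the color-vision-deficiency support in colorspacious) models cone responses instead:

``` python
import numpy as np


def collapse_red_green(rgb):
    # Crude sketch only: replace R and G by their mean, erasing all
    # red-green contrast.  This is NOT a physiological model of color
    # vision deficiency -- just a quick sanity check for palettes.
    rgb = np.asarray(rgb, dtype=float)
    out = rgb.copy()
    mean_rg = rgb[..., :2].mean(axis=-1)
    out[..., 0] = mean_rg
    out[..., 1] = mean_rg
    return out


red, green = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
blue, orange = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.5, 0.0])

# Pure red and pure green become identical once red-green contrast is
# removed, while a blue-orange pair stays clearly separated:
print(collapse_red_green(red), collapse_red_green(green))
print(collapse_red_green(blue), collapse_red_green(orange))
```

Under this crude check, any palette whose neighboring entries differ mostly in their red-green balance collapses to near-identical grays, which is the failure mode the paragraph above warns about.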

## References

- [colorcet] ([1](#id2), [2](#id4)): [https://colorcet.pyviz.org](https://colorcet.pyviz.org)
- [Ware]: [http://ccom.unh.edu/sites/default/files/publications/Ware_1988_CGA_Color_sequences_univariate_maps.pdf](http://ccom.unh.edu/sites/default/files/publications/Ware_1988_CGA_Color_sequences_univariate_maps.pdf)
- [Moreland]: [http://www.kennethmoreland.com/color-maps/ColorMapsExpanded.pdf](http://www.kennethmoreland.com/color-maps/ColorMapsExpanded.pdf)
- [list-colormaps]: [https://gist.github.com/endolith/2719900](https://gist.github.com/endolith/2719900)
- [mycarta-banding]: [https://mycarta.wordpress.com/2012/10/14/the-rainbow-is-deadlong-live-the-rainbow-part-4-cie-lab-heated-body/](https://mycarta.wordpress.com/2012/10/14/the-rainbow-is-deadlong-live-the-rainbow-part-4-cie-lab-heated-body/)
- [mycarta-jet] ([1](#id9), [2](#id10)): [https://mycarta.wordpress.com/2012/10/06/the-rainbow-is-deadlong-live-the-rainbow-part-3/](https://mycarta.wordpress.com/2012/10/06/the-rainbow-is-deadlong-live-the-rainbow-part-3/)
- [kovesi-colormaps]: [https://arxiv.org/abs/1509.03700](https://arxiv.org/abs/1509.03700)
- [bw]: [http://www.tannerhelland.com/3643/grayscale-image-algorithm-vb6/](http://www.tannerhelland.com/3643/grayscale-image-algorithm-vb6/)
- [colorblindness]: [http://www.color-blindness.com/](http://www.color-blindness.com/)
- [IBM]: [https://doi.org/10.1109/VISUAL.1995.480803](https://doi.org/10.1109/VISUAL.1995.480803)
- [palettable]: [https://jiffyclub.github.io/palettable/](https://jiffyclub.github.io/palettable/)

**Total running time of the script:** ( 0 minutes 9.320 seconds)

## Download

- [Download Python source code: colormaps.py](https://matplotlib.org/_downloads/9df0748eeda573fbccab51a7272f7d81/colormaps.py)
- [Download Jupyter notebook: colormaps.ipynb](https://matplotlib.org/_downloads/6024d841c77bf197ffe5612254186669/colormaps.ipynb)

diff --git a/Python/matplotlab/colors/colors.md b/Python/matplotlab/colors/colors.md
new file mode 100644
index 00000000..8ef8cff0
--- /dev/null
+++ b/Python/matplotlab/colors/colors.md
@@ -0,0 +1,135 @@
---
sidebarDepth: 3
sidebar: auto
---

# Specifying Colors

Matplotlib recognizes the following formats to specify a color:

- an RGB or RGBA (red, green, blue, alpha) tuple of float values in ``[0, 1]``
  (e.g., ``(0.1, 0.2, 0.5)`` or ``(0.1, 0.2, 0.5, 0.3)``);
- a hex RGB or RGBA string (e.g., ``'#0f0f0f'`` or ``'#0f0f0f80'``;
  case-insensitive);
- a string representation of a float value in ``[0, 1]`` inclusive for gray
  level (e.g., ``'0.5'``);
- one of ``{'b', 'g', 'r', 'c', 'm', 'y', 'k', 'w'}``;
- an X11/CSS4 color name (case-insensitive);
- a name from the [xkcd color survey](https://xkcd.com/color/rgb/), prefixed with ``'xkcd:'`` (e.g.,
  ``'xkcd:sky blue'``; case-insensitive);
- one of the Tableau Colors from the 'T10' categorical palette (the default
  color cycle): ``{'tab:blue', 'tab:orange', 'tab:green', 'tab:red',
  'tab:purple', 'tab:brown', 'tab:pink', 'tab:gray', 'tab:olive', 'tab:cyan'}``
  (case-insensitive);
- a "CN" color spec, i.e. ``'C'`` followed by a number, which is an index into
  the default property cycle (``matplotlib.rcParams['axes.prop_cycle']``); the
  indexing is intended to occur at rendering time, and defaults to black if the
  cycle does not include color.

"Red", "Green", and "Blue" are the intensities of those colors, the combination
of which span the colorspace.

How "Alpha" behaves depends on the ``zorder`` of the Artist. Higher
``zorder`` Artists are drawn on top of lower Artists, and "Alpha" determines
whether the lower artist is covered by the higher.
If the old RGB of a pixel is ``RGBold`` and the RGB of the
pixel of the Artist being added is ``RGBnew`` with Alpha ``alpha``,
then the RGB of the pixel is updated to:
``RGB = RGBold * (1 - alpha) + RGBnew * alpha``. An Alpha
of 1 means the old color is completely covered by the new Artist, and an
Alpha of 0 means that pixel of the Artist is transparent.

For more information on colors in matplotlib see

- the [Color Demo](https://matplotlib.org/gallery/color/color_demo.html) example;
- the [``matplotlib.colors``](https://matplotlib.org/api/colors_api.html#module-matplotlib.colors) API;
- the [List of named colors](https://matplotlib.org/gallery/color/named_colors.html) example.

## "CN" color selection

"CN" colors are converted to RGBA as soon as the artist is created. For
example,

``` python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl

th = np.linspace(0, 2*np.pi, 128)


def demo(sty):
    mpl.style.use(sty)
    fig, ax = plt.subplots(figsize=(3, 3))

    ax.set_title('style: {!r}'.format(sty), color='C0')

    ax.plot(th, np.cos(th), 'C1', label='C1')
    ax.plot(th, np.sin(th), 'C2', label='C2')
    ax.legend()

demo('default')
demo('seaborn')
```

- ![sphx_glr_colors_001](https://matplotlib.org/_images/sphx_glr_colors_001.png)
- ![sphx_glr_colors_002](https://matplotlib.org/_images/sphx_glr_colors_002.png)

will use the first color for the title and then plot using the second
and third colors of each style's ``mpl.rcParams['axes.prop_cycle']``.

## xkcd v X11/CSS4

The xkcd colors are derived from a user survey conducted by the
webcomic xkcd. [Details of the survey are available on the xkcd blog](https://blog.xkcd.com/2010/05/03/color-survey-results/).

Out of 148 colors in the CSS color list, there are 95 name collisions
between the X11/CSS4 names and the xkcd names, all but 3 of which have
different hex values.
For example, ``'blue'`` maps to ``'#0000FF'``,
whereas ``'xkcd:blue'`` maps to ``'#0343DF'``. Due to these name
collisions, all of the xkcd colors are prefixed with ``'xkcd:'``. As noted in
the blog post, while it might be interesting to re-define the X11/CSS4 names
based on such a survey, we do not do so unilaterally.

The name collisions are shown in the table below; the color names
where the hex values agree are shown in bold.

``` python
import matplotlib.pyplot as plt
import matplotlib._color_data as mcd
import matplotlib.patches as mpatch

overlap = {name for name in mcd.CSS4_COLORS
           if "xkcd:" + name in mcd.XKCD_COLORS}

fig = plt.figure(figsize=[4.8, 16])
ax = fig.add_axes([0, 0, 1, 1])

for j, n in enumerate(sorted(overlap, reverse=True)):
    weight = None
    cn = mcd.CSS4_COLORS[n]
    xkcd = mcd.XKCD_COLORS["xkcd:" + n].upper()
    if cn == xkcd:
        weight = 'bold'

    r1 = mpatch.Rectangle((0, j), 1, 1, color=cn)
    r2 = mpatch.Rectangle((1, j), 1, 1, color=xkcd)
    txt = ax.text(2, j+.5, '  ' + n, va='center', fontsize=10,
                  weight=weight)
    ax.add_patch(r1)
    ax.add_patch(r2)
    ax.axhline(j, color='k')

ax.text(.5, j + 1.5, 'X11', ha='center', va='center')
ax.text(1.5, j + 1.5, 'xkcd', ha='center', va='center')
ax.set_xlim(0, 3)
ax.set_ylim(0, j + 2)
ax.axis('off')
```

![sphx_glr_colors_003](https://matplotlib.org/_images/sphx_glr_colors_003.png)

## Download

- [Download Python source code: colors.py](https://matplotlib.org/_downloads/8fb6dfde0db5f6422a7627d0d4e328b2/colors.py)
- [Download Jupyter notebook: colors.ipynb](https://matplotlib.org/_downloads/04907c28d4180c02e547778b9aaee05d/colors.ipynb)

diff --git a/Python/matplotlab/gallery/README.md b/Python/matplotlab/gallery/README.md
new file mode 100644
index 00000000..73e0cb48
--- /dev/null
+++ b/Python/matplotlab/gallery/README.md
@@ -0,0 +1,3911 @@
---
sidebarDepth: 3
sidebar: auto
---

# Gallery

This gallery contains examples of the many things you can do with
Matplotlib. Click on any image to see the full image and source code.

For longer tutorials, see our [tutorials page](https://matplotlib.org/tutorials/index.html).
You can also find [external resources](https://matplotlib.org/resources/index.html) and
a [FAQ](https://matplotlib.org/faq/index.html) in our [user guide](https://matplotlib.org/contents.html).


## Lines, bars and markers



## Images, contours and fields



## Subplots, axes and figures



## Statistics



## Pie and polar charts



## Text, labels and annotations



## Pyplot



## Color
For more in-depth information about the colormaps available in matplotlib
as well as a description of their properties,
see the colormaps tutorial.


## Shapes and collections



## Style sheets



## Axes Grid



## Axis Artist



## Showcase



## Animation



## Event handling
Matplotlib supports event handling with a GUI-neutral event model, so you
can connect to Matplotlib events without knowledge of what user interface
Matplotlib will ultimately be plugged in to. This has two advantages: the
code you write will be more portable, and Matplotlib events are aware of
things like data coordinate space and which axes the event occurs in, so
you don't have to mess with low-level transformation details to go from
canvas space to data space. Object picking examples are also included.


## Front Page



## Miscellaneous



## 3D plotting



## Our Favorite Recipes
Here is a collection of short tutorials, examples and code snippets
that illustrate some of the useful idioms and tricks to make snazzier
figures and overcome some matplotlib warts.


## Scales
These examples cover how different scales are handled in Matplotlib.
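As a minimal sketch of what the scale examples do, assuming nothing beyond ``Axes.set_yscale``, the same data can be drawn once per scale:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0.1, 10, 100)

fig, axs = plt.subplots(1, 3, figsize=(9, 3))
for ax, scale in zip(axs, ['linear', 'log', 'symlog']):
    ax.plot(x, x**2)
    ax.set_yscale(scale)  # same data, different axis scale
    ax.set_title(scale)
fig.tight_layout()
```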


## Specialty Plots



## Ticks and spines



## Units
These examples cover the many representations of units
in Matplotlib.


## Embedding Matplotlib in graphical user interfaces
You can embed Matplotlib directly into a user interface application by
following the embedding_in_SOMEGUI.py examples here. Currently
matplotlib supports wxpython, pygtk, tkinter and pyqt4/5.


## Userdemo



## Widgets
Examples of how to write primitive, but GUI-agnostic, widgets in
matplotlib.

diff --git a/Python/matplotlab/gallery/animation/animate_decay.md b/Python/matplotlab/gallery/animation/animate_decay.md
new file mode 100644
index 00000000..d9d68fea
--- /dev/null
+++ b/Python/matplotlab/gallery/animation/animate_decay.md
@@ -0,0 +1,59 @@
# Decay

This example showcases:
- using a generator to drive an animation,
- changing the axes limits during an animation.

![Decay example](https://matplotlib.org/_images/sphx_glr_animate_decay_001.png)

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation


def data_gen(t=0):
    cnt = 0
    while cnt < 1000:
        cnt += 1
        t += 0.1
        yield t, np.sin(2*np.pi*t) * np.exp(-t/10.)


def init():
    ax.set_ylim(-1.1, 1.1)
    ax.set_xlim(0, 10)
    del xdata[:]
    del ydata[:]
    line.set_data(xdata, ydata)
    return line,

fig, ax = plt.subplots()
line, = ax.plot([], [], lw=2)
ax.grid()
xdata, ydata = [], []


def run(data):
    # update the data
    t, y = data
    xdata.append(t)
    ydata.append(y)
    xmin, xmax = ax.get_xlim()

    if t >= xmax:
        ax.set_xlim(xmin, 2*xmax)
        ax.figure.canvas.draw()
    line.set_data(xdata, ydata)

    return line,

ani = animation.FuncAnimation(fig, run, data_gen, blit=False, interval=10,
                              repeat=False, init_func=init)
plt.show()
```

## Download this example

- [Download Python source code: animate_decay.py](https://matplotlib.org/_downloads/animate_decay.py)
- [Download Jupyter notebook: animate_decay.ipynb](https://matplotlib.org/_downloads/animate_decay.ipynb)

diff --git a/Python/matplotlab/gallery/animation/animated_histogram.md b/Python/matplotlab/gallery/animation/animated_histogram.md
new file mode 100644
index 00000000..d2e87cdc
--- /dev/null
+++ b/Python/matplotlab/gallery/animation/animated_histogram.md
@@ -0,0 +1,89 @@
# Animated histogram

Use a path patch to draw a bunch of rectangles for an animated histogram.

```python
import numpy as np

import matplotlib.pyplot as plt
import matplotlib.patches as patches
import matplotlib.path as path
import matplotlib.animation as animation

# Fixing random state for reproducibility
np.random.seed(19680801)

# histogram our data with numpy
data = np.random.randn(1000)
n, bins = np.histogram(data, 100)

# get the corners of the rectangles for the histogram
left = np.array(bins[:-1])
right = np.array(bins[1:])
bottom = np.zeros(len(left))
top = bottom + n
nrects = len(left)
```

Here comes the tricky part: we have to set up the vertex and path code
arrays using ``plt.Path.MOVETO``, ``plt.Path.LINETO`` and
``plt.Path.CLOSEPOLY`` for each rect.

- We need one ``MOVETO`` per rectangle, which sets the initial point.
- We need three ``LINETO``'s, which tell Matplotlib to draw lines from
vertex 1 to vertex 2, v2 to v3, and v3 to v4.
- We then need one ``CLOSEPOLY``, which tells Matplotlib to draw a line from
v4 back to our initial vertex (the ``MOVETO`` vertex), in order to close the
polygon.

**Note:** The vertex for ``CLOSEPOLY`` is ignored, but we still need a
placeholder in the verts array to keep the codes aligned with the vertices.
+```python
+nverts = nrects * (1 + 3 + 1)
+verts = np.zeros((nverts, 2))
+codes = np.ones(nverts, int) * path.Path.LINETO
+codes[0::5] = path.Path.MOVETO
+codes[4::5] = path.Path.CLOSEPOLY
+verts[0::5, 0] = left
+verts[0::5, 1] = bottom
+verts[1::5, 0] = left
+verts[1::5, 1] = top
+verts[2::5, 0] = right
+verts[2::5, 1] = top
+verts[3::5, 0] = right
+verts[3::5, 1] = bottom
+```
+
+To animate the histogram, we need an ``animate`` function, which generates a fresh set of random numbers and updates the locations of the vertices for the histogram (in this case, only the heights of each rectangle). ``patch`` will eventually be a ``Patch`` object.
+
+```python
+patch = None
+
+
+def animate(i):
+    # simulate new data coming in
+    data = np.random.randn(1000)
+    n, bins = np.histogram(data, 100)
+    top = bottom + n
+    verts[1::5, 1] = top
+    verts[2::5, 1] = top
+    return [patch, ]
+```
+
+And now we build the ``Path`` and ``Patch`` instances for the histogram using our vertices and codes. We add the patch to the ``Axes`` instance, and set up the ``FuncAnimation`` with our ``animate`` function.
+
+```python
+fig, ax = plt.subplots()
+barpath = path.Path(verts, codes)
+patch = patches.PathPatch(
+    barpath, facecolor='green', edgecolor='yellow', alpha=0.5)
+ax.add_patch(patch)
+
+ax.set_xlim(left[0], right[-1])
+ax.set_ylim(bottom.min(), top.max())
+
+ani = animation.FuncAnimation(fig, animate, 100, repeat=False, blit=True)
+plt.show()
+```
+
+![Animated histogram example](https://matplotlib.org/_images/sphx_glr_animated_histogram_001.png)
+
+## Download this example
+
+- [Download Python source code: animated_histogram.py](https://matplotlib.org/_downloads/animated_histogram.py)
+- [Download Jupyter notebook: animated_histogram.ipynb](https://matplotlib.org/_downloads/animated_histogram.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/animation/animation_demo.md b/Python/matplotlab/gallery/animation/animation_demo.md
new file mode 100644
index 00000000..b8e99339
--- /dev/null
+++ b/Python/matplotlab/gallery/animation/animation_demo.md
@@ -0,0 +1,33 @@
+# pyplot animation
+
+Generating an animation by calling [pause](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.pause.html#matplotlib.pyplot.pause) between plotting commands.
+
+The approach shown here is only suitable for simple, low-performance use. For more demanding applications, look at the ``animation`` module and the examples that use it.
+
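The demo below draws one 50×50 frame per loop iteration by indexing the first axis of a 3-D array. A quick matplotlib-free sketch of that indexing (the array shape matches the demo; the data are random):

```python
import numpy as np

data = np.random.random((50, 50, 50))  # 50 frames, each a 50x50 "image"

# len() of a 3-D array is the length of its first axis,
# so iterating over range(len(data)) walks over the frames.
print(len(data))      # 50
print(data[0].shape)  # (50, 50) -> one frame, as passed to imshow
```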
+Note that calling [time.sleep](https://docs.python.org/3/library/time.html#time.sleep) instead of [pause](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.pause.html#matplotlib.pyplot.pause) would *not* work.
+
+![pyplot animation](https://matplotlib.org/_images/sphx_glr_animation_demo_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+
+np.random.seed(19680801)
+data = np.random.random((50, 50, 50))
+
+fig, ax = plt.subplots()
+
+for i in range(len(data)):
+    ax.cla()
+    ax.imshow(data[i])
+    ax.set_title("frame {}".format(i))
+    # Note that using time.sleep does *not* work here!
+    plt.pause(0.1)
+```
+
+**Total running time of the script:** (0 minutes 7.211 seconds)
+
+## Download this example
+
+- [Download Python source code: animation_demo.py](https://matplotlib.org/_downloads/animation_demo.py)
+- [Download Jupyter notebook: animation_demo.ipynb](https://matplotlib.org/_downloads/animation_demo.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/animation/bayes_update.md b/Python/matplotlab/gallery/animation/bayes_update.md
new file mode 100644
index 00000000..f28fe8a6
--- /dev/null
+++ b/Python/matplotlab/gallery/animation/bayes_update.md
@@ -0,0 +1,71 @@
+# The Bayes update
+
+This animation shows how a posterior estimate is updated and refitted as new data arrive.
+
+The vertical line represents the theoretical value to which the plotted distribution should converge.
+
+![The Bayes update example](https://matplotlib.org/_images/sphx_glr_bayes_update_001.png)
+
+```python
+import math
+
+import numpy as np
+import matplotlib.pyplot as plt
+from matplotlib.animation import FuncAnimation
+
+
+def beta_pdf(x, a, b):
+    return (x**(a-1) * (1-x)**(b-1) * math.gamma(a + b)
+            / (math.gamma(a) * math.gamma(b)))
+
+
+class UpdateDist(object):
+    def __init__(self, ax, prob=0.5):
+        self.success = 0
+        self.prob = prob
+        self.line, = ax.plot([], [], 'k-')
+        self.x = np.linspace(0, 1, 200)
+        self.ax = ax
+
+        # Set up plot parameters
+        self.ax.set_xlim(0, 1)
+        self.ax.set_ylim(0, 15)
+        self.ax.grid(True)
+
+        # This vertical line represents the theoretical value, to
+        # which the plotted distribution should converge.
+ self.ax.axvline(prob, linestyle='--', color='black') + + def init(self): + self.success = 0 + self.line.set_data([], []) + return self.line, + + def __call__(self, i): + # This way the plot can continuously run and we just keep + # watching new realizations of the process + if i == 0: + return self.init() + + # Choose success based on exceed a threshold with a uniform pick + if np.random.rand(1,) < self.prob: + self.success += 1 + y = beta_pdf(self.x, self.success + 1, (i - self.success) + 1) + self.line.set_data(self.x, y) + return self.line, + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +fig, ax = plt.subplots() +ud = UpdateDist(ax, prob=0.7) +anim = FuncAnimation(fig, ud, frames=np.arange(100), init_func=ud.init, + interval=100, blit=True) +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: bayes_update.py](https://matplotlib.org/_downloads/bayes_update.py) +- [下载Jupyter notebook: bayes_update.ipynb](https://matplotlib.org/_downloads/bayes_update.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/animation/double_pendulum_sgskip.md b/Python/matplotlab/gallery/animation/double_pendulum_sgskip.md new file mode 100644 index 00000000..ec7aba45 --- /dev/null +++ b/Python/matplotlab/gallery/animation/double_pendulum_sgskip.md @@ -0,0 +1,99 @@ +# 双摆问题 + +这个动画说明了双摆问题。 + +双摆公式从 http://www.physics.usyd.edu.au/~wheat/dpend_html/solve_dpend.c 的C代码翻译而来。 + +```python +from numpy import sin, cos +import numpy as np +import matplotlib.pyplot as plt +import scipy.integrate as integrate +import matplotlib.animation as animation + +G = 9.8 # acceleration due to gravity, in m/s^2 +L1 = 1.0 # length of pendulum 1 in m +L2 = 1.0 # length of pendulum 2 in m +M1 = 1.0 # mass of pendulum 1 in kg +M2 = 1.0 # mass of pendulum 2 in kg + + +def derivs(state, t): + + dydx = np.zeros_like(state) + dydx[0] = state[1] + + del_ = state[2] - state[0] + den1 = (M1 + M2)*L1 - M2*L1*cos(del_)*cos(del_) + dydx[1] = 
(M2*L1*state[1]*state[1]*sin(del_)*cos(del_) + + M2*G*sin(state[2])*cos(del_) + + M2*L2*state[3]*state[3]*sin(del_) - + (M1 + M2)*G*sin(state[0]))/den1 + + dydx[2] = state[3] + + den2 = (L2/L1)*den1 + dydx[3] = (-M2*L2*state[3]*state[3]*sin(del_)*cos(del_) + + (M1 + M2)*G*sin(state[0])*cos(del_) - + (M1 + M2)*L1*state[1]*state[1]*sin(del_) - + (M1 + M2)*G*sin(state[2]))/den2 + + return dydx + +# create a time array from 0..100 sampled at 0.05 second steps +dt = 0.05 +t = np.arange(0.0, 20, dt) + +# th1 and th2 are the initial angles (degrees) +# w10 and w20 are the initial angular velocities (degrees per second) +th1 = 120.0 +w1 = 0.0 +th2 = -10.0 +w2 = 0.0 + +# initial state +state = np.radians([th1, w1, th2, w2]) + +# integrate your ODE using scipy.integrate. +y = integrate.odeint(derivs, state, t) + +x1 = L1*sin(y[:, 0]) +y1 = -L1*cos(y[:, 0]) + +x2 = L2*sin(y[:, 2]) + x1 +y2 = -L2*cos(y[:, 2]) + y1 + +fig = plt.figure() +ax = fig.add_subplot(111, autoscale_on=False, xlim=(-2, 2), ylim=(-2, 2)) +ax.set_aspect('equal') +ax.grid() + +line, = ax.plot([], [], 'o-', lw=2) +time_template = 'time = %.1fs' +time_text = ax.text(0.05, 0.9, '', transform=ax.transAxes) + + +def init(): + line.set_data([], []) + time_text.set_text('') + return line, time_text + + +def animate(i): + thisx = [0, x1[i], x2[i]] + thisy = [0, y1[i], y2[i]] + + line.set_data(thisx, thisy) + time_text.set_text(time_template % (i*dt)) + return line, time_text + +ani = animation.FuncAnimation(fig, animate, np.arange(1, len(y)), + interval=25, blit=True, init_func=init) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: double_pendulum_sgskip.py](https://matplotlib.org/_downloads/double_pendulum_sgskip.py) +- [下载Jupyter notebook: double_pendulum_sgskip.ipynb](https://matplotlib.org/_downloads/double_pendulum_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/animation/dynamic_image.md b/Python/matplotlab/gallery/animation/dynamic_image.md new file mode 100644 index 
00000000..8b918aed
--- /dev/null
+++ b/Python/matplotlab/gallery/animation/dynamic_image.md
@@ -0,0 +1,47 @@
+# Animated image using a precomputed list of images
+
+![Animated image using a precomputed list of images example](https://matplotlib.org/_images/sphx_glr_dynamic_image_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+import matplotlib.animation as animation
+
+fig = plt.figure()
+
+
+def f(x, y):
+    return np.sin(x) + np.cos(y)
+
+x = np.linspace(0, 2 * np.pi, 120)
+y = np.linspace(0, 2 * np.pi, 100).reshape(-1, 1)
+# ims is a list of lists, each row is a list of artists to draw in the
+# current frame; here we are just animating one artist, the image, in
+# each frame
+ims = []
+for i in range(60):
+    x += np.pi / 15.
+    y += np.pi / 20.
+    im = plt.imshow(f(x, y), animated=True)
+    ims.append([im])
+
+ani = animation.ArtistAnimation(fig, ims, interval=50, blit=True,
+                                repeat_delay=1000)
+
+# To save the animation, use e.g.
+#
+# ani.save("movie.mp4")
+#
+# or
+#
+# from matplotlib.animation import FFMpegWriter
+# writer = FFMpegWriter(fps=15, metadata=dict(artist='Me'), bitrate=1800)
+# ani.save("movie.mp4", writer=writer)
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: dynamic_image.py](https://matplotlib.org/_downloads/dynamic_image.py)
+- [Download Jupyter notebook: dynamic_image.ipynb](https://matplotlib.org/_downloads/dynamic_image.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/animation/frame_grabbing_sgskip.md b/Python/matplotlab/gallery/animation/frame_grabbing_sgskip.md
new file mode 100644
index 00000000..460647fe
--- /dev/null
+++ b/Python/matplotlab/gallery/animation/frame_grabbing_sgskip.md
@@ -0,0 +1,39 @@
+# Frame grabbing
+
+Use a MovieWriter directly to grab individual frames and write them to a file. This avoids any event-loop integration, and thus works even with the Agg backend. Not recommended for use in an interactive setting.
+
+```python
+import numpy as np
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+from matplotlib.animation import FFMpegWriter
+
+# Fixing random state for reproducibility
+np.random.seed(19680801)
+
+
+metadata = dict(title='Movie Test', 
artist='Matplotlib',
+                comment='Movie support!')
+writer = FFMpegWriter(fps=15, metadata=metadata)
+
+fig = plt.figure()
+l, = plt.plot([], [], 'k-o')
+
+plt.xlim(-5, 5)
+plt.ylim(-5, 5)
+
+x0, y0 = 0, 0
+
+with writer.saving(fig, "writer_test.mp4", 100):
+    for i in range(100):
+        x0 += 0.1 * np.random.randn()
+        y0 += 0.1 * np.random.randn()
+        l.set_data([x0], [y0])  # set_data expects sequences
+        writer.grab_frame()
+```
+
+## Download this example
+
+- [Download Python source code: frame_grabbing_sgskip.py](https://matplotlib.org/_downloads/frame_grabbing_sgskip.py)
+- [Download Jupyter notebook: frame_grabbing_sgskip.ipynb](https://matplotlib.org/_downloads/frame_grabbing_sgskip.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/animation/rain.md b/Python/matplotlab/gallery/animation/rain.md
new file mode 100644
index 00000000..8d1339cd
--- /dev/null
+++ b/Python/matplotlab/gallery/animation/rain.md
@@ -0,0 +1,75 @@
+# Rain simulation
+
+Simulates rain drops on a surface by animating the scale and opacity of 50 scatter points.
+
+Author: Nicolas P. Rougier
+
+![Rain simulation example](https://matplotlib.org/_images/sphx_glr_rain_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+from matplotlib.animation import FuncAnimation
+
+# Fixing random state for reproducibility
+np.random.seed(19680801)
+
+
+# Create new Figure and an Axes which fills it.
+fig = plt.figure(figsize=(7, 7))
+ax = fig.add_axes([0, 0, 1, 1], frameon=False)
+ax.set_xlim(0, 1), ax.set_xticks([])
+ax.set_ylim(0, 1), ax.set_yticks([])
+
+# Create rain data
+n_drops = 50
+rain_drops = np.zeros(n_drops, dtype=[('position', float, 2),
+                                      ('size', float, 1),
+                                      ('growth', float, 1),
+                                      ('color', float, 4)])
+
+# Initialize the raindrops in random positions and with
+# random growth rates.
+rain_drops['position'] = np.random.uniform(0, 1, (n_drops, 2))
+rain_drops['growth'] = np.random.uniform(50, 200, n_drops)
+
+# Construct the scatter which we will update during animation
+# as the raindrops develop.
+scat = ax.scatter(rain_drops['position'][:, 0], rain_drops['position'][:, 1], + s=rain_drops['size'], lw=0.5, edgecolors=rain_drops['color'], + facecolors='none') + + +def update(frame_number): + # Get an index which we can use to re-spawn the oldest raindrop. + current_index = frame_number % n_drops + + # Make all colors more transparent as time progresses. + rain_drops['color'][:, 3] -= 1.0/len(rain_drops) + rain_drops['color'][:, 3] = np.clip(rain_drops['color'][:, 3], 0, 1) + + # Make all circles bigger. + rain_drops['size'] += rain_drops['growth'] + + # Pick a new position for oldest rain drop, resetting its size, + # color and growth factor. + rain_drops['position'][current_index] = np.random.uniform(0, 1, 2) + rain_drops['size'][current_index] = 5 + rain_drops['color'][current_index] = (0, 0, 0, 1) + rain_drops['growth'][current_index] = np.random.uniform(50, 200) + + # Update the scatter collection, with the new colors, sizes and positions. + scat.set_edgecolors(rain_drops['color']) + scat.set_sizes(rain_drops['size']) + scat.set_offsets(rain_drops['position']) + + +# Construct the animation, using the update function as the animation director. 
+animation = FuncAnimation(fig, update, interval=10) +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: rain.py](https://matplotlib.org/_downloads/rain.py) +- [下载Jupyter notebook: rain.ipynb](https://matplotlib.org/_downloads/rain.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/animation/random_walk.md b/Python/matplotlab/gallery/animation/random_walk.md new file mode 100644 index 00000000..3503478d --- /dev/null +++ b/Python/matplotlab/gallery/animation/random_walk.md @@ -0,0 +1,75 @@ +# 动画3D随机游走 + +![动画3D随机游走](https://matplotlib.org/_images/sphx_glr_random_walk_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +import mpl_toolkits.mplot3d.axes3d as p3 +import matplotlib.animation as animation + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +def Gen_RandLine(length, dims=2): + """ + Create a line using a random walk algorithm + + length is the number of points for the line. + dims is the number of dimensions the line has. + """ + lineData = np.empty((dims, length)) + lineData[:, 0] = np.random.rand(dims) + for index in range(1, length): + # scaling the random numbers by 0.1 so + # movement is small compared to position. + # subtraction by 0.5 is to change the range to [-0.5, 0.5] + # to allow a line to move backwards. + step = ((np.random.rand(dims) - 0.5) * 0.1) + lineData[:, index] = lineData[:, index - 1] + step + + return lineData + + +def update_lines(num, dataLines, lines): + for line, data in zip(lines, dataLines): + # NOTE: there is no .set_data() for 3 dim data... + line.set_data(data[0:2, :num]) + line.set_3d_properties(data[2, :num]) + return lines + +# Attaching 3D axis to the figure +fig = plt.figure() +ax = p3.Axes3D(fig) + +# Fifty lines of random 3-D lines +data = [Gen_RandLine(25, 3) for index in range(50)] + +# Creating fifty line objects. 
+# NOTE: Can't pass empty arrays into 3d version of plot() +lines = [ax.plot(dat[0, 0:1], dat[1, 0:1], dat[2, 0:1])[0] for dat in data] + +# Setting the axes properties +ax.set_xlim3d([0.0, 1.0]) +ax.set_xlabel('X') + +ax.set_ylim3d([0.0, 1.0]) +ax.set_ylabel('Y') + +ax.set_zlim3d([0.0, 1.0]) +ax.set_zlabel('Z') + +ax.set_title('3D Test') + +# Creating the Animation object +line_ani = animation.FuncAnimation(fig, update_lines, 25, fargs=(data, lines), + interval=50, blit=False) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: random_walk.py](https://matplotlib.org/_downloads/random_walk.py) +- [下载Jupyter notebook: random_walk.ipynb](https://matplotlib.org/_downloads/random_walk.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/animation/simple_anim.md b/Python/matplotlab/gallery/animation/simple_anim.md new file mode 100644 index 00000000..14af9022 --- /dev/null +++ b/Python/matplotlab/gallery/animation/simple_anim.md @@ -0,0 +1,45 @@ +# 动画线图 + +![动画线图](https://matplotlib.org/_images/sphx_glr_simple_anim_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.animation as animation + +fig, ax = plt.subplots() + +x = np.arange(0, 2*np.pi, 0.01) +line, = ax.plot(x, np.sin(x)) + + +def init(): # only required for blitting to give a clean slate. + line.set_ydata([np.nan] * len(x)) + return line, + + +def animate(i): + line.set_ydata(np.sin(x + i / 100)) # update the data. + return line, + + +ani = animation.FuncAnimation( + fig, animate, init_func=init, interval=2, blit=True, save_count=50) + +# To save the animation, use e.g. 
+# +# ani.save("movie.mp4") +# +# or +# +# from matplotlib.animation import FFMpegWriter +# writer = FFMpegWriter(fps=15, metadata=dict(artist='Me'), bitrate=1800) +# ani.save("movie.mp4", writer=writer) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: simple_anim.py](https://matplotlib.org/_downloads/simple_anim.py) +- [下载Jupyter notebook: simple_anim.ipynb](https://matplotlib.org/_downloads/simple_anim.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/animation/strip_chart.md b/Python/matplotlab/gallery/animation/strip_chart.md new file mode 100644 index 00000000..655a450c --- /dev/null +++ b/Python/matplotlab/gallery/animation/strip_chart.md @@ -0,0 +1,67 @@ +# 示波器 + +模拟示波器。 + +![示波器示例](https://matplotlib.org/_images/sphx_glr_strip_chart_001.png) + +```python +import numpy as np +from matplotlib.lines import Line2D +import matplotlib.pyplot as plt +import matplotlib.animation as animation + + +class Scope(object): + def __init__(self, ax, maxt=2, dt=0.02): + self.ax = ax + self.dt = dt + self.maxt = maxt + self.tdata = [0] + self.ydata = [0] + self.line = Line2D(self.tdata, self.ydata) + self.ax.add_line(self.line) + self.ax.set_ylim(-.1, 1.1) + self.ax.set_xlim(0, self.maxt) + + def update(self, y): + lastt = self.tdata[-1] + if lastt > self.tdata[0] + self.maxt: # reset the arrays + self.tdata = [self.tdata[-1]] + self.ydata = [self.ydata[-1]] + self.ax.set_xlim(self.tdata[0], self.tdata[0] + self.maxt) + self.ax.figure.canvas.draw() + + t = self.tdata[-1] + self.dt + self.tdata.append(t) + self.ydata.append(y) + self.line.set_data(self.tdata, self.ydata) + return self.line, + + +def emitter(p=0.03): + 'return a random value with probability p, else 0' + while True: + v = np.random.rand(1) + if v > p: + yield 0. 
+ else: + yield np.random.rand(1) + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +fig, ax = plt.subplots() +scope = Scope(ax) + +# pass a generator in "emitter" to produce data for the update func +ani = animation.FuncAnimation(fig, scope.update, emitter, interval=10, + blit=True) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: strip_chart.py](https://matplotlib.org/_downloads/strip_chart.py) +- [下载Jupyter notebook: strip_chart.ipynb](https://matplotlib.org/_downloads/strip_chart.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/animation/unchained.md b/Python/matplotlab/gallery/animation/unchained.md new file mode 100644 index 00000000..2566c81d --- /dev/null +++ b/Python/matplotlab/gallery/animation/unchained.md @@ -0,0 +1,77 @@ +# MATPLOTLIB UNCHAINED + +脉冲星的假信号频率的比较路径演示(主要是因为Joy Division的未知乐趣的封面而闻名)。 + +作者:Nicolas P. Rougier + +![MATPLOTLIB UNCHAINED示例](https://matplotlib.org/_images/sphx_glr_unchained_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.animation as animation + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +# Create new Figure with black background +fig = plt.figure(figsize=(8, 8), facecolor='black') + +# Add a subplot with no frame +ax = plt.subplot(111, frameon=False) + +# Generate random data +data = np.random.uniform(0, 1, (64, 75)) +X = np.linspace(-1, 1, data.shape[-1]) +G = 1.5 * np.exp(-4 * X ** 2) + +# Generate line plots +lines = [] +for i in range(len(data)): + # Small reduction of the X extents to get a cheap perspective effect + xscale = 1 - i / 200. 
+ # Same for linewidth (thicker strokes on bottom) + lw = 1.5 - i / 100.0 + line, = ax.plot(xscale * X, i + G * data[i], color="w", lw=lw) + lines.append(line) + +# Set y limit (or first line is cropped because of thickness) +ax.set_ylim(-1, 70) + +# No ticks +ax.set_xticks([]) +ax.set_yticks([]) + +# 2 part titles to get different font weights +ax.text(0.5, 1.0, "MATPLOTLIB ", transform=ax.transAxes, + ha="right", va="bottom", color="w", + family="sans-serif", fontweight="light", fontsize=16) +ax.text(0.5, 1.0, "UNCHAINED", transform=ax.transAxes, + ha="left", va="bottom", color="w", + family="sans-serif", fontweight="bold", fontsize=16) + + +def update(*args): + # Shift all data to the right + data[:, 1:] = data[:, :-1] + + # Fill-in new values + data[:, 0] = np.random.uniform(0, 1, len(data)) + + # Update data + for i in range(len(data)): + lines[i].set_ydata(i + G * data[i]) + + # Return modified artists + return lines + +# Construct the animation, using the update function as the animation director. 
+anim = animation.FuncAnimation(fig, update, interval=10)
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: unchained.py](https://matplotlib.org/_downloads/unchained.py)
+- [Download Jupyter notebook: unchained.ipynb](https://matplotlib.org/_downloads/unchained.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axes_grid1/demo_anchored_direction_arrows.md b/Python/matplotlab/gallery/axes_grid1/demo_anchored_direction_arrows.md
new file mode 100644
index 00000000..7f1c464f
--- /dev/null
+++ b/Python/matplotlab/gallery/axes_grid1/demo_anchored_direction_arrows.md
@@ -0,0 +1,83 @@
+# Demo anchored direction arrows
+
+![Demo anchored direction arrows example](https://matplotlib.org/_images/sphx_glr_demo_anchored_direction_arrows_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+from mpl_toolkits.axes_grid1.anchored_artists import AnchoredDirectionArrows
+import matplotlib.font_manager as fm
+
+fig, ax = plt.subplots()
+ax.imshow(np.random.random((10, 10)))
+
+# Simple example
+simple_arrow = AnchoredDirectionArrows(ax.transAxes, 'X', 'Y')
+ax.add_artist(simple_arrow)
+
+# High contrast arrow
+high_contrast_part_1 = AnchoredDirectionArrows(
+                            ax.transAxes,
+                            '111', r'11$\overline{2}$',
+                            loc='upper right',
+                            arrow_props={'ec': 'w', 'fc': 'none', 'alpha': 1,
+                                         'lw': 2}
+                            )
+ax.add_artist(high_contrast_part_1)
+
+high_contrast_part_2 = AnchoredDirectionArrows(
+                            ax.transAxes,
+                            '111', r'11$\overline{2}$',
+                            loc='upper right',
+                            arrow_props={'ec': 'none', 'fc': 'k'},
+                            text_props={'ec': 'w', 'fc': 'k', 'lw': 0.4}
+                            )
+ax.add_artist(high_contrast_part_2)
+
+# Rotated arrow
+fontprops = fm.FontProperties(family='serif')
+
+rotated_arrow = AnchoredDirectionArrows(
+                    ax.transAxes,
+                    '30', '120',
+                    loc='center',
+                    color='w',
+                    angle=30,
+                    fontproperties=fontprops
+                    )
+ax.add_artist(rotated_arrow)
+
+# Altering arrow directions
+a1 = AnchoredDirectionArrows(
+        ax.transAxes, 'A', 'B', loc='lower center',
+        length=-0.15,
+        sep_x=0.03, sep_y=0.03,
+        color='r'
+        )
+ax.add_artist(a1)
+
+a2 = 
AnchoredDirectionArrows( + ax.transAxes, 'A', ' B', loc='lower left', + aspect_ratio=-1, + sep_x=0.01, sep_y=-0.02, + color='orange' + ) +ax.add_artist(a2) + + +a3 = AnchoredDirectionArrows( + ax.transAxes, ' A', 'B', loc='lower right', + length=-0.15, + aspect_ratio=-1, + sep_y=-0.1, sep_x=0.04, + color='cyan' + ) +ax.add_artist(a3) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: demo_anchored_direction_arrows.py](https://matplotlib.org/_downloads/demo_anchored_direction_arrows.py) +- [下载Jupyter notebook: demo_anchored_direction_arrows.ipynb](https://matplotlib.org/_downloads/demo_anchored_direction_arrows.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axes_grid1/demo_axes_divider.md b/Python/matplotlab/gallery/axes_grid1/demo_axes_divider.md new file mode 100644 index 00000000..78ed2be8 --- /dev/null +++ b/Python/matplotlab/gallery/axes_grid1/demo_axes_divider.md @@ -0,0 +1,137 @@ +# 演示Axes Divider + +轴分割器用于计算轴的位置,并使用现有轴实例为它们创建分隔线。 + +![演示Axes Divider示例](https://matplotlib.org/_images/sphx_glr_demo_axes_divider_001.png) + +```python +import matplotlib.pyplot as plt + + +def get_demo_image(): + import numpy as np + from matplotlib.cbook import get_sample_data + f = get_sample_data("axes_grid/bivariate_normal.npy", asfileobj=False) + z = np.load(f) + # z is a numpy array of 15x15 + return z, (-3, 4, -4, 3) + + +def demo_simple_image(ax): + Z, extent = get_demo_image() + + im = ax.imshow(Z, extent=extent, interpolation="nearest") + cb = plt.colorbar(im) + plt.setp(cb.ax.get_yticklabels(), visible=False) + + +def demo_locatable_axes_hard(fig1): + + from mpl_toolkits.axes_grid1 import SubplotDivider, Size + from mpl_toolkits.axes_grid1.mpl_axes import Axes + + divider = SubplotDivider(fig1, 2, 2, 2, aspect=True) + + # axes for image + ax = Axes(fig1, divider.get_position()) + + # axes for colorbar + ax_cb = Axes(fig1, divider.get_position()) + + h = [Size.AxesX(ax), # main axes + Size.Fixed(0.05), # padding, 0.1 inch + Size.Fixed(0.2), # 
colorbar, 0.3 inch + ] + + v = [Size.AxesY(ax)] + + divider.set_horizontal(h) + divider.set_vertical(v) + + ax.set_axes_locator(divider.new_locator(nx=0, ny=0)) + ax_cb.set_axes_locator(divider.new_locator(nx=2, ny=0)) + + fig1.add_axes(ax) + fig1.add_axes(ax_cb) + + ax_cb.axis["left"].toggle(all=False) + ax_cb.axis["right"].toggle(ticks=True) + + Z, extent = get_demo_image() + + im = ax.imshow(Z, extent=extent, interpolation="nearest") + plt.colorbar(im, cax=ax_cb) + plt.setp(ax_cb.get_yticklabels(), visible=False) + + +def demo_locatable_axes_easy(ax): + from mpl_toolkits.axes_grid1 import make_axes_locatable + + divider = make_axes_locatable(ax) + + ax_cb = divider.new_horizontal(size="5%", pad=0.05) + fig1 = ax.get_figure() + fig1.add_axes(ax_cb) + + Z, extent = get_demo_image() + im = ax.imshow(Z, extent=extent, interpolation="nearest") + + plt.colorbar(im, cax=ax_cb) + ax_cb.yaxis.tick_right() + ax_cb.yaxis.set_tick_params(labelright=False) + + +def demo_images_side_by_side(ax): + from mpl_toolkits.axes_grid1 import make_axes_locatable + + divider = make_axes_locatable(ax) + + Z, extent = get_demo_image() + ax2 = divider.new_horizontal(size="100%", pad=0.05) + fig1 = ax.get_figure() + fig1.add_axes(ax2) + + ax.imshow(Z, extent=extent, interpolation="nearest") + ax2.imshow(Z, extent=extent, interpolation="nearest") + ax2.yaxis.set_tick_params(labelleft=False) + + +def demo(): + + fig1 = plt.figure(1, (6, 6)) + fig1.clf() + + # PLOT 1 + # simple image & colorbar + ax = fig1.add_subplot(2, 2, 1) + demo_simple_image(ax) + + # PLOT 2 + # image and colorbar whose location is adjusted in the drawing time. + # a hard way + + demo_locatable_axes_hard(fig1) + + # PLOT 3 + # image and colorbar whose location is adjusted in the drawing time. + # a easy way + + ax = fig1.add_subplot(2, 2, 3) + demo_locatable_axes_easy(ax) + + # PLOT 4 + # two images side by side with fixed padding. 
+ + ax = fig1.add_subplot(2, 2, 4) + demo_images_side_by_side(ax) + + plt.show() + + +demo() +``` + +## 下载这个示例 + +- [下载python源码: demo_axes_divider.py](https://matplotlib.org/_downloads/demo_axes_divider.py) +- [下载Jupyter notebook: demo_axes_divider.ipynb](https://matplotlib.org/_downloads/demo_axes_divider.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axes_grid1/demo_axes_grid.md b/Python/matplotlab/gallery/axes_grid1/demo_axes_grid.md new file mode 100644 index 00000000..7cbc9af7 --- /dev/null +++ b/Python/matplotlab/gallery/axes_grid1/demo_axes_grid.md @@ -0,0 +1,144 @@ +# 演示Axes Grid + +具有单个或自己的彩条的2x2图像的网格。 + +![演示Axes Grid](https://matplotlib.org/_images/sphx_glr_demo_axes_grid_0011.png) + +```python +import matplotlib.pyplot as plt +from mpl_toolkits.axes_grid1 import ImageGrid + + +def get_demo_image(): + import numpy as np + from matplotlib.cbook import get_sample_data + f = get_sample_data("axes_grid/bivariate_normal.npy", asfileobj=False) + z = np.load(f) + # z is a numpy array of 15x15 + return z, (-3, 4, -4, 3) + + +def demo_simple_grid(fig): + """ + A grid of 2x2 images with 0.05 inch pad between images and only + the lower-left axes is labeled. + """ + grid = ImageGrid(fig, 141, # similar to subplot(141) + nrows_ncols=(2, 2), + axes_pad=0.05, + label_mode="1", + ) + + Z, extent = get_demo_image() + for i in range(4): + im = grid[i].imshow(Z, extent=extent, interpolation="nearest") + + # This only affects axes in first column and second row as share_all = + # False. 
+ grid.axes_llc.set_xticks([-2, 0, 2]) + grid.axes_llc.set_yticks([-2, 0, 2]) + + +def demo_grid_with_single_cbar(fig): + """ + A grid of 2x2 images with a single colorbar + """ + grid = ImageGrid(fig, 142, # similar to subplot(142) + nrows_ncols=(2, 2), + axes_pad=0.0, + share_all=True, + label_mode="L", + cbar_location="top", + cbar_mode="single", + ) + + Z, extent = get_demo_image() + for i in range(4): + im = grid[i].imshow(Z, extent=extent, interpolation="nearest") + grid.cbar_axes[0].colorbar(im) + + for cax in grid.cbar_axes: + cax.toggle_label(False) + + # This affects all axes as share_all = True. + grid.axes_llc.set_xticks([-2, 0, 2]) + grid.axes_llc.set_yticks([-2, 0, 2]) + + +def demo_grid_with_each_cbar(fig): + """ + A grid of 2x2 images. Each image has its own colorbar. + """ + + grid = ImageGrid(fig, 143, # similar to subplot(143) + nrows_ncols=(2, 2), + axes_pad=0.1, + label_mode="1", + share_all=True, + cbar_location="top", + cbar_mode="each", + cbar_size="7%", + cbar_pad="2%", + ) + Z, extent = get_demo_image() + for i in range(4): + im = grid[i].imshow(Z, extent=extent, interpolation="nearest") + grid.cbar_axes[i].colorbar(im) + + for cax in grid.cbar_axes: + cax.toggle_label(False) + + # This affects all axes because we set share_all = True. + grid.axes_llc.set_xticks([-2, 0, 2]) + grid.axes_llc.set_yticks([-2, 0, 2]) + + +def demo_grid_with_each_cbar_labelled(fig): + """ + A grid of 2x2 images. Each image has its own colorbar. 
+ """ + + grid = ImageGrid(fig, 144, # similar to subplot(144) + nrows_ncols=(2, 2), + axes_pad=(0.45, 0.15), + label_mode="1", + share_all=True, + cbar_location="right", + cbar_mode="each", + cbar_size="7%", + cbar_pad="2%", + ) + Z, extent = get_demo_image() + + # Use a different colorbar range every time + limits = ((0, 1), (-2, 2), (-1.7, 1.4), (-1.5, 1)) + for i in range(4): + im = grid[i].imshow(Z, extent=extent, interpolation="nearest", + vmin=limits[i][0], vmax=limits[i][1]) + grid.cbar_axes[i].colorbar(im) + + for i, cax in enumerate(grid.cbar_axes): + cax.set_yticks((limits[i][0], limits[i][1])) + + # This affects all axes because we set share_all = True. + grid.axes_llc.set_xticks([-2, 0, 2]) + grid.axes_llc.set_yticks([-2, 0, 2]) + + +if 1: + F = plt.figure(1, (10.5, 2.5)) + + F.subplots_adjust(left=0.05, right=0.95) + + demo_simple_grid(F) + demo_grid_with_single_cbar(F) + demo_grid_with_each_cbar(F) + demo_grid_with_each_cbar_labelled(F) + + plt.show() +``` + +## 下载这个示例 + +- [下载python源码: demo_axes_grid.py](https://matplotlib.org/_downloads/demo_axes_grid.py) +- [下载Jupyter notebook: demo_axes_grid.ipynb](https://matplotlib.org/_downloads/demo_axes_grid.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axes_grid1/demo_axes_grid2.md b/Python/matplotlab/gallery/axes_grid1/demo_axes_grid2.md new file mode 100644 index 00000000..cad4b573 --- /dev/null +++ b/Python/matplotlab/gallery/axes_grid1/demo_axes_grid2.md @@ -0,0 +1,125 @@ +# 演示Axes Grid2 + +共享xaxis和yaxis的图像网格。 + +![演示Axes Grid2](https://matplotlib.org/_images/sphx_glr_demo_axes_grid2_001.png) + +```python +import matplotlib.pyplot as plt +from mpl_toolkits.axes_grid1 import ImageGrid +import numpy as np + + +def get_demo_image(): + from matplotlib.cbook import get_sample_data + f = get_sample_data("axes_grid/bivariate_normal.npy", asfileobj=False) + z = np.load(f) + # z is a numpy array of 15x15 + return z, (-3, 4, -4, 3) + + +def add_inner_title(ax, title, loc, size=None, 
**kwargs): + from matplotlib.offsetbox import AnchoredText + from matplotlib.patheffects import withStroke + if size is None: + size = dict(size=plt.rcParams['legend.fontsize']) + at = AnchoredText(title, loc=loc, prop=size, + pad=0., borderpad=0.5, + frameon=False, **kwargs) + ax.add_artist(at) + at.txt._text.set_path_effects([withStroke(foreground="w", linewidth=3)]) + return at + +if 1: + F = plt.figure(1, (6, 6)) + F.clf() + + # prepare images + Z, extent = get_demo_image() + ZS = [Z[i::3, :] for i in range(3)] + extent = extent[0], extent[1]/3., extent[2], extent[3] + + # demo 1 : colorbar at each axes + + grid = ImageGrid(F, 211, # similar to subplot(111) + nrows_ncols=(1, 3), + direction="row", + axes_pad=0.05, + add_all=True, + label_mode="1", + share_all=True, + cbar_location="top", + cbar_mode="each", + cbar_size="7%", + cbar_pad="1%", + ) + + for ax, z in zip(grid, ZS): + im = ax.imshow( + z, origin="lower", extent=extent, interpolation="nearest") + ax.cax.colorbar(im) + + for ax, im_title in zip(grid, ["Image 1", "Image 2", "Image 3"]): + t = add_inner_title(ax, im_title, loc='lower left') + t.patch.set_alpha(0.5) + + for ax, z in zip(grid, ZS): + ax.cax.toggle_label(True) + #axis = ax.cax.axis[ax.cax.orientation] + #axis.label.set_text("counts s$^{-1}$") + #axis.label.set_size(10) + #axis.major_ticklabels.set_size(6) + + # changing the colorbar ticks + grid[1].cax.set_xticks([-1, 0, 1]) + grid[2].cax.set_xticks([-1, 0, 1]) + + grid[0].set_xticks([-2, 0]) + grid[0].set_yticks([-2, 0, 2]) + + # demo 2 : shared colorbar + + grid2 = ImageGrid(F, 212, + nrows_ncols=(1, 3), + direction="row", + axes_pad=0.05, + add_all=True, + label_mode="1", + share_all=True, + cbar_location="right", + cbar_mode="single", + cbar_size="10%", + cbar_pad=0.05, + ) + + grid2[0].set_xlabel("X") + grid2[0].set_ylabel("Y") + + vmax, vmin = np.max(ZS), np.min(ZS) + import matplotlib.colors + norm = matplotlib.colors.Normalize(vmax=vmax, vmin=vmin) + + for ax, z in zip(grid2, ZS): + 
im = ax.imshow(z, norm=norm, + origin="lower", extent=extent, + interpolation="nearest") + + # With cbar_mode="single", cax attribute of all axes are identical. + ax.cax.colorbar(im) + ax.cax.toggle_label(True) + + for ax, im_title in zip(grid2, ["(a)", "(b)", "(c)"]): + t = add_inner_title(ax, im_title, loc='upper left') + t.patch.set_ec("none") + t.patch.set_alpha(0.5) + + grid2[0].set_xticks([-2, 0]) + grid2[0].set_yticks([-2, 0, 2]) + + plt.show() +``` + +## 下载这个示例 + +- [下载python源码: demo_axes_grid2.py](https://matplotlib.org/_downloads/demo_axes_grid2.py) +- [下载Jupyter notebook: demo_axes_grid2.ipynb](https://matplotlib.org/_downloads/demo_axes_grid2.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axes_grid1/demo_axes_hbox_divider.md b/Python/matplotlab/gallery/axes_grid1/demo_axes_hbox_divider.md new file mode 100644 index 00000000..510a2bcc --- /dev/null +++ b/Python/matplotlab/gallery/axes_grid1/demo_axes_hbox_divider.md @@ -0,0 +1,62 @@ +# 演示轴 Hbox Divider + +HBox Divider用于排列子图。 + +![演示轴 Hbox Divider](https://matplotlib.org/_images/sphx_glr_demo_axes_hbox_divider_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from mpl_toolkits.axes_grid1.axes_divider import HBoxDivider +import mpl_toolkits.axes_grid1.axes_size as Size + + +def make_heights_equal(fig, rect, ax1, ax2, pad): + # pad in inches + + h1, v1 = Size.AxesX(ax1), Size.AxesY(ax1) + h2, v2 = Size.AxesX(ax2), Size.AxesY(ax2) + + pad_v = Size.Scaled(1) + pad_h = Size.Fixed(pad) + + my_divider = HBoxDivider(fig, rect, + horizontal=[h1, pad_h, h2], + vertical=[v1, pad_v, v2]) + + ax1.set_axes_locator(my_divider.new_locator(0)) + ax2.set_axes_locator(my_divider.new_locator(2)) + + +if __name__ == "__main__": + + arr1 = np.arange(20).reshape((4, 5)) + arr2 = np.arange(20).reshape((5, 4)) + + fig, (ax1, ax2) = plt.subplots(1, 2) + ax1.imshow(arr1, interpolation="nearest") + ax2.imshow(arr2, interpolation="nearest") + + rect = 111 # subplot param for 
combined axes + make_heights_equal(fig, rect, ax1, ax2, pad=0.5) # pad in inches + + for ax in [ax1, ax2]: + ax.locator_params(nbins=4) + + # annotate + ax3 = plt.axes([0.5, 0.5, 0.001, 0.001], frameon=False) + ax3.xaxis.set_visible(False) + ax3.yaxis.set_visible(False) + ax3.annotate("Location of two axes are adjusted\n" + "so that they have equal heights\n" + "while maintaining their aspect ratios", (0.5, 0.5), + xycoords="axes fraction", va="center", ha="center", + bbox=dict(boxstyle="round, pad=1", fc="w")) + + plt.show() +``` + +## 下载这个示例 + +- [下载python源码: demo_axes_hbox_divider.py](https://matplotlib.org/_downloads/demo_axes_hbox_divider.py) +- [下载Jupyter notebook: demo_axes_hbox_divider.ipynb](https://matplotlib.org/_downloads/demo_axes_hbox_divider.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axes_grid1/demo_axes_rgb.md b/Python/matplotlab/gallery/axes_grid1/demo_axes_rgb.md new file mode 100644 index 00000000..b8301bf8 --- /dev/null +++ b/Python/matplotlab/gallery/axes_grid1/demo_axes_rgb.md @@ -0,0 +1,99 @@ +# 演示轴 RGB + +RGBAxes显示RGB合成图像。 + +![演示轴 RGB](https://matplotlib.org/_images/sphx_glr_demo_axes_rgb_001.png) + +![演示轴 RGB2](https://matplotlib.org/_images/sphx_glr_demo_axes_rgb_002.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + +from mpl_toolkits.axes_grid1.axes_rgb import make_rgb_axes, RGBAxes + + +def get_demo_image(): + from matplotlib.cbook import get_sample_data + f = get_sample_data("axes_grid/bivariate_normal.npy", asfileobj=False) + z = np.load(f) + # z is a numpy array of 15x15 + return z, (-3, 4, -4, 3) + + +def get_rgb(): + Z, extent = get_demo_image() + + Z[Z < 0] = 0. 
+ Z = Z/Z.max() + + R = Z[:13, :13] + G = Z[2:, 2:] + B = Z[:13, 2:] + + return R, G, B + + +def make_cube(r, g, b): + ny, nx = r.shape + R = np.zeros([ny, nx, 3], dtype="d") + R[:, :, 0] = r + G = np.zeros_like(R) + G[:, :, 1] = g + B = np.zeros_like(R) + B[:, :, 2] = b + + RGB = R + G + B + + return R, G, B, RGB + + +def demo_rgb(): + fig, ax = plt.subplots() + ax_r, ax_g, ax_b = make_rgb_axes(ax, pad=0.02) + #fig.add_axes(ax_r) + #fig.add_axes(ax_g) + #fig.add_axes(ax_b) + + r, g, b = get_rgb() + im_r, im_g, im_b, im_rgb = make_cube(r, g, b) + kwargs = dict(origin="lower", interpolation="nearest") + ax.imshow(im_rgb, **kwargs) + ax_r.imshow(im_r, **kwargs) + ax_g.imshow(im_g, **kwargs) + ax_b.imshow(im_b, **kwargs) + + +def demo_rgb2(): + fig = plt.figure(2) + ax = RGBAxes(fig, [0.1, 0.1, 0.8, 0.8], pad=0.0) + #fig.add_axes(ax) + #ax.add_RGB_to_figure() + + r, g, b = get_rgb() + kwargs = dict(origin="lower", interpolation="nearest") + ax.imshow_rgb(r, g, b, **kwargs) + + ax.RGB.set_xlim(0., 9.5) + ax.RGB.set_ylim(0.9, 10.6) + + for ax1 in [ax.RGB, ax.R, ax.G, ax.B]: + for sp1 in ax1.spines.values(): + sp1.set_color("w") + for tick in ax1.xaxis.get_major_ticks() + ax1.yaxis.get_major_ticks(): + tick.tick1line.set_mec("w") + tick.tick2line.set_mec("w") + + return ax + + +demo_rgb() +ax = demo_rgb2() + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: demo_axes_rgb.py](https://matplotlib.org/_downloads/demo_axes_rgb.py) +- [下载Jupyter notebook: demo_axes_rgb.ipynb](https://matplotlib.org/_downloads/demo_axes_rgb.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axes_grid1/demo_colorbar_of_inset_axes.md b/Python/matplotlab/gallery/axes_grid1/demo_colorbar_of_inset_axes.md new file mode 100644 index 00000000..f7257e18 --- /dev/null +++ b/Python/matplotlab/gallery/axes_grid1/demo_colorbar_of_inset_axes.md @@ -0,0 +1,57 @@ +# 演示嵌入轴颜色条 + +![演示嵌入轴颜色条](https://matplotlib.org/_images/sphx_glr_demo_colorbar_of_inset_axes_001.png) + +```python +import 
matplotlib.pyplot as plt + +from mpl_toolkits.axes_grid1.inset_locator import inset_axes, zoomed_inset_axes +from mpl_toolkits.axes_grid1.colorbar import colorbar + + +def get_demo_image(): + from matplotlib.cbook import get_sample_data + import numpy as np + f = get_sample_data("axes_grid/bivariate_normal.npy", asfileobj=False) + z = np.load(f) + # z is a numpy array of 15x15 + return z, (-3, 4, -4, 3) + + +fig, ax = plt.subplots(figsize=[5, 4]) + +Z, extent = get_demo_image() + +ax.set(aspect=1, + xlim=(-15, 15), + ylim=(-20, 5)) + + +axins = zoomed_inset_axes(ax, zoom=2, loc='upper left') +im = axins.imshow(Z, extent=extent, interpolation="nearest", + origin="lower") + +plt.xticks(visible=False) +plt.yticks(visible=False) + + +# colorbar +cax = inset_axes(axins, + width="5%", # width = 10% of parent_bbox width + height="100%", # height : 50% + loc='lower left', + bbox_to_anchor=(1.05, 0., 1, 1), + bbox_transform=axins.transAxes, + borderpad=0, + ) + +colorbar(im, cax=cax) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: demo_colorbar_of_inset_axes.py](https://matplotlib.org/_downloads/demo_colorbar_of_inset_axes.py) +- [下载Jupyter notebook: demo_colorbar_of_inset_axes.ipynb](https://matplotlib.org/_downloads/demo_colorbar_of_inset_axes.ipynb) + diff --git a/Python/matplotlab/gallery/axes_grid1/demo_colorbar_with_axes_divider.md b/Python/matplotlab/gallery/axes_grid1/demo_colorbar_with_axes_divider.md new file mode 100644 index 00000000..95a03c4a --- /dev/null +++ b/Python/matplotlab/gallery/axes_grid1/demo_colorbar_with_axes_divider.md @@ -0,0 +1,30 @@ +# 演示带轴分割器的颜色条 + +![演示带轴分割器的颜色条](https://matplotlib.org/_images/sphx_glr_demo_colorbar_with_axes_divider_001.png) + +```python +import matplotlib.pyplot as plt +from mpl_toolkits.axes_grid1.axes_divider import make_axes_locatable +from mpl_toolkits.axes_grid1.colorbar import colorbar + +fig, (ax1, ax2) = plt.subplots(1, 2) +fig.subplots_adjust(wspace=0.5) + +im1 = ax1.imshow([[1, 2], [3, 4]]) +ax1_divider = 
make_axes_locatable(ax1) +cax1 = ax1_divider.append_axes("right", size="7%", pad="2%") +cb1 = colorbar(im1, cax=cax1) + +im2 = ax2.imshow([[1, 2], [3, 4]]) +ax2_divider = make_axes_locatable(ax2) +cax2 = ax2_divider.append_axes("top", size="7%", pad="2%") +cb2 = colorbar(im2, cax=cax2, orientation="horizontal") +cax2.xaxis.set_ticks_position("top") + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: demo_colorbar_with_axes_divider.py](https://matplotlib.org/_downloads/demo_colorbar_with_axes_divider.py) +- [下载Jupyter notebook: demo_colorbar_with_axes_divider.ipynb](https://matplotlib.org/_downloads/demo_colorbar_with_axes_divider.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axes_grid1/demo_colorbar_with_inset_locator.md b/Python/matplotlab/gallery/axes_grid1/demo_colorbar_with_inset_locator.md new file mode 100644 index 00000000..0e7aac84 --- /dev/null +++ b/Python/matplotlab/gallery/axes_grid1/demo_colorbar_with_inset_locator.md @@ -0,0 +1,43 @@ +# 使用插入定位器演示Colorbar + +![使用插入定位器演示Colorbar](https://matplotlib.org/_images/sphx_glr_demo_colorbar_with_inset_locator_001.png) + +```python +import matplotlib.pyplot as plt + +from mpl_toolkits.axes_grid1.inset_locator import inset_axes + +fig, (ax1, ax2) = plt.subplots(1, 2, figsize=[6, 3]) + +axins1 = inset_axes(ax1, + width="50%", # width = 10% of parent_bbox width + height="5%", # height : 50% + loc='upper right') + +im1 = ax1.imshow([[1, 2], [2, 3]]) +plt.colorbar(im1, cax=axins1, orientation="horizontal", ticks=[1, 2, 3]) +axins1.xaxis.set_ticks_position("bottom") + +axins = inset_axes(ax2, + width="5%", # width = 10% of parent_bbox width + height="50%", # height : 50% + loc='lower left', + bbox_to_anchor=(1.05, 0., 1, 1), + bbox_transform=ax2.transAxes, + borderpad=0, + ) + +# Controlling the placement of the inset axes is basically same as that +# of the legend. you may want to play with the borderpad value and +# the bbox_to_anchor coordinate. 
+ +im = ax2.imshow([[1, 2], [2, 3]]) +plt.colorbar(im, cax=axins, ticks=[1, 2, 3]) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: demo_colorbar_with_inset_locator.py](https://matplotlib.org/_downloads/demo_colorbar_with_inset_locator.py) +- [下载Jupyter notebook: demo_colorbar_with_inset_locator.ipynb](https://matplotlib.org/_downloads/demo_colorbar_with_inset_locator.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axes_grid1/demo_edge_colorbar.md b/Python/matplotlab/gallery/axes_grid1/demo_edge_colorbar.md new file mode 100644 index 00000000..967cbed0 --- /dev/null +++ b/Python/matplotlab/gallery/axes_grid1/demo_edge_colorbar.md @@ -0,0 +1,98 @@ +# 演示Edge Colorbar + +![演示Edge Colorbar](https://matplotlib.org/_images/sphx_glr_demo_edge_colorbar_001.png) + +```python +import matplotlib.pyplot as plt +from mpl_toolkits.axes_grid1 import AxesGrid + + +def get_demo_image(): + import numpy as np + from matplotlib.cbook import get_sample_data + f = get_sample_data("axes_grid/bivariate_normal.npy", asfileobj=False) + z = np.load(f) + # z is a numpy array of 15x15 + return z, (-3, 4, -4, 3) + + +def demo_bottom_cbar(fig): + """ + A grid of 2x2 images with a colorbar for each column. + """ + grid = AxesGrid(fig, 121, # similar to subplot(132) + nrows_ncols=(2, 2), + axes_pad=0.10, + share_all=True, + label_mode="1", + cbar_location="bottom", + cbar_mode="edge", + cbar_pad=0.25, + cbar_size="15%", + direction="column" + ) + + Z, extent = get_demo_image() + cmaps = [plt.get_cmap("autumn"), plt.get_cmap("summer")] + for i in range(4): + im = grid[i].imshow(Z, extent=extent, interpolation="nearest", + cmap=cmaps[i//2]) + if i % 2: + cbar = grid.cbar_axes[i//2].colorbar(im) + + for cax in grid.cbar_axes: + cax.toggle_label(True) + cax.axis[cax.orientation].set_label("Bar") + + # This affects all axes as share_all = True. 
+    grid.axes_llc.set_xticks([-2, 0, 2])
+    grid.axes_llc.set_yticks([-2, 0, 2])
+
+
+def demo_right_cbar(fig):
+    """
+    A grid of 2x2 images. Each row has its own colorbar.
+    """
+
+    grid = AxesGrid(fig, 122,  # similar to subplot(122)
+                    nrows_ncols=(2, 2),
+                    axes_pad=0.10,
+                    label_mode="1",
+                    share_all=True,
+                    cbar_location="right",
+                    cbar_mode="edge",
+                    cbar_size="7%",
+                    cbar_pad="2%",
+                    )
+    Z, extent = get_demo_image()
+    cmaps = [plt.get_cmap("spring"), plt.get_cmap("winter")]
+    for i in range(4):
+        im = grid[i].imshow(Z, extent=extent, interpolation="nearest",
+                            cmap=cmaps[i//2])
+        if i % 2:
+            grid.cbar_axes[i//2].colorbar(im)
+
+    for cax in grid.cbar_axes:
+        cax.toggle_label(True)
+        cax.axis[cax.orientation].set_label("Foo")
+
+    # This affects all axes because we set share_all = True.
+    grid.axes_llc.set_xticks([-2, 0, 2])
+    grid.axes_llc.set_yticks([-2, 0, 2])
+
+
+if __name__ == "__main__":
+    F = plt.figure(1, (5.5, 2.5))
+
+    F.subplots_adjust(left=0.05, right=0.93)
+
+    demo_bottom_cbar(F)
+    demo_right_cbar(F)
+
+    plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: demo_edge_colorbar.py](https://matplotlib.org/_downloads/demo_edge_colorbar.py)
+- [Download Jupyter notebook: demo_edge_colorbar.ipynb](https://matplotlib.org/_downloads/demo_edge_colorbar.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axes_grid1/demo_fixed_size_axes.md b/Python/matplotlab/gallery/axes_grid1/demo_fixed_size_axes.md
new file mode 100644
index 00000000..e0edcf91
--- /dev/null
+++ b/Python/matplotlab/gallery/axes_grid1/demo_fixed_size_axes.md
@@ -0,0 +1,62 @@
+# Demo Fixed Size Axes
+
+![Demo Fixed Size Axes](https://matplotlib.org/_images/sphx_glr_demo_fixed_size_axes_001.png)
+
+![Demo Fixed Size Axes 2](https://matplotlib.org/_images/sphx_glr_demo_fixed_size_axes_002.png)
+
+```python
+import matplotlib.pyplot as plt
+
+from mpl_toolkits.axes_grid1 import Divider, Size
+from mpl_toolkits.axes_grid1.mpl_axes import Axes
+
+
+def demo_fixed_size_axes():
+    fig1 = plt.figure(1, (6, 6))
+
+    # The first items are for padding and the
second items are for the axes. + # sizes are in inch. + h = [Size.Fixed(1.0), Size.Fixed(4.5)] + v = [Size.Fixed(0.7), Size.Fixed(5.)] + + divider = Divider(fig1, (0.0, 0.0, 1., 1.), h, v, aspect=False) + # the width and height of the rectangle is ignored. + + ax = Axes(fig1, divider.get_position()) + ax.set_axes_locator(divider.new_locator(nx=1, ny=1)) + + fig1.add_axes(ax) + + ax.plot([1, 2, 3]) + + +def demo_fixed_pad_axes(): + fig = plt.figure(2, (6, 6)) + + # The first & third items are for padding and the second items are for the + # axes. Sizes are in inches. + h = [Size.Fixed(1.0), Size.Scaled(1.), Size.Fixed(.2)] + v = [Size.Fixed(0.7), Size.Scaled(1.), Size.Fixed(.5)] + + divider = Divider(fig, (0.0, 0.0, 1., 1.), h, v, aspect=False) + # the width and height of the rectangle is ignored. + + ax = Axes(fig, divider.get_position()) + ax.set_axes_locator(divider.new_locator(nx=1, ny=1)) + + fig.add_axes(ax) + + ax.plot([1, 2, 3]) + + +if __name__ == "__main__": + demo_fixed_size_axes() + demo_fixed_pad_axes() + + plt.show() +``` + +## 下载这个示例 + +- [下载python源码: demo_fixed_size_axes.py](https://matplotlib.org/_downloads/demo_fixed_size_axes.py) +- [下载Jupyter notebook: demo_fixed_size_axes.ipynb](https://matplotlib.org/_downloads/demo_fixed_size_axes.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axes_grid1/demo_imagegrid_aspect.md b/Python/matplotlab/gallery/axes_grid1/demo_imagegrid_aspect.md new file mode 100644 index 00000000..11c94f87 --- /dev/null +++ b/Python/matplotlab/gallery/axes_grid1/demo_imagegrid_aspect.md @@ -0,0 +1,31 @@ +# 演示Imagegrid Aspect + +![演示Imagegrid Aspect](https://matplotlib.org/_images/sphx_glr_demo_imagegrid_aspect_001.png) + +```python +import matplotlib.pyplot as plt + +from mpl_toolkits.axes_grid1 import ImageGrid +fig = plt.figure(1) + +grid1 = ImageGrid(fig, 121, (2, 2), axes_pad=0.1, + aspect=True, share_all=True) + +for i in [0, 1]: + grid1[i].set_aspect(2) + + +grid2 = ImageGrid(fig, 122, (2, 2), 
axes_pad=0.1,
+                  aspect=True, share_all=True)
+
+
+for i in [1, 3]:
+    grid2[i].set_aspect(2)
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: demo_imagegrid_aspect.py](https://matplotlib.org/_downloads/demo_imagegrid_aspect.py)
+- [Download Jupyter notebook: demo_imagegrid_aspect.ipynb](https://matplotlib.org/_downloads/demo_imagegrid_aspect.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axes_grid1/inset_locator_demo.md b/Python/matplotlab/gallery/axes_grid1/inset_locator_demo.md
new file mode 100644
index 00000000..72c5da8c
--- /dev/null
+++ b/Python/matplotlab/gallery/axes_grid1/inset_locator_demo.md
@@ -0,0 +1,137 @@
+# Inset Locator Demo
+
+The [inset_axes](https://matplotlib.org/api/_as_gen/mpl_toolkits.axes_grid1.inset_locator.inset_axes.html#mpl_toolkits.axes_grid1.inset_locator.inset_axes) function of [inset_locator](https://matplotlib.org/api/_as_gen/mpl_toolkits.axes_grid1.inset_locator.html#module-mpl_toolkits.axes_grid1.inset_locator) makes it easy to place an inset in a corner of the axes by specifying a width and height and, optionally, a location (loc) code similar to that of [legend](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.legend.html#matplotlib.axes.Axes.legend). By default, the inset is offset from the axes by a few points, controlled via the borderpad parameter.
+
+```python
+import matplotlib.pyplot as plt
+from mpl_toolkits.axes_grid1.inset_locator import inset_axes
+
+
+fig, (ax, ax2) = plt.subplots(1, 2, figsize=[5.5, 2.8])
+
+# Create inset of width 1.3 inches and height 0.9 inches
+# at the default upper right location
+axins = inset_axes(ax, width=1.3, height=0.9)
+
+# Create inset of width 30% and height 40% of the parent axes' bounding box
+# at the lower left corner (loc=3)
+axins2 = inset_axes(ax, width="30%", height="40%", loc=3)
+
+# Create inset of mixed specifications in the second subplot;
+# width is 30% of parent axes' bounding box and
+# height is 1 inch at the upper left corner (loc=2)
+axins3 = inset_axes(ax2, width="30%", height=1., loc=2)
+
+# Create an inset in the lower right corner (loc=4) with borderpad=1, i.e.
+
+# 10 points padding (as 10pt is the default fontsize) to the parent axes
+axins4 = inset_axes(ax2, width="20%", height="20%", loc=4, borderpad=1)
+
+# Turn ticklabels of insets off
+for axi in [axins, axins2, axins3, axins4]:
+    axi.tick_params(labelleft=False, labelbottom=False)
+
+plt.show()
+```
+
+![Inset Locator Demo](https://matplotlib.org/_images/sphx_glr_inset_locator_demo_001.png)
+
+The parameters ``bbox_to_anchor`` and ``bbox_transform`` allow more fine-grained control over the inset position and size, and can even place the inset at a completely arbitrary position: ``bbox_to_anchor`` sets the bounding box in the coordinates given by ``bbox_transform``.
+
+```python
+fig = plt.figure(figsize=[5.5, 2.8])
+ax = fig.add_subplot(121)
+
+# We use the axes transform as bbox_transform. Therefore the bounding box
+# needs to be specified in axes coordinates ((0, 0) is the lower left corner
+# of the axes, (1, 1) is the upper right corner).
+# The bounding box (.2, .4, .6, .5) starts at (.2, .4) and ranges to (.8, .9)
+# in those coordinates.
+# Inside of this bounding box an inset of half the bounding box' width and
+# three quarters of the bounding box' height is created. The lower left corner
+# of the inset is aligned to the lower left corner of the bounding box (loc=3).
+# The inset is then offset by the default 0.5 in units of the font size.
+
+axins = inset_axes(ax, width="50%", height="75%",
+                   bbox_to_anchor=(.2, .4, .6, .5),
+                   bbox_transform=ax.transAxes, loc=3)
+
+# For visualization purposes we mark the bounding box by a rectangle
+ax.add_patch(plt.Rectangle((.2, .4), .6, .5, ls="--", ec="c", fc="None",
+                           transform=ax.transAxes))
+
+# We set the axis limits to something other than the default, in order to not
+# distract from the fact that axes coordinates are used here.
+ax.axis([0, 10, 0, 10])
+
+
+# Note how the two following insets are created at the same positions, one by
+# use of the default parent axes' bbox and the other via a bbox in axes
+# coordinates and the respective transform.
+ax2 = fig.add_subplot(222) +axins2 = inset_axes(ax2, width="30%", height="50%") + +ax3 = fig.add_subplot(224) +axins3 = inset_axes(ax3, width="100%", height="100%", + bbox_to_anchor=(.7, .5, .3, .5), + bbox_transform=ax3.transAxes) + +# For visualization purposes we mark the bounding box by a rectangle +ax2.add_patch(plt.Rectangle((0, 0), 1, 1, ls="--", lw=2, ec="c", fc="None")) +ax3.add_patch(plt.Rectangle((.7, .5), .3, .5, ls="--", lw=2, + ec="c", fc="None")) + +# Turn ticklabels off +for axi in [axins2, axins3, ax2, ax3]: + axi.tick_params(labelleft=False, labelbottom=False) + +plt.show() +``` + +![插入定位器演示2](https://matplotlib.org/_images/sphx_glr_inset_locator_demo_002.png) + +在上述方法中,使用了轴变换和4元组边界框,因为它主要用于指定相对于其所插入的轴的插入值。但是,其他用例也是可能的。下面的示例检查其中一些。 + +```python +fig = plt.figure(figsize=[5.5, 2.8]) +ax = fig.add_subplot(131) + +# Create an inset outside the axes +axins = inset_axes(ax, width="100%", height="100%", + bbox_to_anchor=(1.05, .6, .5, .4), + bbox_transform=ax.transAxes, loc=2, borderpad=0) +axins.tick_params(left=False, right=True, labelleft=False, labelright=True) + +# Create an inset with a 2-tuple bounding box. Note that this creates a +# bbox without extent. This hence only makes sense when specifying +# width and height in absolute units (inches). +axins2 = inset_axes(ax, width=0.5, height=0.4, + bbox_to_anchor=(0.33, 0.25), + bbox_transform=ax.transAxes, loc=3, borderpad=0) + + +ax2 = fig.add_subplot(133) +ax2.set_xscale("log") +ax2.axis([1e-6, 1e6, -2, 6]) + +# Create inset in data coordinates using ax.transData as transform +axins3 = inset_axes(ax2, width="100%", height="100%", + bbox_to_anchor=(1e-2, 2, 1e3, 3), + bbox_transform=ax2.transData, loc=2, borderpad=0) + +# Create an inset horizontally centered in figure coordinates and vertically +# bound to line up with the axes. 
+from matplotlib.transforms import blended_transform_factory
+transform = blended_transform_factory(fig.transFigure, ax2.transAxes)
+axins4 = inset_axes(ax2, width="16%", height="34%",
+                    bbox_to_anchor=(0, 0, 1, 1),
+                    bbox_transform=transform, loc=8, borderpad=0)
+
+plt.show()
+```
+
+![Inset Locator Demo 3](https://matplotlib.org/_images/sphx_glr_inset_locator_demo_003.png)
+
+## Download this example
+
+- [Download Python source code: inset_locator_demo.py](https://matplotlib.org/_downloads/inset_locator_demo.py)
+- [Download Jupyter notebook: inset_locator_demo.ipynb](https://matplotlib.org/_downloads/inset_locator_demo.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axes_grid1/inset_locator_demo2.md b/Python/matplotlab/gallery/axes_grid1/inset_locator_demo2.md
new file mode 100644
index 00000000..ae213c38
--- /dev/null
+++ b/Python/matplotlab/gallery/axes_grid1/inset_locator_demo2.md
@@ -0,0 +1,89 @@
+# Inset Locator Demo 2
+
+This demo shows how to create a zoomed inset via [zoomed_inset_axes](https://matplotlib.org/api/_as_gen/mpl_toolkits.axes_grid1.inset_locator.zoomed_inset_axes.html#mpl_toolkits.axes_grid1.inset_locator.zoomed_inset_axes). In the first subplot an [AnchoredSizeBar](https://matplotlib.org/api/_as_gen/mpl_toolkits.axes_grid1.anchored_artists.AnchoredSizeBar.html#mpl_toolkits.axes_grid1.anchored_artists.AnchoredSizeBar) shows the zoom effect. In the second subplot a connection to the region of interest is created via [mark_inset](https://matplotlib.org/api/_as_gen/mpl_toolkits.axes_grid1.inset_locator.mark_inset.html#mpl_toolkits.axes_grid1.inset_locator.mark_inset).
+
+![Inset Locator Demo 2](https://matplotlib.org/_images/sphx_glr_inset_locator_demo2_001.png)
+
+```python
+import matplotlib.pyplot as plt
+
+from mpl_toolkits.axes_grid1.inset_locator import zoomed_inset_axes, mark_inset
+from mpl_toolkits.axes_grid1.anchored_artists import AnchoredSizeBar
+
+import numpy as np
+
+
+def get_demo_image():
+    from matplotlib.cbook import get_sample_data
+    f = get_sample_data("axes_grid/bivariate_normal.npy", asfileobj=False)
+    z = np.load(f)
+    # z is a numpy array of 15x15
+    return z,
(-3, 4, -4, 3) + +fig, (ax, ax2) = plt.subplots(ncols=2, figsize=[6, 3]) + + +# First subplot, showing an inset with a size bar. +ax.set_aspect(1) + +axins = zoomed_inset_axes(ax, zoom=0.5, loc='upper right') +# fix the number of ticks on the inset axes +axins.yaxis.get_major_locator().set_params(nbins=7) +axins.xaxis.get_major_locator().set_params(nbins=7) + +plt.setp(axins.get_xticklabels(), visible=False) +plt.setp(axins.get_yticklabels(), visible=False) + + +def add_sizebar(ax, size): + asb = AnchoredSizeBar(ax.transData, + size, + str(size), + loc=8, + pad=0.1, borderpad=0.5, sep=5, + frameon=False) + ax.add_artist(asb) + +add_sizebar(ax, 0.5) +add_sizebar(axins, 0.5) + + +# Second subplot, showing an image with an inset zoom +# and a marked inset +Z, extent = get_demo_image() +Z2 = np.zeros([150, 150], dtype="d") +ny, nx = Z.shape +Z2[30:30 + ny, 30:30 + nx] = Z + +# extent = [-3, 4, -4, 3] +ax2.imshow(Z2, extent=extent, interpolation="nearest", + origin="lower") + + +axins2 = zoomed_inset_axes(ax2, 6, loc=1) # zoom = 6 +axins2.imshow(Z2, extent=extent, interpolation="nearest", + origin="lower") + +# sub region of the original image +x1, x2, y1, y2 = -1.5, -0.9, -2.5, -1.9 +axins2.set_xlim(x1, x2) +axins2.set_ylim(y1, y2) +# fix the number of ticks on the inset axes +axins2.yaxis.get_major_locator().set_params(nbins=7) +axins2.xaxis.get_major_locator().set_params(nbins=7) + +plt.setp(axins2.get_xticklabels(), visible=False) +plt.setp(axins2.get_yticklabels(), visible=False) + +# draw a bbox of the region of the inset axes in the parent axes and +# connecting lines between the bbox and the inset axes area +mark_inset(ax2, axins2, loc1=2, loc2=4, fc="none", ec="0.5") + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: inset_locator_demo2.py](https://matplotlib.org/_downloads/inset_locator_demo2.py) +- [下载Jupyter notebook: inset_locator_demo2.ipynb](https://matplotlib.org/_downloads/inset_locator_demo2.ipynb) \ No newline at end of file diff --git 
a/Python/matplotlab/gallery/axes_grid1/make_room_for_ylabel_using_axesgrid.md b/Python/matplotlab/gallery/axes_grid1/make_room_for_ylabel_using_axesgrid.md new file mode 100644 index 00000000..eddbe6a2 --- /dev/null +++ b/Python/matplotlab/gallery/axes_grid1/make_room_for_ylabel_using_axesgrid.md @@ -0,0 +1,76 @@ +# 使用Axesgrid为Ylabel腾出空间 + +![使用Axesgrid为Ylabel腾出空间](https://matplotlib.org/_images/sphx_glr_make_room_for_ylabel_using_axesgrid_001.png) + +![使用Axesgrid为Ylabel腾出空间示例1](https://matplotlib.org/_images/sphx_glr_make_room_for_ylabel_using_axesgrid_001.png) + +![使用Axesgrid为Ylabel腾出空间示例2](https://matplotlib.org/_images/sphx_glr_make_room_for_ylabel_using_axesgrid_001.png) + +```python +from mpl_toolkits.axes_grid1 import make_axes_locatable +from mpl_toolkits.axes_grid1.axes_divider import make_axes_area_auto_adjustable + + +if __name__ == "__main__": + + import matplotlib.pyplot as plt + + def ex1(): + plt.figure(1) + ax = plt.axes([0, 0, 1, 1]) + #ax = plt.subplot(111) + + ax.set_yticks([0.5]) + ax.set_yticklabels(["very long label"]) + + make_axes_area_auto_adjustable(ax) + + def ex2(): + + plt.figure(2) + ax1 = plt.axes([0, 0, 1, 0.5]) + ax2 = plt.axes([0, 0.5, 1, 0.5]) + + ax1.set_yticks([0.5]) + ax1.set_yticklabels(["very long label"]) + ax1.set_ylabel("Y label") + + ax2.set_title("Title") + + make_axes_area_auto_adjustable(ax1, pad=0.1, use_axes=[ax1, ax2]) + make_axes_area_auto_adjustable(ax2, pad=0.1, use_axes=[ax1, ax2]) + + def ex3(): + + fig = plt.figure(3) + ax1 = plt.axes([0, 0, 1, 1]) + divider = make_axes_locatable(ax1) + + ax2 = divider.new_horizontal("100%", pad=0.3, sharey=ax1) + ax2.tick_params(labelleft=False) + fig.add_axes(ax2) + + divider.add_auto_adjustable_area(use_axes=[ax1], pad=0.1, + adjust_dirs=["left"]) + divider.add_auto_adjustable_area(use_axes=[ax2], pad=0.1, + adjust_dirs=["right"]) + divider.add_auto_adjustable_area(use_axes=[ax1, ax2], pad=0.1, + adjust_dirs=["top", "bottom"]) + + ax1.set_yticks([0.5]) + 
ax1.set_yticklabels(["very long label"]) + + ax2.set_title("Title") + ax2.set_xlabel("X - Label") + + ex1() + ex2() + ex3() + + plt.show() +``` + +## 下载这个示例 + +- [下载python源码: make_room_for_ylabel_using_axesgrid.py](https://matplotlib.org/_downloads/make_room_for_ylabel_using_axesgrid.py) +- [下载Jupyter notebook: make_room_for_ylabel_using_axesgrid.ipynb](https://matplotlib.org/_downloads/make_room_for_ylabel_using_axesgrid.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axes_grid1/parasite_simple.md b/Python/matplotlab/gallery/axes_grid1/parasite_simple.md new file mode 100644 index 00000000..d4341f57 --- /dev/null +++ b/Python/matplotlab/gallery/axes_grid1/parasite_simple.md @@ -0,0 +1,34 @@ +# 简单寄生示例 + +![简单寄生示例](https://matplotlib.org/_images/sphx_glr_parasite_simple_001.png) + +```python +from mpl_toolkits.axes_grid1 import host_subplot +import matplotlib.pyplot as plt + +host = host_subplot(111) + +par = host.twinx() + +host.set_xlabel("Distance") +host.set_ylabel("Density") +par.set_ylabel("Temperature") + +p1, = host.plot([0, 1, 2], [0, 1, 2], label="Density") +p2, = par.plot([0, 1, 2], [0, 3, 2], label="Temperature") + +leg = plt.legend() + +host.yaxis.get_label().set_color(p1.get_color()) +leg.texts[0].set_color(p1.get_color()) + +par.yaxis.get_label().set_color(p2.get_color()) +leg.texts[1].set_color(p2.get_color()) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: parasite_simple.py](https://matplotlib.org/_downloads/parasite_simple.py) +- [下载Jupyter notebook: parasite_simple.ipynb](https://matplotlib.org/_downloads/parasite_simple.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axes_grid1/parasite_simple2.md b/Python/matplotlab/gallery/axes_grid1/parasite_simple2.md new file mode 100644 index 00000000..f9d86f38 --- /dev/null +++ b/Python/matplotlab/gallery/axes_grid1/parasite_simple2.md @@ -0,0 +1,52 @@ +# 简单寄生示例2 + +![简单寄生示例2](https://matplotlib.org/_images/sphx_glr_parasite_simple2_001.png) + 
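The unit conversion at the heart of the example below, from angular proper motion (arcsec/yr) to linear velocity (km/s) at a distance of 2.3 kpc, can be sanity-checked on its own. A minimal sketch (plain Python, no plotting; the factor 4.74 used for comparison is the standard astronomy shorthand v[km/s] ≈ 4.74 · μ[arcsec/yr] · d[pc], not something defined in this example):

```python
# Reproduce the conversion constant used in the example below:
# 1 arcsec = 1/206265 rad, distance = 2300 pc, 1 pc = 3.085e18 cm,
# 1 yr = 3.15e7 s, 1 km = 1e5 cm.
pm_to_kms = 1. / 206265. * 2300 * 3.085e18 / 3.15e7 / 1.e5

# Cross-check against the classic shorthand v = 4.74 * mu * d:
approx = 4.74 * 2300  # km/s per (arcsec/yr) at 2300 pc
assert abs(pm_to_kms - approx) / approx < 0.01

print(pm_to_kms)  # roughly 1.09e4 km/s per (arcsec/yr)
```

The two values agree to within the rounding of the physical constants, which is all the plot's auxiliary transform needs.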
+```python +import matplotlib.transforms as mtransforms +import matplotlib.pyplot as plt +from mpl_toolkits.axes_grid1.parasite_axes import SubplotHost + +obs = [["01_S1", 3.88, 0.14, 1970, 63], + ["01_S4", 5.6, 0.82, 1622, 150], + ["02_S1", 2.4, 0.54, 1570, 40], + ["03_S1", 4.1, 0.62, 2380, 170]] + + +fig = plt.figure() + +ax_kms = SubplotHost(fig, 1, 1, 1, aspect=1.) + +# angular proper motion("/yr) to linear velocity(km/s) at distance=2.3kpc +pm_to_kms = 1./206265.*2300*3.085e18/3.15e7/1.e5 + +aux_trans = mtransforms.Affine2D().scale(pm_to_kms, 1.) +ax_pm = ax_kms.twin(aux_trans) +ax_pm.set_viewlim_mode("transform") + +fig.add_subplot(ax_kms) + +for n, ds, dse, w, we in obs: + time = ((2007 + (10. + 4/30.)/12) - 1988.5) + v = ds / time * pm_to_kms + ve = dse / time * pm_to_kms + ax_kms.errorbar([v], [w], xerr=[ve], yerr=[we], color="k") + + +ax_kms.axis["bottom"].set_label("Linear velocity at 2.3 kpc [km/s]") +ax_kms.axis["left"].set_label("FWHM [km/s]") +ax_pm.axis["top"].set_label(r"Proper Motion [$''$/yr]") +ax_pm.axis["top"].label.set_visible(True) +ax_pm.axis["right"].major_ticklabels.set_visible(False) + +ax_kms.set_xlim(950, 3700) +ax_kms.set_ylim(950, 3100) +# xlim and ylim of ax_pms will be automatically adjusted. 
+ +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: parasite_simple2.py](https://matplotlib.org/_downloads/parasite_simple2.py) +- [下载Jupyter notebook: parasite_simple2.ipynb](https://matplotlib.org/_downloads/parasite_simple2.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axes_grid1/scatter_hist_locatable_axes.md b/Python/matplotlab/gallery/axes_grid1/scatter_hist_locatable_axes.md new file mode 100644 index 00000000..2b71e2aa --- /dev/null +++ b/Python/matplotlab/gallery/axes_grid1/scatter_hist_locatable_axes.md @@ -0,0 +1,59 @@ +# 散点图 + +![散点图示例](https://matplotlib.org/_images/sphx_glr_scatter_hist_locatable_axes_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from mpl_toolkits.axes_grid1 import make_axes_locatable + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +# the random data +x = np.random.randn(1000) +y = np.random.randn(1000) + + +fig, axScatter = plt.subplots(figsize=(5.5, 5.5)) + +# the scatter plot: +axScatter.scatter(x, y) +axScatter.set_aspect(1.) + +# create new axes on the right and on the top of the current axes +# The first argument of the new_vertical(new_horizontal) method is +# the height (width) of the axes to be created in inches. 
+divider = make_axes_locatable(axScatter) +axHistx = divider.append_axes("top", 1.2, pad=0.1, sharex=axScatter) +axHisty = divider.append_axes("right", 1.2, pad=0.1, sharey=axScatter) + +# make some labels invisible +axHistx.xaxis.set_tick_params(labelbottom=False) +axHisty.yaxis.set_tick_params(labelleft=False) + +# now determine nice limits by hand: +binwidth = 0.25 +xymax = max(np.max(np.abs(x)), np.max(np.abs(y))) +lim = (int(xymax/binwidth) + 1)*binwidth + +bins = np.arange(-lim, lim + binwidth, binwidth) +axHistx.hist(x, bins=bins) +axHisty.hist(y, bins=bins, orientation='horizontal') + +# the xaxis of axHistx and yaxis of axHisty are shared with axScatter, +# thus there is no need to manually adjust the xlim and ylim of these +# axis. + +axHistx.set_yticks([0, 50, 100]) + +axHisty.set_xticks([0, 50, 100]) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: scatter_hist_locatable_axes.py](https://matplotlib.org/_downloads/scatter_hist_locatable_axes.py) +- [下载Jupyter notebook: scatter_hist_locatable_axes.ipynb](https://matplotlib.org/_downloads/scatter_hist_locatable_axes.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axes_grid1/simple_anchored_artists.md b/Python/matplotlab/gallery/axes_grid1/simple_anchored_artists.md new file mode 100644 index 00000000..60e15510 --- /dev/null +++ b/Python/matplotlab/gallery/axes_grid1/simple_anchored_artists.md @@ -0,0 +1,86 @@ +# 简单锚定艺术家对象示例 + +此示例说明如何使用在 [offsetbox](https://matplotlib.org/api/offsetbox_api.html#module-matplotlib.offsetbox) 和 [Matplotlib axes_grid1 Toolkit](https://matplotlib.org/api/toolkits/axes_grid1.html#toolkit-axesgrid1-index) 中找到的锚定辅助对象类。类似图形的实现,但不使用工具包,可以在[锚定的艺术家对象](https://matplotlib.org/gallery/misc/anchored_artists.html)中找到。 + +![简单锚定艺术家对象示例](https://matplotlib.org/_images/sphx_glr_simple_anchored_artists_001.png) + +```python +import matplotlib.pyplot as plt + + +def draw_text(ax): + """ + Draw two text-boxes, anchored by different corners to the upper-left + corner of 
the figure. + """ + from matplotlib.offsetbox import AnchoredText + at = AnchoredText("Figure 1a", + loc='upper left', prop=dict(size=8), frameon=True, + ) + at.patch.set_boxstyle("round,pad=0.,rounding_size=0.2") + ax.add_artist(at) + + at2 = AnchoredText("Figure 1(b)", + loc='lower left', prop=dict(size=8), frameon=True, + bbox_to_anchor=(0., 1.), + bbox_transform=ax.transAxes + ) + at2.patch.set_boxstyle("round,pad=0.,rounding_size=0.2") + ax.add_artist(at2) + + +def draw_circle(ax): + """ + Draw a circle in axis coordinates + """ + from mpl_toolkits.axes_grid1.anchored_artists import AnchoredDrawingArea + from matplotlib.patches import Circle + ada = AnchoredDrawingArea(20, 20, 0, 0, + loc='upper right', pad=0., frameon=False) + p = Circle((10, 10), 10) + ada.da.add_artist(p) + ax.add_artist(ada) + + +def draw_ellipse(ax): + """ + Draw an ellipse of width=0.1, height=0.15 in data coordinates + """ + from mpl_toolkits.axes_grid1.anchored_artists import AnchoredEllipse + ae = AnchoredEllipse(ax.transData, width=0.1, height=0.15, angle=0., + loc='lower left', pad=0.5, borderpad=0.4, + frameon=True) + + ax.add_artist(ae) + + +def draw_sizebar(ax): + """ + Draw a horizontal bar with length of 0.1 in data coordinates, + with a fixed label underneath. + """ + from mpl_toolkits.axes_grid1.anchored_artists import AnchoredSizeBar + asb = AnchoredSizeBar(ax.transData, + 0.1, + r"1$^{\prime}$", + loc='lower center', + pad=0.1, borderpad=0.5, sep=5, + frameon=False) + ax.add_artist(asb) + + +ax = plt.gca() +ax.set_aspect(1.) 
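+# The draw_* helpers defined above are called below; each one attaches an
+# anchored artist (text boxes, a circle, an ellipse, a size bar) that keeps
+# its position relative to an axes corner regardless of the data limits.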
+
+draw_text(ax)
+draw_circle(ax)
+draw_ellipse(ax)
+draw_sizebar(ax)
+
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: simple_anchored_artists.py](https://matplotlib.org/_downloads/simple_anchored_artists.py)
+- [下载Jupyter notebook: simple_anchored_artists.ipynb](https://matplotlib.org/_downloads/simple_anchored_artists.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axes_grid1/simple_axes_divider1.md b/Python/matplotlab/gallery/axes_grid1/simple_axes_divider1.md
new file mode 100644
index 00000000..b07f2673
--- /dev/null
+++ b/Python/matplotlab/gallery/axes_grid1/simple_axes_divider1.md
@@ -0,0 +1,38 @@
+# 简单轴分割器示例1
+
+![简单轴分割器示例1](https://matplotlib.org/_images/sphx_glr_simple_axes_divider1_001.png)
+
+```python
+from mpl_toolkits.axes_grid1 import Size, Divider
+import matplotlib.pyplot as plt
+
+
+fig1 = plt.figure(1, (6, 6))
+
+# fixed sizes in inches
+horiz = [Size.Fixed(1.), Size.Fixed(.5), Size.Fixed(1.5),
+         Size.Fixed(.5)]
+vert = [Size.Fixed(1.5), Size.Fixed(.5), Size.Fixed(1.)]
+
+rect = (0.1, 0.1, 0.8, 0.8)
+# divide the axes rectangle into a grid whose size is specified by horiz * vert
+divider = Divider(fig1, rect, horiz, vert, aspect=False)
+
+# the rect parameter will be ignored as we will set axes_locator
+ax1 = fig1.add_axes(rect, label="1")
+ax2 = fig1.add_axes(rect, label="2")
+ax3 = fig1.add_axes(rect, label="3")
+ax4 = fig1.add_axes(rect, label="4")
+
+ax1.set_axes_locator(divider.new_locator(nx=0, ny=0))
+ax2.set_axes_locator(divider.new_locator(nx=0, ny=2))
+ax3.set_axes_locator(divider.new_locator(nx=2, ny=2))
+ax4.set_axes_locator(divider.new_locator(nx=2, nx1=4, ny=0))
+
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: simple_axes_divider1.py](https://matplotlib.org/_downloads/simple_axes_divider1.py)
+- [下载Jupyter notebook: simple_axes_divider1.ipynb](https://matplotlib.org/_downloads/simple_axes_divider1.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axes_grid1/simple_axes_divider2.md
b/Python/matplotlab/gallery/axes_grid1/simple_axes_divider2.md
new file mode 100644
index 00000000..60055d69
--- /dev/null
+++ b/Python/matplotlab/gallery/axes_grid1/simple_axes_divider2.md
@@ -0,0 +1,38 @@
+# 简单轴分割器示例2
+
+![简单轴分割器示例2](https://matplotlib.org/_images/sphx_glr_simple_axes_divider2_001.png)
+
+```python
+import mpl_toolkits.axes_grid1.axes_size as Size
+from mpl_toolkits.axes_grid1 import Divider
+import matplotlib.pyplot as plt
+
+fig1 = plt.figure(1, (5.5, 4.))
+
+# the rect parameter will be ignored as we will set axes_locator
+rect = (0.1, 0.1, 0.8, 0.8)
+ax = [fig1.add_axes(rect, label="%d" % i) for i in range(4)]
+
+horiz = [Size.Scaled(1.5), Size.Fixed(.5), Size.Scaled(1.),
+         Size.Scaled(.5)]
+
+vert = [Size.Scaled(1.), Size.Fixed(.5), Size.Scaled(1.5)]
+
+# divide the axes rectangle into a grid whose size is specified by horiz * vert
+divider = Divider(fig1, rect, horiz, vert, aspect=False)
+
+ax[0].set_axes_locator(divider.new_locator(nx=0, ny=0))
+ax[1].set_axes_locator(divider.new_locator(nx=0, ny=2))
+ax[2].set_axes_locator(divider.new_locator(nx=2, ny=2))
+ax[3].set_axes_locator(divider.new_locator(nx=2, nx1=4, ny=0))
+
+for ax1 in ax:
+    ax1.tick_params(labelbottom=False, labelleft=False)
+
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: simple_axes_divider2.py](https://matplotlib.org/_downloads/simple_axes_divider2.py)
+- [下载Jupyter notebook: simple_axes_divider2.ipynb](https://matplotlib.org/_downloads/simple_axes_divider2.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axes_grid1/simple_axes_divider3.md b/Python/matplotlab/gallery/axes_grid1/simple_axes_divider3.md
new file mode 100644
index 00000000..145cb51e
--- /dev/null
+++ b/Python/matplotlab/gallery/axes_grid1/simple_axes_divider3.md
@@ -0,0 +1,47 @@
+# 简单轴分割器示例3
+
+![简单轴分割器示例3](https://matplotlib.org/_images/sphx_glr_simple_axes_divider3_001.png)
+
+```python
+import mpl_toolkits.axes_grid1.axes_size as Size
+from mpl_toolkits.axes_grid1 import Divider
+import matplotlib.pyplot as plt
+
+
+fig1 = plt.figure(1, (5.5, 4))
+
+# the rect parameter will be ignored as we will set axes_locator
+rect = (0.1, 0.1, 0.8, 0.8)
+ax = [fig1.add_axes(rect, label="%d" % i) for i in range(4)]
+
+
+horiz = [Size.AxesX(ax[0]), Size.Fixed(.5), Size.AxesX(ax[1])]
+vert = [Size.AxesY(ax[0]), Size.Fixed(.5), Size.AxesY(ax[2])]
+
+# divide the axes rectangle into a grid whose size is specified by horiz * vert
+divider = Divider(fig1, rect, horiz, vert, aspect=False)
+
+
+ax[0].set_axes_locator(divider.new_locator(nx=0, ny=0))
+ax[1].set_axes_locator(divider.new_locator(nx=2, ny=0))
+ax[2].set_axes_locator(divider.new_locator(nx=0, ny=2))
+ax[3].set_axes_locator(divider.new_locator(nx=2, ny=2))
+
+ax[0].set_xlim(0, 2)
+ax[1].set_xlim(0, 1)
+
+ax[0].set_ylim(0, 1)
+ax[2].set_ylim(0, 2)
+
+divider.set_aspect(1.)
+
+for ax1 in ax:
+    ax1.tick_params(labelbottom=False, labelleft=False)
+
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: simple_axes_divider3.py](https://matplotlib.org/_downloads/simple_axes_divider3.py)
+- [下载Jupyter notebook: simple_axes_divider3.ipynb](https://matplotlib.org/_downloads/simple_axes_divider3.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axes_grid1/simple_axesgrid.md b/Python/matplotlab/gallery/axes_grid1/simple_axesgrid.md
new file mode 100644
index 00000000..c1a0b256
--- /dev/null
+++ b/Python/matplotlab/gallery/axes_grid1/simple_axesgrid.md
@@ -0,0 +1,28 @@
+# 简单的轴线网格
+
+![简单的轴线网格](https://matplotlib.org/_images/sphx_glr_simple_axesgrid_001.png)
+
+```python
+import matplotlib.pyplot as plt
+from mpl_toolkits.axes_grid1 import ImageGrid
+import numpy as np
+
+im = np.arange(100).reshape((10, 10))
+
+fig = plt.figure(1, (4., 4.))
+grid = ImageGrid(fig, 111,  # similar to subplot(111)
+                 nrows_ncols=(2, 2),  # creates 2x2 grid of axes
+                 axes_pad=0.1,  # pad between axes in inches.
+                 )
+
+for i in range(4):
+    grid[i].imshow(im)  # The AxesGrid object works as a list of axes.
+ +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: simple_axesgrid.py](https://matplotlib.org/_downloads/simple_axesgrid.py) +- [下载Jupyter notebook: simple_axesgrid.ipynb](https://matplotlib.org/_downloads/simple_axesgrid.ipynb) + diff --git a/Python/matplotlab/gallery/axes_grid1/simple_axesgrid2.md b/Python/matplotlab/gallery/axes_grid1/simple_axesgrid2.md new file mode 100644 index 00000000..41b100a2 --- /dev/null +++ b/Python/matplotlab/gallery/axes_grid1/simple_axesgrid2.md @@ -0,0 +1,43 @@ +# 简单的轴线网格2 + +![简单的轴线网格2](https://matplotlib.org/_images/sphx_glr_simple_axesgrid2_001.png) + +```python +import matplotlib.pyplot as plt +from mpl_toolkits.axes_grid1 import ImageGrid + + +def get_demo_image(): + import numpy as np + from matplotlib.cbook import get_sample_data + f = get_sample_data("axes_grid/bivariate_normal.npy", asfileobj=False) + z = np.load(f) + # z is a numpy array of 15x15 + return z, (-3, 4, -4, 3) + +F = plt.figure(1, (5.5, 3.5)) +grid = ImageGrid(F, 111, # similar to subplot(111) + nrows_ncols=(1, 3), + axes_pad=0.1, + add_all=True, + label_mode="L", + ) + +Z, extent = get_demo_image() # demo image + +im1 = Z +im2 = Z[:, :10] +im3 = Z[:, 10:] +vmin, vmax = Z.min(), Z.max() +for i, im in enumerate([im1, im2, im3]): + ax = grid[i] + ax.imshow(im, origin="lower", vmin=vmin, + vmax=vmax, interpolation="nearest") + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: simple_axesgrid2.py](https://matplotlib.org/_downloads/simple_axesgrid2.py) +- [下载Jupyter notebook: simple_axesgrid2.ipynb](https://matplotlib.org/_downloads/simple_axesgrid2.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axes_grid1/simple_axisline4.md b/Python/matplotlab/gallery/axes_grid1/simple_axisline4.md new file mode 100644 index 00000000..9b0569a6 --- /dev/null +++ b/Python/matplotlab/gallery/axes_grid1/simple_axisline4.md @@ -0,0 +1,28 @@ +# 简单的 Axisline4 + +![简单的 Axisline4](https://matplotlib.org/_images/sphx_glr_simple_axisline4_001.png) + +```python 
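+# host_subplot (from mpl_toolkits.axes_grid1) returns axes whose individual
+# axis artists are addressable as ax.axis["top"], ax.axis["right"], etc.;
+# twin() below adds overlay axes sharing the same limits, whose top and
+# right axis can then carry the custom pi-multiple tick labels.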
+import matplotlib.pyplot as plt +from mpl_toolkits.axes_grid1 import host_subplot +import numpy as np + +ax = host_subplot(111) +xx = np.arange(0, 2*np.pi, 0.01) +ax.plot(xx, np.sin(xx)) + +ax2 = ax.twin() # ax2 is responsible for "top" axis and "right" axis +ax2.set_xticks([0., .5*np.pi, np.pi, 1.5*np.pi, 2*np.pi]) +ax2.set_xticklabels(["$0$", r"$\frac{1}{2}\pi$", + r"$\pi$", r"$\frac{3}{2}\pi$", r"$2\pi$"]) + +ax2.axis["right"].major_ticklabels.set_visible(False) +ax2.axis["top"].major_ticklabels.set_visible(True) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: simple_axisline4.py](https://matplotlib.org/_downloads/simple_axisline4.py) +- [下载Jupyter notebook: simple_axisline4.ipynb](https://matplotlib.org/_downloads/simple_axisline4.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axes_grid1/simple_colorbar.md b/Python/matplotlab/gallery/axes_grid1/simple_colorbar.md new file mode 100644 index 00000000..9b4a6e65 --- /dev/null +++ b/Python/matplotlab/gallery/axes_grid1/simple_colorbar.md @@ -0,0 +1,24 @@ +# 简单的彩色条实现 + +![简单的彩色条实现示例](https://matplotlib.org/_images/sphx_glr_simple_colorbar_001.png) + +```python +import matplotlib.pyplot as plt +from mpl_toolkits.axes_grid1 import make_axes_locatable +import numpy as np + +ax = plt.subplot(111) +im = ax.imshow(np.arange(100).reshape((10, 10))) + +# create an axes on the right side of ax. The width of cax will be 5% +# of ax and the padding between cax and ax will be fixed at 0.05 inch. 
+divider = make_axes_locatable(ax) +cax = divider.append_axes("right", size="5%", pad=0.05) + +plt.colorbar(im, cax=cax) +``` + +## 下载这个示例 + +- [下载python源码: simple_colorbar.py](https://matplotlib.org/_downloads/simple_colorbar.py) +- [下载Jupyter notebook: simple_colorbar.ipynb](https://matplotlib.org/_downloads/simple_colorbar.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axes_grid1/simple_rgb.md b/Python/matplotlab/gallery/axes_grid1/simple_rgb.md new file mode 100644 index 00000000..e6d8ccdc --- /dev/null +++ b/Python/matplotlab/gallery/axes_grid1/simple_rgb.md @@ -0,0 +1,49 @@ +# 简单的 RGB + +![简单的RGB示例](https://matplotlib.org/_images/sphx_glr_simple_rgb_001.png) + +```python +import matplotlib.pyplot as plt + +from mpl_toolkits.axes_grid1.axes_rgb import RGBAxes + + +def get_demo_image(): + import numpy as np + from matplotlib.cbook import get_sample_data + f = get_sample_data("axes_grid/bivariate_normal.npy", asfileobj=False) + z = np.load(f) + # z is a numpy array of 15x15 + return z, (-3, 4, -4, 3) + + +def get_rgb(): + Z, extent = get_demo_image() + + Z[Z < 0] = 0. 
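+    # Clip negatives here, then (below) scale Z to [0, 1] so each channel
+    # slice is a valid intensity image for imshow_rgb.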
+ Z = Z / Z.max() + + R = Z[:13, :13] + G = Z[2:, 2:] + B = Z[:13, 2:] + + return R, G, B + + +fig = plt.figure(1) +ax = RGBAxes(fig, [0.1, 0.1, 0.8, 0.8]) + +r, g, b = get_rgb() +kwargs = dict(origin="lower", interpolation="nearest") +ax.imshow_rgb(r, g, b, **kwargs) + +ax.RGB.set_xlim(0., 9.5) +ax.RGB.set_ylim(0.9, 10.6) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: simple_rgb.py](https://matplotlib.org/_downloads/simple_rgb.py) +- [下载Jupyter notebook: simple_rgb.ipynb](https://matplotlib.org/_downloads/simple_rgb.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axisartist/axis_direction_demo_step01.md b/Python/matplotlab/gallery/axisartist/axis_direction_demo_step01.md new file mode 100644 index 00000000..b951c895 --- /dev/null +++ b/Python/matplotlab/gallery/axisartist/axis_direction_demo_step01.md @@ -0,0 +1,37 @@ +# 轴方向演示步骤01 + +![轴方向演示步骤01示例](https://matplotlib.org/_images/sphx_glr_axis_direction_demo_step01_001.png) + +```python +import matplotlib.pyplot as plt +import mpl_toolkits.axisartist as axisartist + + +def setup_axes(fig, rect): + ax = axisartist.Subplot(fig, rect) + fig.add_axes(ax) + + ax.set_ylim(-0.1, 1.5) + ax.set_yticks([0, 1]) + + ax.axis[:].set_visible(False) + + ax.axis["x"] = ax.new_floating_axis(1, 0.5) + ax.axis["x"].set_axisline_style("->", size=1.5) + + return ax + + +fig = plt.figure(figsize=(3, 2.5)) +fig.subplots_adjust(top=0.8) +ax1 = setup_axes(fig, "111") + +ax1.axis["x"].set_axis_direction("left") + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: axis_direction_demo_step01.py](https://matplotlib.org/_downloads/axis_direction_demo_step01.py) +- [下载Jupyter notebook: axis_direction_demo_step01.ipynb](https://matplotlib.org/_downloads/axis_direction_demo_step01.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axisartist/axis_direction_demo_step02.md b/Python/matplotlab/gallery/axisartist/axis_direction_demo_step02.md new file mode 100644 index 00000000..dcc2da33 --- /dev/null 
+++ b/Python/matplotlab/gallery/axisartist/axis_direction_demo_step02.md @@ -0,0 +1,48 @@ +# 轴方向演示步骤02 + +![轴方向演示步骤02示例](https://matplotlib.org/_images/sphx_glr_axis_direction_demo_step02_001.png) + +```python +import matplotlib.pyplot as plt +import mpl_toolkits.axisartist as axisartist + + +def setup_axes(fig, rect): + ax = axisartist.Subplot(fig, rect) + fig.add_axes(ax) + + ax.set_ylim(-0.1, 1.5) + ax.set_yticks([0, 1]) + + #ax.axis[:].toggle(all=False) + #ax.axis[:].line.set_visible(False) + ax.axis[:].set_visible(False) + + ax.axis["x"] = ax.new_floating_axis(1, 0.5) + ax.axis["x"].set_axisline_style("->", size=1.5) + + return ax + + +fig = plt.figure(figsize=(6, 2.5)) +fig.subplots_adjust(bottom=0.2, top=0.8) + +ax1 = setup_axes(fig, "121") +ax1.axis["x"].set_ticklabel_direction("+") +ax1.annotate("ticklabel direction=$+$", (0.5, 0), xycoords="axes fraction", + xytext=(0, -10), textcoords="offset points", + va="top", ha="center") + +ax2 = setup_axes(fig, "122") +ax2.axis["x"].set_ticklabel_direction("-") +ax2.annotate("ticklabel direction=$-$", (0.5, 0), xycoords="axes fraction", + xytext=(0, -10), textcoords="offset points", + va="top", ha="center") + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: axis_direction_demo_step02.py](https://matplotlib.org/_downloads/axis_direction_demo_step02.py) +- [下载Jupyter notebook: axis_direction_demo_step02.ipynb](https://matplotlib.org/_downloads/axis_direction_demo_step02.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axisartist/axis_direction_demo_step03.md b/Python/matplotlab/gallery/axisartist/axis_direction_demo_step03.md new file mode 100644 index 00000000..79a85586 --- /dev/null +++ b/Python/matplotlab/gallery/axisartist/axis_direction_demo_step03.md @@ -0,0 +1,52 @@ +# 轴方向演示步骤03 + +![轴方向演示步骤03示例](https://matplotlib.org/_images/sphx_glr_axis_direction_demo_step03_001.png) + +```python +import matplotlib.pyplot as plt +import mpl_toolkits.axisartist as axisartist + + +def setup_axes(fig, 
rect): + ax = axisartist.Subplot(fig, rect) + fig.add_axes(ax) + + ax.set_ylim(-0.1, 1.5) + ax.set_yticks([0, 1]) + + #ax.axis[:].toggle(all=False) + #ax.axis[:].line.set_visible(False) + ax.axis[:].set_visible(False) + + ax.axis["x"] = ax.new_floating_axis(1, 0.5) + ax.axis["x"].set_axisline_style("->", size=1.5) + + return ax + + +fig = plt.figure(figsize=(6, 2.5)) +fig.subplots_adjust(bottom=0.2, top=0.8) + +ax1 = setup_axes(fig, "121") +ax1.axis["x"].label.set_text("Label") +ax1.axis["x"].toggle(ticklabels=False) +ax1.axis["x"].set_axislabel_direction("+") +ax1.annotate("label direction=$+$", (0.5, 0), xycoords="axes fraction", + xytext=(0, -10), textcoords="offset points", + va="top", ha="center") + +ax2 = setup_axes(fig, "122") +ax2.axis["x"].label.set_text("Label") +ax2.axis["x"].toggle(ticklabels=False) +ax2.axis["x"].set_axislabel_direction("-") +ax2.annotate("label direction=$-$", (0.5, 0), xycoords="axes fraction", + xytext=(0, -10), textcoords="offset points", + va="top", ha="center") + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: axis_direction_demo_step03.py](https://matplotlib.org/_downloads/axis_direction_demo_step03.py) +- [下载Jupyter notebook: axis_direction_demo_step03.ipynb](https://matplotlib.org/_downloads/axis_direction_demo_step03.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axisartist/axis_direction_demo_step04.md b/Python/matplotlab/gallery/axisartist/axis_direction_demo_step04.md new file mode 100644 index 00000000..f728ea10 --- /dev/null +++ b/Python/matplotlab/gallery/axisartist/axis_direction_demo_step04.md @@ -0,0 +1,65 @@ +# 轴方向演示步骤04 + +![轴方向演示步骤04示例](https://matplotlib.org/_images/sphx_glr_axis_direction_demo_step04_001.png) + +```python +import matplotlib.pyplot as plt +import mpl_toolkits.axisartist as axisartist + + +def setup_axes(fig, rect): + ax = axisartist.Subplot(fig, rect) + fig.add_axes(ax) + + ax.set_ylim(-0.1, 1.5) + ax.set_yticks([0, 1]) + + ax.axis[:].set_visible(False) + + 
ax.axis["x1"] = ax.new_floating_axis(1, 0.3) + ax.axis["x1"].set_axisline_style("->", size=1.5) + + ax.axis["x2"] = ax.new_floating_axis(1, 0.7) + ax.axis["x2"].set_axisline_style("->", size=1.5) + + return ax + + +fig = plt.figure(figsize=(6, 2.5)) +fig.subplots_adjust(bottom=0.2, top=0.8) + +ax1 = setup_axes(fig, "121") +ax1.axis["x1"].label.set_text("rotation=0") +ax1.axis["x1"].toggle(ticklabels=False) + +ax1.axis["x2"].label.set_text("rotation=10") +ax1.axis["x2"].label.set_rotation(10) +ax1.axis["x2"].toggle(ticklabels=False) + +ax1.annotate("label direction=$+$", (0.5, 0), xycoords="axes fraction", + xytext=(0, -10), textcoords="offset points", + va="top", ha="center") + +ax2 = setup_axes(fig, "122") + +ax2.axis["x1"].set_axislabel_direction("-") +ax2.axis["x2"].set_axislabel_direction("-") + +ax2.axis["x1"].label.set_text("rotation=0") +ax2.axis["x1"].toggle(ticklabels=False) + +ax2.axis["x2"].label.set_text("rotation=10") +ax2.axis["x2"].label.set_rotation(10) +ax2.axis["x2"].toggle(ticklabels=False) + +ax2.annotate("label direction=$-$", (0.5, 0), xycoords="axes fraction", + xytext=(0, -10), textcoords="offset points", + va="top", ha="center") + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: axis_direction_demo_step04.py](https://matplotlib.org/_downloads/axis_direction_demo_step04.py) +- [下载Jupyter notebook: axis_direction_demo_step04.ipynb](https://matplotlib.org/_downloads/axis_direction_demo_step04.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/axisartist/demo_axis_direction.md b/Python/matplotlab/gallery/axisartist/demo_axis_direction.md new file mode 100644 index 00000000..6916bd8b --- /dev/null +++ b/Python/matplotlab/gallery/axisartist/demo_axis_direction.md @@ -0,0 +1,101 @@ +# 演示轴方向 + +![演示轴方向示例](https://matplotlib.org/_images/sphx_glr_demo_axis_direction_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +import mpl_toolkits.axisartist.angle_helper as angle_helper +import 
mpl_toolkits.axisartist.grid_finder as grid_finder +from matplotlib.projections import PolarAxes +from matplotlib.transforms import Affine2D + +import mpl_toolkits.axisartist as axisartist + +from mpl_toolkits.axisartist.grid_helper_curvelinear import \ + GridHelperCurveLinear + + +def setup_axes(fig, rect): + """ + polar projection, but in a rectangular box. + """ + + # see demo_curvelinear_grid.py for details + tr = Affine2D().scale(np.pi/180., 1.) + PolarAxes.PolarTransform() + + extreme_finder = angle_helper.ExtremeFinderCycle(20, 20, + lon_cycle=360, + lat_cycle=None, + lon_minmax=None, + lat_minmax=(0, np.inf), + ) + + grid_locator1 = angle_helper.LocatorDMS(12) + grid_locator2 = grid_finder.MaxNLocator(5) + + tick_formatter1 = angle_helper.FormatterDMS() + + grid_helper = GridHelperCurveLinear(tr, + extreme_finder=extreme_finder, + grid_locator1=grid_locator1, + grid_locator2=grid_locator2, + tick_formatter1=tick_formatter1 + ) + + ax1 = axisartist.Subplot(fig, rect, grid_helper=grid_helper) + ax1.axis[:].toggle(ticklabels=False) + + fig.add_subplot(ax1) + + ax1.set_aspect(1.) 
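+    # Fixed limits below keep all eight panels identical, so only the
+    # floating axis direction differs between subplots.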
+    ax1.set_xlim(-5, 12)
+    ax1.set_ylim(-5, 10)
+
+    return ax1
+
+
+def add_floating_axis1(ax1):
+    ax1.axis["lat"] = axis = ax1.new_floating_axis(0, 30)
+    axis.label.set_text(r"$\theta = 30^{\circ}$")
+    axis.label.set_visible(True)
+
+    return axis
+
+
+def add_floating_axis2(ax1):
+    ax1.axis["lon"] = axis = ax1.new_floating_axis(1, 6)
+    axis.label.set_text(r"$r = 6$")
+    axis.label.set_visible(True)
+
+    return axis
+
+
+fig = plt.figure(1, figsize=(8, 4))
+fig.clf()
+fig.subplots_adjust(left=0.01, right=0.99, bottom=0.01, top=0.99,
+                    wspace=0.01, hspace=0.01)
+
+for i, d in enumerate(["bottom", "left", "top", "right"]):
+    ax1 = setup_axes(fig, rect=241 + i)
+    axis = add_floating_axis1(ax1)
+    axis.set_axis_direction(d)
+    ax1.annotate(d, (0, 1), (5, -5),
+                 xycoords="axes fraction", textcoords="offset points",
+                 va="top", ha="left")
+
+for i, d in enumerate(["bottom", "left", "top", "right"]):
+    ax1 = setup_axes(fig, rect=245 + i)
+    axis = add_floating_axis2(ax1)
+    axis.set_axis_direction(d)
+    ax1.annotate(d, (0, 1), (5, -5),
+                 xycoords="axes fraction", textcoords="offset points",
+                 va="top", ha="left")
+
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: demo_axis_direction.py](https://matplotlib.org/_downloads/demo_axis_direction.py)
+- [下载Jupyter notebook: demo_axis_direction.ipynb](https://matplotlib.org/_downloads/demo_axis_direction.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axisartist/demo_axisline_style.md b/Python/matplotlab/gallery/axisartist/demo_axisline_style.md
new file mode 100644
index 00000000..214bc31e
--- /dev/null
+++ b/Python/matplotlab/gallery/axisartist/demo_axisline_style.md
@@ -0,0 +1,37 @@
+# 轴线样式
+
+此示例显示了轴样式的一些配置。
+
+![轴线样式示例](https://matplotlib.org/_images/sphx_glr_demo_axisline_style_001.png)
+
+```python
+from mpl_toolkits.axisartist.axislines import SubplotZero
+import matplotlib.pyplot as plt
+import numpy as np
+
+if 1:
+    fig = plt.figure(1)
+    ax = SubplotZero(fig, 111)
+    fig.add_subplot(ax)
+
+    for direction in
["xzero", "yzero"]:
+        # adds arrows at the ends of each axis
+        ax.axis[direction].set_axisline_style("-|>")
+
+        # adds X and Y-axis from the origin
+        ax.axis[direction].set_visible(True)
+
+    for direction in ["left", "right", "bottom", "top"]:
+        # hides borders
+        ax.axis[direction].set_visible(False)
+
+    x = np.linspace(-0.5, 1., 100)
+    ax.plot(x, np.sin(x*np.pi))
+
+    plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: demo_axisline_style.py](https://matplotlib.org/_downloads/demo_axisline_style.py)
+- [下载Jupyter notebook: demo_axisline_style.ipynb](https://matplotlib.org/_downloads/demo_axisline_style.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axisartist/demo_curvelinear_grid.md b/Python/matplotlab/gallery/axisartist/demo_curvelinear_grid.md
new file mode 100644
index 00000000..b0fa978c
--- /dev/null
+++ b/Python/matplotlab/gallery/axisartist/demo_curvelinear_grid.md
@@ -0,0 +1,142 @@
+# 演示Curvelinear网格
+
+自定义网格和刻度线。
+
+此示例演示如何通过对网格应用变换,使用 GridHelperCurveLinear 定义自定义网格和刻度线。如第二个子图所示,这可用于在矩形框中创建极坐标投影。
+
+![Curvelinear网格示例](https://matplotlib.org/_images/sphx_glr_demo_curvelinear_grid_0011.png)
+
+```python
+import numpy as np
+
+import matplotlib.pyplot as plt
+import matplotlib.cbook as cbook
+
+from mpl_toolkits.axisartist import Subplot
+from mpl_toolkits.axisartist import SubplotHost, \
+    ParasiteAxesAuxTrans
+from mpl_toolkits.axisartist.grid_helper_curvelinear import \
+    GridHelperCurveLinear
+
+
+def curvelinear_test1(fig):
+    """
+    grid for custom transform.
+    """
+
+    def tr(x, y):
+        x, y = np.asarray(x), np.asarray(y)
+        return x, y - x
+
+    def inv_tr(x, y):
+        x, y = np.asarray(x), np.asarray(y)
+        return x, y + x
+
+    grid_helper = GridHelperCurveLinear((tr, inv_tr))
+
+    ax1 = Subplot(fig, 1, 2, 1, grid_helper=grid_helper)
+    # ax1 will have ticks and gridlines defined by the given
+    # transform (+ transData of the Axes). Note that the transform of
+    # the Axes itself (i.e., transData) is not affected by the given
+    # transform.
+
+    fig.add_subplot(ax1)
+
+    xx, yy = tr([3, 6], [5.0, 10.])
+    ax1.plot(xx, yy, linewidth=2.0)
+
+    ax1.set_aspect(1.)
+    ax1.set_xlim(0, 10.)
+    ax1.set_ylim(0, 10.)
+
+    ax1.axis["t"] = ax1.new_floating_axis(0, 3.)
+    ax1.axis["t2"] = ax1.new_floating_axis(1, 7.)
+    ax1.grid(True, zorder=0)
+
+
+import mpl_toolkits.axisartist.angle_helper as angle_helper
+
+from matplotlib.projections import PolarAxes
+from matplotlib.transforms import Affine2D
+
+
+def curvelinear_test2(fig):
+    """
+    polar projection, but in a rectangular box.
+    """
+
+    # PolarAxes.PolarTransform takes radians. However, we want our coordinate
+    # system in degrees
+    tr = Affine2D().scale(np.pi/180., 1.) + PolarAxes.PolarTransform()
+
+    # polar projection, which involves cycle, and also has limits in
+    # its coordinates, needs a special method to find the extremes
+    # (min, max of the coordinate within the view).
+
+    # 20, 20 : number of sampling points along x, y direction
+    extreme_finder = angle_helper.ExtremeFinderCycle(20, 20,
+                                                     lon_cycle=360,
+                                                     lat_cycle=None,
+                                                     lon_minmax=None,
+                                                     lat_minmax=(0, np.inf),
+                                                     )
+
+    grid_locator1 = angle_helper.LocatorDMS(12)
+    # Find grid values appropriate for the coordinate (degree,
+    # minute, second).
+
+    tick_formatter1 = angle_helper.FormatterDMS()
+    # And also uses an appropriate formatter. Note that the
+    # acceptable Locator and Formatter classes are a bit different from
+    # mpl's, and you cannot directly use mpl's Locator and
+    # Formatter here (but may be possible in the future).
+
+    grid_helper = GridHelperCurveLinear(tr,
+                                        extreme_finder=extreme_finder,
+                                        grid_locator1=grid_locator1,
+                                        tick_formatter1=tick_formatter1
+                                        )
+
+    ax1 = SubplotHost(fig, 1, 2, 2, grid_helper=grid_helper)
+
+    # make ticklabels of right and top axis visible.
+    ax1.axis["right"].major_ticklabels.set_visible(True)
+    ax1.axis["top"].major_ticklabels.set_visible(True)
+
+    # let right axis show ticklabels for 1st coordinate (angle)
+    ax1.axis["right"].get_helper().nth_coord_ticks = 0
+    # let bottom axis show ticklabels for 2nd coordinate (radius)
+    ax1.axis["bottom"].get_helper().nth_coord_ticks = 1
+
+    fig.add_subplot(ax1)
+
+    # A parasite axes with given transform
+    ax2 = ParasiteAxesAuxTrans(ax1, tr, "equal")
+    # note that ax2.transData == tr + ax1.transData
+    # Anything you draw in ax2 will match the ticks and grids of ax1.
+    ax1.parasites.append(ax2)
+    intp = cbook.simple_linear_interpolation
+    ax2.plot(intp(np.array([0, 30]), 50),
+             intp(np.array([10., 10.]), 50),
+             linewidth=2.0)
+
+    ax1.set_aspect(1.)
+    ax1.set_xlim(-5, 12)
+    ax1.set_ylim(-5, 10)
+
+    ax1.grid(True, zorder=0)
+
+if 1:
+    fig = plt.figure(1, figsize=(7, 4))
+    fig.clf()
+
+    curvelinear_test1(fig)
+    curvelinear_test2(fig)
+
+    plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: demo_curvelinear_grid.py](https://matplotlib.org/_downloads/demo_curvelinear_grid.py)
+- [下载Jupyter notebook: demo_curvelinear_grid.ipynb](https://matplotlib.org/_downloads/demo_curvelinear_grid.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axisartist/demo_curvelinear_grid2.md b/Python/matplotlab/gallery/axisartist/demo_curvelinear_grid2.md
new file mode 100644
index 00000000..16b7aa33
--- /dev/null
+++ b/Python/matplotlab/gallery/axisartist/demo_curvelinear_grid2.md
@@ -0,0 +1,75 @@
+# 演示Curvelinear网格2
+
+自定义网格和刻度线。
+
+此示例演示如何通过对网格应用变换,使用 GridHelperCurveLinear 定义自定义网格和刻度线。作为演示,图中坐标轴上显示一个 5x5 矩阵。
+
+![Curvelinear网格2示例](https://matplotlib.org/_images/sphx_glr_demo_curvelinear_grid2_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+from mpl_toolkits.axisartist.grid_helper_curvelinear import \
+    GridHelperCurveLinear
+from mpl_toolkits.axisartist.axislines import Subplot
+
+import mpl_toolkits.axisartist.angle_helper as angle_helper
+
+
+def curvelinear_test1(fig):
+    """
+    grid for custom transform.
+    """
+
+    def tr(x, y):
+        sgn = np.sign(x)
+        x, y = np.abs(np.asarray(x)), np.asarray(y)
+        return sgn*x**.5, y
+
+    def inv_tr(x, y):
+        sgn = np.sign(x)
+        x, y = np.asarray(x), np.asarray(y)
+        return sgn*x**2, y
+
+    extreme_finder = angle_helper.ExtremeFinderCycle(20, 20,
+                                                     lon_cycle=None,
+                                                     lat_cycle=None,
+                                                     # (0, np.inf),
+                                                     lon_minmax=None,
+                                                     lat_minmax=None,
+                                                     )
+
+    grid_helper = GridHelperCurveLinear((tr, inv_tr),
+                                        extreme_finder=extreme_finder)
+
+    ax1 = Subplot(fig, 111, grid_helper=grid_helper)
+    # ax1 will have ticks and gridlines defined by the given
+    # transform (+ transData of the Axes). Note that the transform of
+    # the Axes itself (i.e., transData) is not affected by the given
+    # transform.
+
+    fig.add_subplot(ax1)
+
+    ax1.imshow(np.arange(25).reshape(5, 5),
+               vmax=50, cmap=plt.cm.gray_r,
+               interpolation="nearest",
+               origin="lower")
+
+    # tick density
+    grid_helper.grid_finder.grid_locator1._nbins = 6
+    grid_helper.grid_finder.grid_locator2._nbins = 6
+
+
+if 1:
+    fig = plt.figure(1, figsize=(7, 4))
+    fig.clf()
+
+    curvelinear_test1(fig)
+    plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: demo_curvelinear_grid2.py](https://matplotlib.org/_downloads/demo_curvelinear_grid2.py)
+- [下载Jupyter notebook: demo_curvelinear_grid2.ipynb](https://matplotlib.org/_downloads/demo_curvelinear_grid2.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axisartist/demo_floating_axes.md b/Python/matplotlab/gallery/axisartist/demo_floating_axes.md
new file mode 100644
index 00000000..7f2b9dba
--- /dev/null
+++ b/Python/matplotlab/gallery/axisartist/demo_floating_axes.md
@@ -0,0 +1,160 @@
+# 演示浮动轴
+
+浮动轴的演示。
+
+```python
+from matplotlib.transforms import Affine2D
+import mpl_toolkits.axisartist.floating_axes as floating_axes
+import numpy as np
+import mpl_toolkits.axisartist.angle_helper as angle_helper
+from matplotlib.projections import PolarAxes
+from
mpl_toolkits.axisartist.grid_finder import (FixedLocator, MaxNLocator, + DictFormatter) +import matplotlib.pyplot as plt + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +def setup_axes1(fig, rect): + """ + A simple one. + """ + tr = Affine2D().scale(2, 1).rotate_deg(30) + + grid_helper = floating_axes.GridHelperCurveLinear( + tr, extremes=(-0.5, 3.5, 0, 4)) + + ax1 = floating_axes.FloatingSubplot(fig, rect, grid_helper=grid_helper) + fig.add_subplot(ax1) + + aux_ax = ax1.get_aux_axes(tr) + + grid_helper.grid_finder.grid_locator1._nbins = 4 + grid_helper.grid_finder.grid_locator2._nbins = 4 + + return ax1, aux_ax + + +def setup_axes2(fig, rect): + """ + With custom locator and formatter. + Note that the extreme values are swapped. + """ + tr = PolarAxes.PolarTransform() + + pi = np.pi + angle_ticks = [(0, r"$0$"), + (.25*pi, r"$\frac{1}{4}\pi$"), + (.5*pi, r"$\frac{1}{2}\pi$")] + grid_locator1 = FixedLocator([v for v, s in angle_ticks]) + tick_formatter1 = DictFormatter(dict(angle_ticks)) + + grid_locator2 = MaxNLocator(2) + + grid_helper = floating_axes.GridHelperCurveLinear( + tr, extremes=(.5*pi, 0, 2, 1), + grid_locator1=grid_locator1, + grid_locator2=grid_locator2, + tick_formatter1=tick_formatter1, + tick_formatter2=None) + + ax1 = floating_axes.FloatingSubplot(fig, rect, grid_helper=grid_helper) + fig.add_subplot(ax1) + + # create a parasite axes whose transData in RA, cz + aux_ax = ax1.get_aux_axes(tr) + + aux_ax.patch = ax1.patch # for aux_ax to have a clip path as in ax + ax1.patch.zorder = 0.9 # but this has a side effect that the patch is + # drawn twice, and possibly over some other + # artists. So, we decrease the zorder a bit to + # prevent this. + + return ax1, aux_ax + + +def setup_axes3(fig, rect): + """ + Sometimes, things like axis_direction need to be adjusted. 
+    """
+
+    # rotate a bit for better orientation
+    tr_rotate = Affine2D().translate(-95, 0)
+
+    # scale degree to radians
+    tr_scale = Affine2D().scale(np.pi/180., 1.)
+
+    tr = tr_rotate + tr_scale + PolarAxes.PolarTransform()
+
+    grid_locator1 = angle_helper.LocatorHMS(4)
+    tick_formatter1 = angle_helper.FormatterHMS()
+
+    grid_locator2 = MaxNLocator(3)
+
+    # Specify theta limits in degrees
+    ra0, ra1 = 8.*15, 14.*15
+    # Specify radial limits
+    cz0, cz1 = 0, 14000
+    grid_helper = floating_axes.GridHelperCurveLinear(
+        tr, extremes=(ra0, ra1, cz0, cz1),
+        grid_locator1=grid_locator1,
+        grid_locator2=grid_locator2,
+        tick_formatter1=tick_formatter1,
+        tick_formatter2=None)
+
+    ax1 = floating_axes.FloatingSubplot(fig, rect, grid_helper=grid_helper)
+    fig.add_subplot(ax1)
+
+    # adjust axis
+    ax1.axis["left"].set_axis_direction("bottom")
+    ax1.axis["right"].set_axis_direction("top")
+
+    ax1.axis["bottom"].set_visible(False)
+    ax1.axis["top"].set_axis_direction("bottom")
+    ax1.axis["top"].toggle(ticklabels=True, label=True)
+    ax1.axis["top"].major_ticklabels.set_axis_direction("top")
+    ax1.axis["top"].label.set_axis_direction("top")
+
+    ax1.axis["left"].label.set_text(r"cz [km$^{-1}$]")
+    ax1.axis["top"].label.set_text(r"$\alpha_{1950}$")
+
+    # create a parasite axes whose transData in RA, cz
+    aux_ax = ax1.get_aux_axes(tr)
+
+    aux_ax.patch = ax1.patch  # for aux_ax to have a clip path as in ax
+    ax1.patch.zorder = 0.9  # but this has a side effect that the patch is
+    # drawn twice, and possibly over some other
+    # artists. So, we decrease the zorder a bit to
+    # prevent this.
+
+    return ax1, aux_ax
+```
+
+```python
+fig = plt.figure(1, figsize=(8, 4))
+fig.subplots_adjust(wspace=0.3, left=0.05, right=0.95)
+
+ax1, aux_ax1 = setup_axes1(fig, 131)
+aux_ax1.bar([0, 1, 2, 3], [3, 2, 1, 3])
+
+ax2, aux_ax2 = setup_axes2(fig, 132)
+theta = np.random.rand(10)*.5*np.pi
+radius = np.random.rand(10) + 1.
+aux_ax2.scatter(theta, radius)
+
+ax3, aux_ax3 = setup_axes3(fig, 133)
+
+theta = (8 + np.random.rand(10)*(14 - 8))*15.  # in degrees
+radius = np.random.rand(10)*14000.
+aux_ax3.scatter(theta, radius)
+
+plt.show()
+```
+
+![Demo Floating Axes](https://matplotlib.org/_images/sphx_glr_demo_floating_axes_001.png)
+
+## Download this example
+
+- [Download Python source code: demo_floating_axes.py](https://matplotlib.org/_downloads/demo_floating_axes.py)
+- [Download Jupyter notebook: demo_floating_axes.ipynb](https://matplotlib.org/_downloads/demo_floating_axes.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axisartist/demo_floating_axis.md b/Python/matplotlab/gallery/axisartist/demo_floating_axis.md
new file mode 100644
index 00000000..c51bbd77
--- /dev/null
+++ b/Python/matplotlab/gallery/axisartist/demo_floating_axis.md
@@ -0,0 +1,76 @@
+# Demo Floating Axis
+
+Axis within a rectangular box.
+
+The following code demonstrates how to put a floating polar curve within a rectangular box. For a better understanding of polar curves, please look at demo_curvelinear_grid.py.
+
+![Demo Floating Axis](https://matplotlib.org/_images/sphx_glr_demo_floating_axis_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+import mpl_toolkits.axisartist.angle_helper as angle_helper
+from matplotlib.projections import PolarAxes
+from matplotlib.transforms import Affine2D
+from mpl_toolkits.axisartist import SubplotHost
+from mpl_toolkits.axisartist import GridHelperCurveLinear
+
+
+def curvelinear_test2(fig):
+    """Polar projection, but in a rectangular box.
+    """
+    # see demo_curvelinear_grid.py for details
+    tr = Affine2D().scale(np.pi / 180., 1.) + PolarAxes.PolarTransform()
+
+    extreme_finder = angle_helper.ExtremeFinderCycle(20,
+                                                     20,
+                                                     lon_cycle=360,
+                                                     lat_cycle=None,
+                                                     lon_minmax=None,
+                                                     lat_minmax=(0,
+                                                                 np.inf),
+                                                     )
+
+    grid_locator1 = angle_helper.LocatorDMS(12)
+
+    tick_formatter1 = angle_helper.FormatterDMS()
+
+    grid_helper = GridHelperCurveLinear(tr,
+                                        extreme_finder=extreme_finder,
+                                        grid_locator1=grid_locator1,
+                                        tick_formatter1=tick_formatter1
+                                        )
+
+    ax1 = SubplotHost(fig, 1, 1, 1, grid_helper=grid_helper)
+
+    fig.add_subplot(ax1)
+
+    # Now create floating axes
+
+    # floating axis whose first coordinate (theta) is fixed at 60
+    ax1.axis["lat"] = axis = ax1.new_floating_axis(0, 60)
+    axis.label.set_text(r"$\theta = 60^{\circ}$")
+    axis.label.set_visible(True)
+
+    # floating axis whose second coordinate (r) is fixed at 6
+    ax1.axis["lon"] = axis = ax1.new_floating_axis(1, 6)
+    axis.label.set_text(r"$r = 6$")
+
+    ax1.set_aspect(1.)
+    ax1.set_xlim(-5, 12)
+    ax1.set_ylim(-5, 10)
+
+    ax1.grid(True)
+
+fig = plt.figure(1, figsize=(5, 5))
+fig.clf()
+
+curvelinear_test2(fig)
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: demo_floating_axis.py](https://matplotlib.org/_downloads/demo_floating_axis.py)
+- [Download Jupyter notebook: demo_floating_axis.ipynb](https://matplotlib.org/_downloads/demo_floating_axis.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axisartist/demo_parasite_axes.md b/Python/matplotlab/gallery/axisartist/demo_parasite_axes.md
new file mode 100644
index 00000000..cc99eca5
--- /dev/null
+++ b/Python/matplotlab/gallery/axisartist/demo_parasite_axes.md
@@ -0,0 +1,65 @@
+# Demo Parasite Axes
+
+Create parasite axes. These axes share the x scale with the host axes, but show a different scale in the y direction.
+
+Note that this approach uses the [HostAxes](https://matplotlib.org/api/_as_gen/mpl_toolkits.axes_grid1.parasite_axes.HostAxes.html#mpl_toolkits.axes_grid1.parasite_axes.HostAxes) and [ParasiteAxes](https://matplotlib.org/api/_as_gen/mpl_toolkits.axes_grid1.parasite_axes.ParasiteAxes.html#mpl_toolkits.axes_grid1.parasite_axes.ParasiteAxes) of [parasite_axes](https://matplotlib.org/api/_as_gen/mpl_toolkits.axes_grid1.parasite_axes.html#module-mpl_toolkits.axes_grid1.parasite_axes). An alternative approach using the [Matplotlib axes_grid1 Toolkit](https://matplotlib.org/api/toolkits/axisartist.html#toolkit-axisartist-index) and the Matplotlib axisartist Toolkit can be found in the [Demo Parasite Axes2](https://matplotlib.org/api/toolkits/axes_grid1.html#toolkit-axesgrid1-index) example. An alternative approach using the usual matplotlib subplots is shown in the [Multiple Yaxis With Spines](https://matplotlib.org/gallery/ticks_and_spines/multiple_yaxis_with_spines.html) example.
+
+![Demo Parasite Axes](https://matplotlib.org/_images/sphx_glr_demo_parasite_axes_001.png)
+
+```python
+from mpl_toolkits.axisartist.parasite_axes import HostAxes, ParasiteAxes
+import matplotlib.pyplot as plt
+
+
+fig = plt.figure(1)
+
+host = HostAxes(fig, [0.15, 0.1, 0.65, 0.8])
+par1 = ParasiteAxes(host, sharex=host)
+par2 = ParasiteAxes(host, sharex=host)
+host.parasites.append(par1)
+host.parasites.append(par2)
+
+host.set_ylabel("Density")
+host.set_xlabel("Distance")
+
+host.axis["right"].set_visible(False)
+par1.axis["right"].set_visible(True)
+par1.set_ylabel("Temperature")
+
+par1.axis["right"].major_ticklabels.set_visible(True)
+par1.axis["right"].label.set_visible(True)
+
+par2.set_ylabel("Velocity")
+offset = (60, 0)
+new_axisline = par2._grid_helper.new_fixed_axis
+par2.axis["right2"] = new_axisline(loc="right", axes=par2, offset=offset)
+
+fig.add_axes(host)
+
+host.set_xlim(0, 2)
+host.set_ylim(0, 2)
+
+host.set_xlabel("Distance")
+host.set_ylabel("Density")
+par1.set_ylabel("Temperature")
+
+p1, = host.plot([0, 1, 2], [0, 1, 2], label="Density")
+p2, = par1.plot([0, 1, 2], [0, 3, 2], label="Temperature")
+p3, = par2.plot([0, 1, 2], [50, 30, 15], label="Velocity")
+
+par1.set_ylim(0, 4)
+par2.set_ylim(1, 65)
+
+host.legend()
+
+host.axis["left"].label.set_color(p1.get_color())
+par1.axis["right"].label.set_color(p2.get_color())
+par2.axis["right2"].label.set_color(p3.get_color())
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: demo_parasite_axes.py](https://matplotlib.org/_downloads/demo_parasite_axes.py)
+- [Download Jupyter notebook: demo_parasite_axes.ipynb](https://matplotlib.org/_downloads/demo_parasite_axes.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axisartist/demo_parasite_axes2.md b/Python/matplotlab/gallery/axisartist/demo_parasite_axes2.md
new file mode 100644
index 00000000..757ccac3
--- /dev/null
+++ b/Python/matplotlab/gallery/axisartist/demo_parasite_axes2.md
@@ -0,0 +1,58 @@
+# Demo Parasite Axes 2
+
+Demo of parasite axes.
+
+The following code is an example of parasite axes. It is meant to show how to plot multiple different values onto one single plot. Note that in this example, par1 and par2 both call twinx, meaning both are tied directly to the x-axis. From there, each of those two axes can behave separately from one another, meaning they can take on separate values from themselves as well as from the x-axis.
+
+Note that this approach uses the [HostAxes](https://matplotlib.org/api/_as_gen/mpl_toolkits.axes_grid1.parasite_axes.HostAxes.html#mpl_toolkits.axes_grid1.parasite_axes.HostAxes) and [ParasiteAxes](https://matplotlib.org/api/_as_gen/mpl_toolkits.axes_grid1.parasite_axes.ParasiteAxes.html#mpl_toolkits.axes_grid1.parasite_axes.ParasiteAxes) of [parasite_axes](https://matplotlib.org/api/_as_gen/mpl_toolkits.axes_grid1.parasite_axes.html#module-mpl_toolkits.axes_grid1.parasite_axes). An alternative approach using the [Matplotlib axes_grid1 Toolkit](https://matplotlib.org/api/toolkits/axisartist.html#toolkit-axisartist-index) and the Matplotlib axisartist Toolkit can be found in the [Demo Parasite Axes2](https://matplotlib.org/api/toolkits/axes_grid1.html#toolkit-axesgrid1-index) example. An alternative approach using the usual matplotlib subplots is shown in the [Multiple Yaxis With Spines](https://matplotlib.org/gallery/ticks_and_spines/multiple_yaxis_with_spines.html) example.
+
+![Demo Parasite Axes 2](https://matplotlib.org/_images/sphx_glr_demo_parasite_axes2_001.png)
+
+```python
+from mpl_toolkits.axes_grid1 import host_subplot
+import mpl_toolkits.axisartist as AA
+import matplotlib.pyplot as plt
+
+host = host_subplot(111, axes_class=AA.Axes)
+plt.subplots_adjust(right=0.75)
+
+par1 = host.twinx()
+par2 = host.twinx()
+
+offset = 60
+new_fixed_axis = par2.get_grid_helper().new_fixed_axis
+par2.axis["right"] = new_fixed_axis(loc="right",
+                                    axes=par2,
+                                    offset=(offset, 0))
+
+par1.axis["right"].toggle(all=True)
+par2.axis["right"].toggle(all=True)
+
+host.set_xlim(0, 2)
+host.set_ylim(0, 2)
+
+host.set_xlabel("Distance")
+host.set_ylabel("Density")
+par1.set_ylabel("Temperature")
+par2.set_ylabel("Velocity")
+
+p1, = host.plot([0, 1, 2], [0, 1, 2], label="Density")
+p2, = par1.plot([0, 1, 2], [0, 3, 2], label="Temperature")
+p3, = par2.plot([0, 1, 2], [50, 30, 15], label="Velocity")
+
+par1.set_ylim(0, 4)
+par2.set_ylim(1, 65)
+
+host.legend()
+
+host.axis["left"].label.set_color(p1.get_color())
+par1.axis["right"].label.set_color(p2.get_color())
+par2.axis["right"].label.set_color(p3.get_color())
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: demo_parasite_axes2.py](https://matplotlib.org/_downloads/demo_parasite_axes2.py)
+- [Download Jupyter notebook: demo_parasite_axes2.ipynb](https://matplotlib.org/_downloads/demo_parasite_axes2.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axisartist/demo_ticklabel_alignment.md b/Python/matplotlab/gallery/axisartist/demo_ticklabel_alignment.md
new file mode 100644
index 00000000..ca4ef911
--- /dev/null
+++ b/Python/matplotlab/gallery/axisartist/demo_ticklabel_alignment.md
@@ -0,0 +1,47 @@
+# Ticklabel Alignment Demo
+
+![Ticklabel Alignment Demo](https://matplotlib.org/_images/sphx_glr_demo_ticklabel_alignment_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import mpl_toolkits.axisartist as axisartist
+
+
+def setup_axes(fig, rect):
+    ax = axisartist.Subplot(fig, rect)
+    fig.add_subplot(ax)
+
+    ax.set_yticks([0.2, 0.8])
+    ax.set_yticklabels(["short", "loooong"])
+    ax.set_xticks([0.2, 0.8])
+    ax.set_xticklabels([r"$\frac{1}{2}\pi$", r"$\pi$"])
+
+    return ax
+
+
+fig = plt.figure(1, figsize=(3, 5))
+fig.subplots_adjust(left=0.5, hspace=0.7)
+
+ax = setup_axes(fig, 311)
+ax.set_ylabel("ha=right")
+ax.set_xlabel("va=baseline")
+
+ax = setup_axes(fig, 312)
+ax.axis["left"].major_ticklabels.set_ha("center")
+ax.axis["bottom"].major_ticklabels.set_va("top")
+ax.set_ylabel("ha=center")
+ax.set_xlabel("va=top")
+
+ax = setup_axes(fig, 313)
+ax.axis["left"].major_ticklabels.set_ha("left")
+ax.axis["bottom"].major_ticklabels.set_va("bottom")
+ax.set_ylabel("ha=left")
+ax.set_xlabel("va=bottom")
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: demo_ticklabel_alignment.py](https://matplotlib.org/_downloads/demo_ticklabel_alignment.py)
+- [Download Jupyter notebook: demo_ticklabel_alignment.ipynb](https://matplotlib.org/_downloads/demo_ticklabel_alignment.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axisartist/demo_ticklabel_direction.md b/Python/matplotlab/gallery/axisartist/demo_ticklabel_direction.md
new file mode 100644
index 00000000..3ad1077b
--- /dev/null
+++ b/Python/matplotlab/gallery/axisartist/demo_ticklabel_direction.md
@@ -0,0 +1,47 @@
+# Ticklabel Direction Demo
+
+![Ticklabel Direction Demo](https://matplotlib.org/_images/sphx_glr_demo_ticklabel_alignment_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import mpl_toolkits.axisartist as axisartist
+
+
+def setup_axes(fig, rect):
+    ax = axisartist.Subplot(fig, rect)
+    fig.add_subplot(ax)
+
+    ax.set_yticks([0.2, 0.8])
+    ax.set_yticklabels(["short", "loooong"])
+    ax.set_xticks([0.2, 0.8])
+    ax.set_xticklabels([r"$\frac{1}{2}\pi$", r"$\pi$"])
+
+    return ax
+
+
+fig = plt.figure(1, figsize=(3, 5))
+fig.subplots_adjust(left=0.5, hspace=0.7)
+
+ax = setup_axes(fig, 311)
+ax.set_ylabel("ha=right")
+ax.set_xlabel("va=baseline")
+
+ax = setup_axes(fig, 312)
+ax.axis["left"].major_ticklabels.set_ha("center")
+ax.axis["bottom"].major_ticklabels.set_va("top")
+ax.set_ylabel("ha=center")
+ax.set_xlabel("va=top")
+
+ax = setup_axes(fig, 313)
+ax.axis["left"].major_ticklabels.set_ha("left")
+ax.axis["bottom"].major_ticklabels.set_va("bottom")
+ax.set_ylabel("ha=left")
+ax.set_xlabel("va=bottom")
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: demo_ticklabel_alignment.py](https://matplotlib.org/_downloads/demo_ticklabel_alignment.py)
+- [Download Jupyter notebook: demo_ticklabel_alignment.ipynb](https://matplotlib.org/_downloads/demo_ticklabel_alignment.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axisartist/simple_axis_direction01.md b/Python/matplotlab/gallery/axisartist/simple_axis_direction01.md
new file mode 100644
index 00000000..950b14d4
--- /dev/null
+++ b/Python/matplotlab/gallery/axisartist/simple_axis_direction01.md
@@ -0,0 +1,26 @@
+# Simple Axis Direction 01
+
+![Simple Axis Direction 01](https://matplotlib.org/_images/sphx_glr_simple_axis_direction01_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import mpl_toolkits.axisartist as axisartist
+
+fig = plt.figure(figsize=(4, 2.5))
+ax1 = fig.add_subplot(axisartist.Subplot(fig, "111"))
+fig.subplots_adjust(right=0.8)
+
+ax1.axis["left"].major_ticklabels.set_axis_direction("top")
+ax1.axis["left"].label.set_text("Label")
+
+ax1.axis["right"].label.set_visible(True)
+ax1.axis["right"].label.set_text("Label")
+ax1.axis["right"].label.set_axis_direction("left")
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: simple_axis_direction01.py](https://matplotlib.org/_downloads/simple_axis_direction01.py)
+- [Download Jupyter notebook: simple_axis_direction01.ipynb](https://matplotlib.org/_downloads/simple_axis_direction01.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axisartist/simple_axis_direction03.md b/Python/matplotlab/gallery/axisartist/simple_axis_direction03.md
new file mode 100644
index 00000000..fa960dfb
--- /dev/null
+++ b/Python/matplotlab/gallery/axisartist/simple_axis_direction03.md
@@ -0,0 +1,41 @@
+# Simple Axis Direction 03
+
+![Simple Axis Direction 03](https://matplotlib.org/_images/sphx_glr_simple_axis_direction03_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import mpl_toolkits.axisartist as axisartist
+
+
+def setup_axes(fig, rect):
+    ax = axisartist.Subplot(fig, rect)
+    fig.add_subplot(ax)
+
+    ax.set_yticks([0.2, 0.8])
+    ax.set_xticks([0.2, 0.8])
+
+    return ax
+
+
+fig = plt.figure(1, figsize=(5, 2))
+fig.subplots_adjust(wspace=0.4, bottom=0.3)
+
+ax1 = setup_axes(fig, "121")
+ax1.set_xlabel("X-label")
+ax1.set_ylabel("Y-label")
+
+ax1.axis[:].invert_ticklabel_direction()
+
+ax2 = setup_axes(fig, "122")
+ax2.set_xlabel("X-label")
+ax2.set_ylabel("Y-label")
+
+ax2.axis[:].major_ticks.set_tick_out(True)
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: simple_axis_direction03.py](https://matplotlib.org/_downloads/simple_axis_direction03.py)
+- [Download Jupyter notebook: simple_axis_direction03.ipynb](https://matplotlib.org/_downloads/simple_axis_direction03.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axisartist/simple_axis_pad.md b/Python/matplotlab/gallery/axisartist/simple_axis_pad.md
new file mode 100644
index 00000000..0774fd6e
--- /dev/null
+++ b/Python/matplotlab/gallery/axisartist/simple_axis_pad.md
@@ -0,0 +1,114 @@
+# Simple Axis Pad
+
+![Simple Axis Pad](https://matplotlib.org/_images/sphx_glr_simple_axis_pad_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+import mpl_toolkits.axisartist.angle_helper as angle_helper
+import mpl_toolkits.axisartist.grid_finder as grid_finder
+from matplotlib.projections import PolarAxes
+from matplotlib.transforms import Affine2D
+
+import mpl_toolkits.axisartist as axisartist
+
+from mpl_toolkits.axisartist.grid_helper_curvelinear import \
+    GridHelperCurveLinear
+
+
+def setup_axes(fig, rect):
+    """
+    polar projection, but in a rectangular box.
+    """
+
+    # see demo_curvelinear_grid.py for details
+    tr = Affine2D().scale(np.pi/180., 1.) + PolarAxes.PolarTransform()
+
+    extreme_finder = angle_helper.ExtremeFinderCycle(20, 20,
+                                                     lon_cycle=360,
+                                                     lat_cycle=None,
+                                                     lon_minmax=None,
+                                                     lat_minmax=(0, np.inf),
+                                                     )
+
+    grid_locator1 = angle_helper.LocatorDMS(12)
+    grid_locator2 = grid_finder.MaxNLocator(5)
+
+    tick_formatter1 = angle_helper.FormatterDMS()
+
+    grid_helper = GridHelperCurveLinear(tr,
+                                        extreme_finder=extreme_finder,
+                                        grid_locator1=grid_locator1,
+                                        grid_locator2=grid_locator2,
+                                        tick_formatter1=tick_formatter1
+                                        )
+
+    ax1 = axisartist.Subplot(fig, rect, grid_helper=grid_helper)
+    ax1.axis[:].set_visible(False)
+
+    fig.add_subplot(ax1)
+
+    ax1.set_aspect(1.)
+    ax1.set_xlim(-5, 12)
+    ax1.set_ylim(-5, 10)
+
+    return ax1
+
+
+def add_floating_axis1(ax1):
+    ax1.axis["lat"] = axis = ax1.new_floating_axis(0, 30)
+    axis.label.set_text(r"$\theta = 30^{\circ}$")
+    axis.label.set_visible(True)
+
+    return axis
+
+
+def add_floating_axis2(ax1):
+    ax1.axis["lon"] = axis = ax1.new_floating_axis(1, 6)
+    axis.label.set_text(r"$r = 6$")
+    axis.label.set_visible(True)
+
+    return axis
+
+
+fig = plt.figure(1, figsize=(9, 3.))
+fig.clf()
+fig.subplots_adjust(left=0.01, right=0.99, bottom=0.01, top=0.99,
+                    wspace=0.01, hspace=0.01)
+
+
+def ann(ax1, d):
+    if plt.rcParams["text.usetex"]:
+        d = d.replace("_", r"\_")
+
+    ax1.annotate(d, (0.5, 1), (5, -5),
+                 xycoords="axes fraction", textcoords="offset points",
+                 va="top", ha="center")
+
+
+ax1 = setup_axes(fig, rect=141)
+axis = add_floating_axis1(ax1)
+ann(ax1, r"default")
+
+ax1 = setup_axes(fig, rect=142)
+axis = add_floating_axis1(ax1)
+axis.major_ticklabels.set_pad(10)
+ann(ax1, r"ticklabels.set_pad(10)")
+
+ax1 = setup_axes(fig, rect=143)
+axis = add_floating_axis1(ax1)
+axis.label.set_pad(20)
+ann(ax1, r"label.set_pad(20)")
+
+ax1 = setup_axes(fig, rect=144)
+axis = add_floating_axis1(ax1)
+axis.major_ticks.set_tick_out(True)
+ann(ax1, "ticks.set_tick_out(True)")
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: simple_axis_pad.py](https://matplotlib.org/_downloads/simple_axis_pad.py)
+- [Download Jupyter notebook: simple_axis_pad.ipynb](https://matplotlib.org/_downloads/simple_axis_pad.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axisartist/simple_axisartist1.md b/Python/matplotlab/gallery/axisartist/simple_axisartist1.md
new file mode 100644
index 00000000..cabad3b5
--- /dev/null
+++ b/Python/matplotlab/gallery/axisartist/simple_axisartist1.md
@@ -0,0 +1,32 @@
+# Simple Axisartist 1
+
+![Simple Axisartist 1](https://matplotlib.org/_images/sphx_glr_simple_axisartist1_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import mpl_toolkits.axisartist as AA
+
+fig = plt.figure(1)
+fig.subplots_adjust(right=0.85)
+ax = AA.Subplot(fig, 1, 1, 1)
+fig.add_subplot(ax)
+
+# make some axes invisible
+ax.axis["bottom", "top", "right"].set_visible(False)
+
+# make a new axis along the first axis (x-axis) which passes
+# through y=0.
+ax.axis["y=0"] = ax.new_floating_axis(nth_coord=0, value=0,
+                                      axis_direction="bottom")
+ax.axis["y=0"].toggle(all=True)
+ax.axis["y=0"].label.set_text("y = 0")
+
+ax.set_ylim(-2, 4)
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: simple_axisartist1.py](https://matplotlib.org/_downloads/simple_axisartist1.py)
+- [Download Jupyter notebook: simple_axisartist1.ipynb](https://matplotlib.org/_downloads/simple_axisartist1.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axisartist/simple_axisline.md b/Python/matplotlab/gallery/axisartist/simple_axisline.md
new file mode 100644
index 00000000..6eb47626
--- /dev/null
+++ b/Python/matplotlab/gallery/axisartist/simple_axisline.md
@@ -0,0 +1,45 @@
+# Simple Axisline
+
+![Simple Axisline](https://matplotlib.org/_images/sphx_glr_simple_axisline_001.png)
+
+```python
+import matplotlib.pyplot as plt
+
+from mpl_toolkits.axisartist.axislines import SubplotZero
+
+
+fig = plt.figure(1)
+fig.subplots_adjust(right=0.85)
+ax = SubplotZero(fig, 1, 1, 1)
+fig.add_subplot(ax)
+
+# make right and top axis invisible
+ax.axis["right"].set_visible(False)
+ax.axis["top"].set_visible(False)
+
+# make xzero axis (horizontal axis line through y=0) visible.
+ax.axis["xzero"].set_visible(True)
+ax.axis["xzero"].label.set_text("Axis Zero")
+
+ax.set_ylim(-2, 4)
+ax.set_xlabel("Label X")
+ax.set_ylabel("Label Y")
+# or
+#ax.axis["bottom"].label.set_text("Label X")
+#ax.axis["left"].label.set_text("Label Y")
+
+# make new (right-side) yaxis, but with some offset
+offset = (20, 0)
+new_axisline = ax.get_grid_helper().new_fixed_axis
+
+ax.axis["right2"] = new_axisline(loc="right", offset=offset, axes=ax)
+ax.axis["right2"].label.set_text("Label Y2")
+
+ax.plot([-2, 3, 2])
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: simple_axisline.py](https://matplotlib.org/_downloads/simple_axisline.py)
+- [Download Jupyter notebook: simple_axisline.ipynb](https://matplotlib.org/_downloads/simple_axisline.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axisartist/simple_axisline2.md b/Python/matplotlab/gallery/axisartist/simple_axisline2.md
new file mode 100644
index 00000000..2955fdb9
--- /dev/null
+++ b/Python/matplotlab/gallery/axisartist/simple_axisline2.md
@@ -0,0 +1,34 @@
+# Simple Axisline 2
+
+![Simple Axisline 2](https://matplotlib.org/_images/sphx_glr_simple_axisline2_001.png)
+
+```python
+import matplotlib.pyplot as plt
+from mpl_toolkits.axisartist.axislines import SubplotZero
+import numpy as np
+
+fig = plt.figure(1, (4, 3))
+
+# a subplot with two additional axes, "xzero" and "yzero". "xzero" is
+# the y=0 line, and "yzero" is the x=0 line.
+ax = SubplotZero(fig, 1, 1, 1)
+fig.add_subplot(ax)
+
+# make xzero axis (horizontal axis line through y=0) visible.
+ax.axis["xzero"].set_visible(True)
+ax.axis["xzero"].label.set_text("Axis Zero")
+
+# make other axes (bottom, top, right) invisible.
+for n in ["bottom", "top", "right"]:
+    ax.axis[n].set_visible(False)
+
+xx = np.arange(0, 2*np.pi, 0.01)
+ax.plot(xx, np.sin(xx))
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: simple_axisline2.py](https://matplotlib.org/_downloads/simple_axisline2.py)
+- [Download Jupyter notebook: simple_axisline2.ipynb](https://matplotlib.org/_downloads/simple_axisline2.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/axisartist/simple_axisline3.md b/Python/matplotlab/gallery/axisartist/simple_axisline3.md
new file mode 100644
index 00000000..32461803
--- /dev/null
+++ b/Python/matplotlab/gallery/axisartist/simple_axisline3.md
@@ -0,0 +1,23 @@
+# Simple Axisline 3
+
+![Simple Axisline 3](https://matplotlib.org/_images/sphx_glr_simple_axisline3_001.png)
+
+```python
+import matplotlib.pyplot as plt
+from mpl_toolkits.axisartist.axislines import Subplot
+
+fig = plt.figure(1, (3, 3))
+
+ax = Subplot(fig, 111)
+fig.add_subplot(ax)
+
+ax.axis["right"].set_visible(False)
+ax.axis["top"].set_visible(False)
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: simple_axisline3.py](https://matplotlib.org/_downloads/simple_axisline3.py)
+- [Download Jupyter notebook: simple_axisline3.ipynb](https://matplotlib.org/_downloads/simple_axisline3.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/color/color_by_yvalue.md b/Python/matplotlab/gallery/color/color_by_yvalue.md
new file mode 100644
index 00000000..8df23a9a
--- /dev/null
+++ b/Python/matplotlab/gallery/color/color_by_yvalue.md
@@ -0,0 +1,40 @@
+# Color by y-value
+
+Use masked arrays to plot a line with different colors by y-value.
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+t = np.arange(0.0, 2.0, 0.01)
+s = np.sin(2 * np.pi * t)
+
+upper = 0.77
+lower = -0.77
+
+
+supper = np.ma.masked_where(s < upper, s)
+slower = np.ma.masked_where(s > lower, s)
+smiddle = np.ma.masked_where(np.logical_or(s < lower, s > upper), s)
+
+fig, ax = plt.subplots()
+ax.plot(t, smiddle, t, slower, t, supper)
+plt.show()
+```
+
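The masking above can be checked independently of any plotting. The following sketch (not part of the original example) shows what `np.ma.masked_where` keeps and hides on a small hand-picked array:

```python
import numpy as np

s = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
upper, lower = 0.77, -0.77

# masked_where hides the entries where the condition is True, so
# `supper` keeps only values >= upper, while `smiddle` keeps the
# band between lower and upper (inclusive).
supper = np.ma.masked_where(s < upper, s)
smiddle = np.ma.masked_where(np.logical_or(s < lower, s > upper), s)

# compressed() returns only the unmasked values
print(supper.compressed())   # [1.]
print(smiddle.compressed())  # [-0.5  0.   0.5]
```

Masked entries are simply skipped by `ax.plot`, which is why each of the three lines in the figure covers only its own band of y-values.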
+![Color by y-value example](https://matplotlib.org/_images/sphx_glr_color_by_yvalue_001.png)
+
+## References
+
+The use of the following functions, methods, classes and modules is shown in this example:
+
+```python
+import matplotlib
+matplotlib.axes.Axes.plot
+matplotlib.pyplot.plot
+```
+
+## Download this example
+
+- [Download Python source code: color_by_yvalue.py](https://matplotlib.org/_downloads/color_by_yvalue.py)
+- [Download Jupyter notebook: color_by_yvalue.ipynb](https://matplotlib.org/_downloads/color_by_yvalue.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/color/color_cycle_default.md b/Python/matplotlab/gallery/color/color_cycle_default.md
new file mode 100644
index 00000000..a2aaadd9
--- /dev/null
+++ b/Python/matplotlab/gallery/color/color_cycle_default.md
@@ -0,0 +1,60 @@
+# Colors in the default property cycle
+
+Display the colors from the default prop_cycle, which is obtained from the [rc parameters](https://matplotlib.org/tutorials/introductory/customizing.html).
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+
+prop_cycle = plt.rcParams['axes.prop_cycle']
+colors = prop_cycle.by_key()['color']
+
+lwbase = plt.rcParams['lines.linewidth']
+thin = lwbase / 2
+thick = lwbase * 3
+
+fig, axs = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True)
+for icol in range(2):
+    if icol == 0:
+        lwx, lwy = thin, lwbase
+    else:
+        lwx, lwy = lwbase, thick
+    for irow in range(2):
+        for i, color in enumerate(colors):
+            axs[irow, icol].axhline(i, color=color, lw=lwx)
+            axs[irow, icol].axvline(i, color=color, lw=lwy)
+
+    axs[1, icol].set_facecolor('k')
+    axs[1, icol].xaxis.set_ticks(np.arange(0, 10, 2))
+    axs[0, icol].set_title('line widths (pts): %g, %g' % (lwx, lwy),
+                           fontsize='medium')
+
+for irow in range(2):
+    axs[irow, 0].yaxis.set_ticks(np.arange(0, 10, 2))
+
+fig.suptitle('Colors in the default prop_cycle', fontsize='large')
+
+plt.show()
+```
+
+![Colors in the default property cycle example](https://matplotlib.org/_images/sphx_glr_color_cycle_default_001.png)
+
+## References
+
+The use of the following functions, methods, classes and modules is shown in this example:
+
+```python
+import matplotlib
+matplotlib.axes.Axes.axhline
+matplotlib.axes.Axes.axvline
+matplotlib.pyplot.axhline
+matplotlib.pyplot.axvline
+matplotlib.axes.Axes.set_facecolor
+matplotlib.figure.Figure.suptitle
+```
+
+## Download this example
+
+- [Download Python source code: color_cycle_default.py](https://matplotlib.org/_downloads/color_cycle_default.py)
+- [Download Jupyter notebook: color_cycle_default.ipynb](https://matplotlib.org/_downloads/color_cycle_default.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/color/color_cycler.md b/Python/matplotlab/gallery/color/color_cycler.md
new file mode 100644
index 00000000..c4ee5c02
--- /dev/null
+++ b/Python/matplotlab/gallery/color/color_cycler.md
@@ -0,0 +1,54 @@
+# Styling with cycler
+
+Demo of custom property-cycle settings to control colors and other style properties for multi-line plots.
+
+This example demonstrates two different APIs:
+
+```python
+from cycler import cycler
+import numpy as np
+import matplotlib.pyplot as plt
+
+
+x = np.linspace(0, 2 * np.pi)
+offsets = np.linspace(0, 2*np.pi, 4, endpoint=False)
+# Create array with shifted-sine curve along each column
+yy = np.transpose([np.sin(x + phi) for phi in offsets])
+
+# 1. Setting prop cycle on default rc parameter
+plt.rc('lines', linewidth=4)
+plt.rc('axes', prop_cycle=(cycler(color=['r', 'g', 'b', 'y']) +
+                           cycler(linestyle=['-', '--', ':', '-.'])))
+fig, (ax0, ax1) = plt.subplots(nrows=2, constrained_layout=True)
+ax0.plot(yy)
+ax0.set_title('Set default color cycle to rgby')
+
+# 2. Define prop cycle for single set of axes
+#    For the most general use-case, you can provide a cycler to
+#    `.set_prop_cycle`.
+#    Here, we use the convenient shortcut that we can alternatively pass
+#    one or more properties as keyword arguments. This creates and sets
+#    a cycler iterating simultaneously over all properties.
+ax1.set_prop_cycle(color=['c', 'm', 'y', 'k'], lw=[1, 2, 3, 4])
+ax1.plot(yy)
+ax1.set_title('Set axes color cycle to cmyk')
+
+plt.show()
+```
+
+![Styling with cycler example](https://matplotlib.org/_images/sphx_glr_color_cycler_001.png)
+
+## References
+
+The use of the following functions, methods, classes and modules is shown in this example:
+
+```python
+import matplotlib
+matplotlib.axes.Axes.plot
+matplotlib.axes.Axes.set_prop_cycle
+```
+
+## Download this example
+
+- [Download Python source code: color_cycler.py](https://matplotlib.org/_downloads/color_cycler.py)
+- [Download Jupyter notebook: color_cycler.ipynb](https://matplotlib.org/_downloads/color_cycler.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/color/color_demo.md b/Python/matplotlab/gallery/color/color_demo.md
new file mode 100644
index 00000000..6a492bba
--- /dev/null
+++ b/Python/matplotlab/gallery/color/color_demo.md
@@ -0,0 +1,68 @@
+# Basic color demo
+
+Matplotlib gives you 8 ways to specify colors:
+
+1. an RGB or RGBA tuple of float values in [0, 1] (e.g., (0.1, 0.2, 0.5) or (0.1, 0.2, 0.5, 0.3)). RGBA is short for Red, Green, Blue, Alpha;
+1. a hex RGB or RGBA string (e.g., ``'#0F0F0F'`` or ``'#0F0F0F0F'``);
+1. a string representation of a float value in [0, 1] for gray levels (e.g., '0.5');
+1. a single-letter string, one of ``{'b', 'g', 'r', 'c', 'm', 'y', 'k', 'w'}``;
+1. an X11/CSS4 ("html") color name, e.g., "blue";
+1. a name from the [xkcd color survey](https://xkcd.com/color/rgb/), prefixed with 'xkcd:' (e.g., "xkcd:sky blue");
+1. a "Cn" color spec, i.e. 'C' followed by a number, which is an index into the default property cycle (``matplotlib.rcParams['axes.prop_cycle']``); the indexing occurs at artist creation time and defaults to black if the cycle does not include color;
+1. one of ``{'tab:blue', 'tab:orange', 'tab:green', 'tab:red', 'tab:purple', 'tab:brown', 'tab:pink', 'tab:gray', 'tab:olive', 'tab:cyan'}``, which are the Tableau Colors from the 'tab10' categorical palette (the default color cycle).
+
+For more information on colors in matplotlib see:
+
+- the [Specifying Colors](https://matplotlib.org/tutorials/colors/colors.html) tutorial;
+- the [matplotlib.colors](https://matplotlib.org/api/colors_api.html#module-matplotlib.colors) API;
+- the [Visualizing named colors](https://matplotlib.org/gallery/color/named_colors.html) example.
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+
+t = np.linspace(0.0, 2.0, 201)
+s = np.sin(2 * np.pi * t)
+
+# 1) RGB tuple:
+fig, ax = plt.subplots(facecolor=(.18, .31, .31))
+# 2) hex string:
+ax.set_facecolor('#eafff5')
+# 3) gray level string:
+ax.set_title('Voltage vs. time chart', color='0.7')
+# 4) single letter color string
+ax.set_xlabel('time (s)', color='c')
+# 5) a named color:
+ax.set_ylabel('voltage (mV)', color='peachpuff')
+# 6) a named xkcd color:
+ax.plot(t, s, 'xkcd:crimson')
+# 7) Cn notation:
+ax.plot(t, .7*s, color='C4', linestyle='--')
+# 8) tab notation:
+ax.tick_params(labelcolor='tab:orange')
+
+
+plt.show()
+```
+
+![Basic color demo](https://matplotlib.org/_images/sphx_glr_color_demo_001.png)
+
+## References
+
+The use of the following functions, methods, classes and modules is shown in this example:
+
+```python
+import matplotlib
+matplotlib.colors
+matplotlib.axes.Axes.plot
+matplotlib.axes.Axes.set_facecolor
+matplotlib.axes.Axes.set_title
+matplotlib.axes.Axes.set_xlabel
+matplotlib.axes.Axes.set_ylabel
+matplotlib.axes.Axes.tick_params
+```
+
+## Download this example
+
+- [Download Python source code: color_demo.py](https://matplotlib.org/_downloads/color_demo.py)
+- [Download Jupyter notebook: color_demo.ipynb](https://matplotlib.org/_downloads/color_demo.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/color/colorbar_basics.md b/Python/matplotlab/gallery/color/colorbar_basics.md
new file mode 100644
index 00000000..6a9db787
--- /dev/null
+++ b/Python/matplotlab/gallery/color/colorbar_basics.md
@@ -0,0 +1,63 @@
+# Colorbar
+
+Use [colorbar](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.colorbar) by specifying the mappable object (here the [AxesImage](https://matplotlib.org/api/image_api.html#matplotlib.image.AxesImage) returned by [imshow](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.imshow.html#matplotlib.axes.Axes.imshow)) and the axes to attach the colorbar to.
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+# setup some generic data
+N = 37
+x, y = np.mgrid[:N, :N]
+Z = (np.cos(x*0.2) + np.sin(y*0.3))
+
+# mask out the negative and positive values, respectively
+Zpos = np.ma.masked_less(Z, 0)
+Zneg = np.ma.masked_greater(Z, 0)
+
+fig, (ax1, ax2, ax3) = plt.subplots(figsize=(13, 3), ncols=3)
+
+# plot just the positive data and save the
+# color "mappable" object returned by ax1.imshow
+pos = ax1.imshow(Zpos, cmap='Blues', interpolation='none')
+
+# add the colorbar using the figure's method,
+# telling which mappable we're talking about and
+# which axes object it should be near
+fig.colorbar(pos, ax=ax1)
+
+# repeat everything above for the negative data
+neg = ax2.imshow(Zneg, cmap='Reds_r', interpolation='none')
+fig.colorbar(neg, ax=ax2)
+
+# Plot both positive and negative values between +/- 1.2
+pos_neg_clipped = ax3.imshow(Z, cmap='RdBu', vmin=-1.2, vmax=1.2,
+                             interpolation='none')
+# Add minorticks on the colorbar to make it easy to read the
+# values off the colorbar.
+cbar = fig.colorbar(pos_neg_clipped, ax=ax3, extend='both') +cbar.minorticks_on() +plt.show() +``` + +![颜色条示例](https://matplotlib.org/_images/sphx_glr_colorbar_basics_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +import matplotlib.colorbar +matplotlib.axes.Axes.imshow +matplotlib.pyplot.imshow +matplotlib.figure.Figure.colorbar +matplotlib.pyplot.colorbar +matplotlib.colorbar.Colorbar.minorticks_on +matplotlib.colorbar.Colorbar.minorticks_off +``` + +## 下载这个示例 + +- [下载python源码: colorbar_basics.py](https://matplotlib.org/_downloads/colorbar_basics.py) +- [下载Jupyter notebook: colorbar_basics.ipynb](https://matplotlib.org/_downloads/colorbar_basics.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/color/colormap_reference.md b/Python/matplotlab/gallery/color/colormap_reference.md new file mode 100644 index 00000000..f9f9447e --- /dev/null +++ b/Python/matplotlab/gallery/color/colormap_reference.md @@ -0,0 +1,96 @@ +# Colormap参考 + +Matplotlib附带的色彩映射参考。 + +通过将 ``_r`` 附加到名称(例如,``viridis_r``),可以获得每个这些颜色映射的反转版本。 + +请参阅在[Matplotlib中选择Colormaps](https://matplotlib.org/tutorials/colors/colormaps.html)以深入讨论色彩映射,包括colorblind-friendlyliness。 + +```python +import numpy as np +import matplotlib.pyplot as plt + + +cmaps = [('Perceptually Uniform Sequential', [ + 'viridis', 'plasma', 'inferno', 'magma', 'cividis']), + ('Sequential', [ + 'Greys', 'Purples', 'Blues', 'Greens', 'Oranges', 'Reds', + 'YlOrBr', 'YlOrRd', 'OrRd', 'PuRd', 'RdPu', 'BuPu', + 'GnBu', 'PuBu', 'YlGnBu', 'PuBuGn', 'BuGn', 'YlGn']), + ('Sequential (2)', [ + 'binary', 'gist_yarg', 'gist_gray', 'gray', 'bone', 'pink', + 'spring', 'summer', 'autumn', 'winter', 'cool', 'Wistia', + 'hot', 'afmhot', 'gist_heat', 'copper']), + ('Diverging', [ + 'PiYG', 'PRGn', 'BrBG', 'PuOr', 'RdGy', 'RdBu', + 'RdYlBu', 'RdYlGn', 'Spectral', 'coolwarm', 'bwr', 'seismic']), + ('Cyclic', ['twilight', 'twilight_shifted', 'hsv']), + ('Qualitative', [ + 'Pastel1', 'Pastel2', 'Paired', 
'Accent', + 'Dark2', 'Set1', 'Set2', 'Set3', + 'tab10', 'tab20', 'tab20b', 'tab20c']), + ('Miscellaneous', [ + 'flag', 'prism', 'ocean', 'gist_earth', 'terrain', 'gist_stern', + 'gnuplot', 'gnuplot2', 'CMRmap', 'cubehelix', 'brg', + 'gist_rainbow', 'rainbow', 'jet', 'nipy_spectral', 'gist_ncar'])] + + +gradient = np.linspace(0, 1, 256) +gradient = np.vstack((gradient, gradient)) + + +def plot_color_gradients(cmap_category, cmap_list): + # Create figure and adjust figure height to number of colormaps + nrows = len(cmap_list) + figh = 0.35 + 0.15 + (nrows + (nrows-1)*0.1)*0.22 + fig, axes = plt.subplots(nrows=nrows, figsize=(6.4, figh)) + fig.subplots_adjust(top=1-.35/figh, bottom=.15/figh, left=0.2, right=0.99) + + axes[0].set_title(cmap_category + ' colormaps', fontsize=14) + + for ax, name in zip(axes, cmap_list): + ax.imshow(gradient, aspect='auto', cmap=plt.get_cmap(name)) + ax.text(-.01, .5, name, va='center', ha='right', fontsize=10, + transform=ax.transAxes) + + # Turn off *all* ticks & spines, not just the ones with colormaps. 
+ for ax in axes: + ax.set_axis_off() + + +for cmap_category, cmap_list in cmaps: + plot_color_gradients(cmap_category, cmap_list) + +plt.show() +``` + +![Colormap参考示例](https://matplotlib.org/_images/sphx_glr_colormap_reference_001.png) + +![Colormap参考示例2](https://matplotlib.org/_images/sphx_glr_colormap_reference_002.png) + +![Colormap参考示例3](https://matplotlib.org/_images/sphx_glr_colormap_reference_003.png) + +![Colormap参考示例4](https://matplotlib.org/_images/sphx_glr_colormap_reference_004.png) + +![Colormap参考示例5](https://matplotlib.org/_images/sphx_glr_colormap_reference_005.png) + +![Colormap参考示例6](https://matplotlib.org/_images/sphx_glr_colormap_reference_006.png) + +![Colormap参考示例7](https://matplotlib.org/_images/sphx_glr_colormap_reference_007.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.colors +matplotlib.axes.Axes.imshow +matplotlib.figure.Figure.text +matplotlib.axes.Axes.set_axis_off +``` + +## 下载这个示例 + +- [下载python源码: colormap_reference.py](https://matplotlib.org/_downloads/colormap_reference.py) +- [下载Jupyter notebook: colormap_reference.ipynb](https://matplotlib.org/_downloads/colormap_reference.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/color/custom_cmap.md b/Python/matplotlab/gallery/color/custom_cmap.md new file mode 100644 index 00000000..6fd39908 --- /dev/null +++ b/Python/matplotlab/gallery/color/custom_cmap.md @@ -0,0 +1,229 @@ +# 从颜色列表创建颜色映射 + +有关创建和操作色彩映射的更多详细信息,请参阅[在Matplotlib中创建色彩映射](https://matplotlib.org/tutorials/colors/colormap-manipulation.html)。 + +可以使用LinearSegmentedColormap的[from_list()](https://matplotlib.org/api/_as_gen/matplotlib.colors.LinearSegmentedColormap.html#matplotlib.colors.LinearSegmentedColormap.from_list)方法从颜色列表创建[颜色映射](https://matplotlib.org/tutorials/colors/colormaps.html)。您必须传递一个RGB元组列表,用于定义从0到1的颜色混合。 + +## 创建自定义色彩映射 + +也可以为色彩映射创建自定义映射。 这是通过创建字典来实现的,该字典指定RGB通道如何从cmap的一端变为另一端。 + +示例:假设您希望红色在下半部分从0增加到1,绿色在中间半部分增加到相同,而在上半部分则为蓝色。 然后你会用: + +```python 
cdict = {'red': ((0.0, 0.0, 0.0),
                 (0.5, 1.0, 1.0),
                 (1.0, 1.0, 1.0)),

         'green': ((0.0, 0.0, 0.0),
                   (0.25, 0.0, 0.0),
                   (0.75, 1.0, 1.0),
                   (1.0, 1.0, 1.0)),

         'blue': ((0.0, 0.0, 0.0),
                  (0.5, 0.0, 0.0),
                  (1.0, 1.0, 1.0))}
```

在这个例子中,如果 r、g、b 分量中都没有不连续点,那么规则很简单:上面每个元组的第二个和第三个元素相同,称之为 "y"。第一个元素("x")定义了 0 到 1 整个范围内的插值区间,且必须跨越整个范围。换句话说,x 的值把 0 到 1 的范围划分成若干段,y 给出每段端点处的颜色值。

现在考虑绿色:cdict['green'] 表示,当 0 <= x <= 0.25 时,y 为 0,没有绿色;当 0.25 < x <= 0.75 时,y 从 0 线性增加到 1;当 x > 0.75 时,y 保持为 1,完全是绿色。

如果存在不连续点,则会复杂一些。把给定颜色的 cdict 条目中每行的 3 个元素记作 (x, y0, y1)。那么,对于介于 x[i] 和 x[i+1] 之间的 x 值,颜色值在 y1[i] 和 y0[i+1] 之间插值。

回到指南里的例子,看看 cdict['red']:因为 y0 != y1,它表示当 x 从 0 到 0.5 时,红色从 0 增加到 1,随后向下跳变,因此当 x 从 0.5 到 1 时,红色从 0.7 增加到 1。绿色在 x 从 0 到 0.5 时从 0 增加到 1,然后跳回 0,再在 x 从 0.5 到 1 时增加回 1:

```python
row i:   x  y0  y1
               /
              /
row i+1: x  y0  y1
```

以上试图表明:对于 x[i] 到 x[i+1] 范围内的 x,插值发生在 y1[i] 和 y0[i+1] 之间。因此,y0[0] 和 y1[-1] 永远不会被使用。

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap

# Make some illustrative fake data:

x = np.arange(0, np.pi, 0.1)
y = np.arange(0, 2 * np.pi, 0.1)
X, Y = np.meshgrid(x, y)
Z = np.cos(X) * np.sin(Y) * 10
```

--- 列表中的色彩映射 ---

```python
colors = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]  # R -> G -> B
n_bins = [3, 6, 10, 100]  # Discretizes the interpolation into bins
cmap_name = 'my_list'
fig, axs = plt.subplots(2, 2, figsize=(6, 9))
fig.subplots_adjust(left=0.02, bottom=0.06, right=0.95, top=0.94, wspace=0.05)
for n_bin, ax in zip(n_bins, axs.ravel()):
    # Create the colormap
    cm = LinearSegmentedColormap.from_list(
        cmap_name, colors, N=n_bin)
    # Fewer bins will result in "coarser" colormap interpolation
    im = ax.imshow(Z, interpolation='nearest', origin='lower', cmap=cm)
    ax.set_title("N bins: %s" % n_bin)
    fig.colorbar(im, ax=ax)
```

![从颜色列表创建颜色映射示例](https://matplotlib.org/_images/sphx_glr_custom_cmap_001.png)

--- 自定义色彩映射 ---

```python
cdict1 = {'red': ((0.0, 0.0, 0.0),
                  (0.5, 0.0, 0.1),
                  (1.0, 1.0, 1.0)),
+ 'green': ((0.0, 0.0, 0.0), + (1.0, 0.0, 0.0)), + + 'blue': ((0.0, 0.0, 1.0), + (0.5, 0.1, 0.0), + (1.0, 0.0, 0.0)) + } + +cdict2 = {'red': ((0.0, 0.0, 0.0), + (0.5, 0.0, 1.0), + (1.0, 0.1, 1.0)), + + 'green': ((0.0, 0.0, 0.0), + (1.0, 0.0, 0.0)), + + 'blue': ((0.0, 0.0, 0.1), + (0.5, 1.0, 0.0), + (1.0, 0.0, 0.0)) + } + +cdict3 = {'red': ((0.0, 0.0, 0.0), + (0.25, 0.0, 0.0), + (0.5, 0.8, 1.0), + (0.75, 1.0, 1.0), + (1.0, 0.4, 1.0)), + + 'green': ((0.0, 0.0, 0.0), + (0.25, 0.0, 0.0), + (0.5, 0.9, 0.9), + (0.75, 0.0, 0.0), + (1.0, 0.0, 0.0)), + + 'blue': ((0.0, 0.0, 0.4), + (0.25, 1.0, 1.0), + (0.5, 1.0, 0.8), + (0.75, 0.0, 0.0), + (1.0, 0.0, 0.0)) + } + +# Make a modified version of cdict3 with some transparency +# in the middle of the range. +cdict4 = {**cdict3, + 'alpha': ((0.0, 1.0, 1.0), + # (0.25,1.0, 1.0), + (0.5, 0.3, 0.3), + # (0.75,1.0, 1.0), + (1.0, 1.0, 1.0)), + } +``` + +现在我们将使用此示例来说明处理自定义色彩映射的3种方法。首先,最直接和明确的: + +```python +blue_red1 = LinearSegmentedColormap('BlueRed1', cdict1) +``` + +其次,显式创建地图并注册它。与第一种方法一样,此方法适用于任何类型的Colormap,而不仅仅是LinearSegmentedColormap: + +```python +blue_red2 = LinearSegmentedColormap('BlueRed2', cdict2) +plt.register_cmap(cmap=blue_red2) +``` + +第三,仅对于LinearSegmentedColormap,将所有内容保留为register_cmap: + +```python +plt.register_cmap(name='BlueRed3', data=cdict3) # optional lut kwarg +plt.register_cmap(name='BlueRedAlpha', data=cdict4) +``` + +制作图: + +```python +fig, axs = plt.subplots(2, 2, figsize=(6, 9)) +fig.subplots_adjust(left=0.02, bottom=0.06, right=0.95, top=0.94, wspace=0.05) + +# Make 4 subplots: + +im1 = axs[0, 0].imshow(Z, interpolation='nearest', cmap=blue_red1) +fig.colorbar(im1, ax=axs[0, 0]) + +cmap = plt.get_cmap('BlueRed2') +im2 = axs[1, 0].imshow(Z, interpolation='nearest', cmap=cmap) +fig.colorbar(im2, ax=axs[1, 0]) + +# Now we will set the third cmap as the default. One would +# not normally do this in the middle of a script like this; +# it is done here just to illustrate the method. 
+ +plt.rcParams['image.cmap'] = 'BlueRed3' + +im3 = axs[0, 1].imshow(Z, interpolation='nearest') +fig.colorbar(im3, ax=axs[0, 1]) +axs[0, 1].set_title("Alpha = 1") + +# Or as yet another variation, we can replace the rcParams +# specification *before* the imshow with the following *after* +# imshow. +# This sets the new default *and* sets the colormap of the last +# image-like item plotted via pyplot, if any. +# + +# Draw a line with low zorder so it will be behind the image. +axs[1, 1].plot([0, 10 * np.pi], [0, 20 * np.pi], color='c', lw=20, zorder=-1) + +im4 = axs[1, 1].imshow(Z, interpolation='nearest') +fig.colorbar(im4, ax=axs[1, 1]) + +# Here it is: changing the colormap for the current image and its +# colorbar after they have been plotted. +im4.set_cmap('BlueRedAlpha') +axs[1, 1].set_title("Varying alpha") +# + +fig.suptitle('Custom Blue-Red colormaps', fontsize=16) +fig.subplots_adjust(top=0.9) + +plt.show() +``` + +![从颜色列表创建颜色映射示例2](https://matplotlib.org/_images/sphx_glr_custom_cmap_002.png) + +### 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.axes.Axes.imshow +matplotlib.pyplot.imshow +matplotlib.figure.Figure.colorbar +matplotlib.pyplot.colorbar +matplotlib.colors +matplotlib.colors.LinearSegmentedColormap +matplotlib.colors.LinearSegmentedColormap.from_list +matplotlib.cm +matplotlib.cm.ScalarMappable.set_cmap +matplotlib.pyplot.register_cmap +matplotlib.cm.register_cmap +``` + +## 下载这个示例 + +- [下载python源码: custom_cmap.py](https://matplotlib.org/_downloads/custom_cmap.py) +- [下载Jupyter notebook: custom_cmap.ipynb](https://matplotlib.org/_downloads/custom_cmap.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/color/index.md b/Python/matplotlab/gallery/color/index.md new file mode 100644 index 00000000..609e620f --- /dev/null +++ b/Python/matplotlab/gallery/color/index.md @@ -0,0 +1,3 @@ +# 颜色相关示例 + 
+有关matplotlib中可用的色彩映射的更深入信息及其属性的说明,请参阅[colormaps教程](https://matplotlib.org/tutorials/index.html#tutorials-colors)。 \ No newline at end of file diff --git a/Python/matplotlab/gallery/color/named_colors.md b/Python/matplotlab/gallery/color/named_colors.md new file mode 100644 index 00000000..2201ee6f --- /dev/null +++ b/Python/matplotlab/gallery/color/named_colors.md @@ -0,0 +1,106 @@ +# 可视化命名颜色 + +这绘制了matplotlib中支持的命名颜色列表。 请注意,也支持[xkcd颜色](https://matplotlib.org/tutorials/colors/colors.html#xkcd-colors),但为简洁起见,此处未列出。 + +有关matplotlib中颜色的更多信息,请参阅: + +- [指定颜色](https://matplotlib.org/tutorials/colors/colors.html)教程; +- [matplotlib.colors](https://matplotlib.org/api/colors_api.html#module-matplotlib.colors) API; +- [颜色演示](https://matplotlib.org/gallery/color/color_demo.html)。 + +```python +import matplotlib.pyplot as plt +import matplotlib.colors as mcolors + + +def plot_colortable(colors, title, sort_colors=True, emptycols=0): + + cell_width = 212 + cell_height = 22 + swatch_width = 48 + margin = 12 + topmargin = 40 + + # Sort colors by hue, saturation, value and name. + by_hsv = ((tuple(mcolors.rgb_to_hsv(mcolors.to_rgba(color)[:3])), name) + for name, color in colors.items()) + if sort_colors is True: + by_hsv = sorted(by_hsv) + names = [name for hsv, name in by_hsv] + + n = len(names) + ncols = 4 - emptycols + nrows = n // ncols + int(n % ncols > 0) + + width = cell_width * 4 + 2 * margin + height = cell_height * nrows + margin + topmargin + dpi = 72 + + fig, ax = plt.subplots(figsize=(width / dpi, height / dpi), dpi=dpi) + fig.subplots_adjust(margin/width, margin/height, + (width-margin)/width, (height-topmargin)/height) + ax.set_xlim(0, cell_width * 4) + ax.set_ylim(cell_height * (nrows-0.5), -cell_height/2.) 
+ ax.yaxis.set_visible(False) + ax.xaxis.set_visible(False) + ax.set_axis_off() + ax.set_title(title, fontsize=24, loc="left", pad=10) + + for i, name in enumerate(names): + row = i % nrows + col = i // nrows + y = row * cell_height + + swatch_start_x = cell_width * col + swatch_end_x = cell_width * col + swatch_width + text_pos_x = cell_width * col + swatch_width + 7 + + ax.text(text_pos_x, y, name, fontsize=14, + horizontalalignment='left', + verticalalignment='center') + + ax.hlines(y, swatch_start_x, swatch_end_x, + color=colors[name], linewidth=18) + + return fig + +plot_colortable(mcolors.BASE_COLORS, "Base Colors", + sort_colors=False, emptycols=1) +plot_colortable(mcolors.TABLEAU_COLORS, "Tableau Palette", + sort_colors=False, emptycols=2) + +#sphinx_gallery_thumbnail_number = 3 +plot_colortable(mcolors.CSS4_COLORS, "CSS Colors") + +# Optionally plot the XKCD colors (Caution: will produce large figure) +#xkcd_fig = plot_colortable(mcolors.XKCD_COLORS, "XKCD Colors") +#xkcd_fig.savefig("XKCD_Colors.png") + +plt.show() +``` + +![可视化命名颜色示例](https://matplotlib.org/_images/sphx_glr_named_colors_001.png) + +![可视化命名颜色示例2](https://matplotlib.org/_images/sphx_glr_named_colors_002.png) + +![可视化命名颜色示例3](https://matplotlib.org/_images/sphx_glr_named_colors_003.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.colors +matplotlib.colors.rgb_to_hsv +matplotlib.colors.to_rgba +matplotlib.figure.Figure.get_size_inches +matplotlib.figure.Figure.subplots_adjust +matplotlib.axes.Axes.text +matplotlib.axes.Axes.hlines +``` + +## 下载这个示例 + +- [下载python源码: named_colors.py](https://matplotlib.org/_downloads/named_colors.py) +- [下载Jupyter notebook: named_colors.ipynb](https://matplotlib.org/_downloads/named_colors.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/download.md b/Python/matplotlab/gallery/download.md new file mode 100644 index 00000000..2af13ed9 --- /dev/null +++ b/Python/matplotlab/gallery/download.md @@ -0,0 
+1,4 @@ +# 所有示例的下载 + +- [下载python源码: tutorials_python.py](https://matplotlib.org/_downloads/1e213ff4bcc6ccf3128b453a862c04d2/tutorials_python.zip) +- [下载Jupyter notebook: tutorials_jupyter.ipynb](https://matplotlib.org/_downloads/535f1c08124c14d72d66ebe258383fbe/tutorials_jupyter.zip) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/close_event.md b/Python/matplotlab/gallery/event_handling/close_event.md new file mode 100644 index 00000000..8d164e72 --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/close_event.md @@ -0,0 +1,24 @@ +# 关闭事件 + +显示图形关闭时发生的连接事件的示例。 + +![关闭事件示例](https://matplotlib.org/_images/sphx_glr_close_event_001.png) + +```python +import matplotlib.pyplot as plt + + +def handle_close(evt): + print('Closed Figure!') + +fig = plt.figure() +fig.canvas.mpl_connect('close_event', handle_close) + +plt.text(0.35, 0.5, 'Close Me!', dict(size=30)) +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: close_event.py](https://matplotlib.org/_downloads/close_event.py) +- [下载Jupyter notebook: close_event.ipynb](https://matplotlib.org/_downloads/close_event.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/coords_demo.md b/Python/matplotlab/gallery/event_handling/coords_demo.md new file mode 100644 index 00000000..cb07afaf --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/coords_demo.md @@ -0,0 +1,48 @@ +# Coords 演示 + +如何通过连接到移动和单击事件来与绘图画布交互的示例 + +![Coords 演示](https://matplotlib.org/_images/sphx_glr_coords_demo_001.png) + +```python +import sys +import matplotlib.pyplot as plt +import numpy as np + +t = np.arange(0.0, 1.0, 0.01) +s = np.sin(2 * np.pi * t) +fig, ax = plt.subplots() +ax.plot(t, s) + + +def on_move(event): + # get the x and y pixel coords + x, y = event.x, event.y + + if event.inaxes: + ax = event.inaxes # the axes instance + print('data coords %f %f' % (event.xdata, event.ydata)) + + +def on_click(event): + # get the x and y coords, flip y from top to bottom + x, 
y = event.x, event.y + if event.button == 1: + if event.inaxes is not None: + print('data coords %f %f' % (event.xdata, event.ydata)) + + +binding_id = plt.connect('motion_notify_event', on_move) +plt.connect('button_press_event', on_click) + +if "test_disconnect" in sys.argv: + print("disconnecting console coordinate printout...") + plt.disconnect(binding_id) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: coords_demo.py](https://matplotlib.org/_downloads/coords_demo.py) +- [下载Jupyter notebook: coords_demo.ipynb](https://matplotlib.org/_downloads/coords_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/data_browser.md b/Python/matplotlab/gallery/event_handling/data_browser.md new file mode 100644 index 00000000..dc787c75 --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/data_browser.md @@ -0,0 +1,105 @@ +# 数据浏览器 + +在多个画布之间连接数据。 + +此示例介绍了如何与多个画布交互数据。这样,您可以选择并突出显示一个轴上的点,并在另一个轴上生成该点的数据。 + +![数据浏览器示例](https://matplotlib.org/_images/sphx_glr_data_browser_001.png) + +```python +import numpy as np + + +class PointBrowser(object): + """ + Click on a point to select and highlight it -- the data that + generated the point will be shown in the lower axes. 
Use the 'n' + and 'p' keys to browse through the next and previous points + """ + + def __init__(self): + self.lastind = 0 + + self.text = ax.text(0.05, 0.95, 'selected: none', + transform=ax.transAxes, va='top') + self.selected, = ax.plot([xs[0]], [ys[0]], 'o', ms=12, alpha=0.4, + color='yellow', visible=False) + + def onpress(self, event): + if self.lastind is None: + return + if event.key not in ('n', 'p'): + return + if event.key == 'n': + inc = 1 + else: + inc = -1 + + self.lastind += inc + self.lastind = np.clip(self.lastind, 0, len(xs) - 1) + self.update() + + def onpick(self, event): + + if event.artist != line: + return True + + N = len(event.ind) + if not N: + return True + + # the click locations + x = event.mouseevent.xdata + y = event.mouseevent.ydata + + distances = np.hypot(x - xs[event.ind], y - ys[event.ind]) + indmin = distances.argmin() + dataind = event.ind[indmin] + + self.lastind = dataind + self.update() + + def update(self): + if self.lastind is None: + return + + dataind = self.lastind + + ax2.cla() + ax2.plot(X[dataind]) + + ax2.text(0.05, 0.9, 'mu=%1.3f\nsigma=%1.3f' % (xs[dataind], ys[dataind]), + transform=ax2.transAxes, va='top') + ax2.set_ylim(-0.5, 1.5) + self.selected.set_visible(True) + self.selected.set_data(xs[dataind], ys[dataind]) + + self.text.set_text('selected: %d' % dataind) + fig.canvas.draw() + + +if __name__ == '__main__': + import matplotlib.pyplot as plt + # Fixing random state for reproducibility + np.random.seed(19680801) + + X = np.random.rand(100, 200) + xs = np.mean(X, axis=1) + ys = np.std(X, axis=1) + + fig, (ax, ax2) = plt.subplots(2, 1) + ax.set_title('click on point to plot time series') + line, = ax.plot(xs, ys, 'o', picker=5) # 5 points tolerance + + browser = PointBrowser() + + fig.canvas.mpl_connect('pick_event', browser.onpick) + fig.canvas.mpl_connect('key_press_event', browser.onpress) + + plt.show() +``` + +## 下载这个示例 + +- [下载python源码: data_browser.py](https://matplotlib.org/_downloads/data_browser.py) 
+- [下载Jupyter notebook: data_browser.ipynb](https://matplotlib.org/_downloads/data_browser.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/figure_axes_enter_leave.md b/Python/matplotlab/gallery/event_handling/figure_axes_enter_leave.md new file mode 100644 index 00000000..6ae2b96b --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/figure_axes_enter_leave.md @@ -0,0 +1,62 @@ +# 图轴的进入和离开 + +通过更改进入和离开时的框架颜色来说明图形和轴进入和离开事件 + +```python +import matplotlib.pyplot as plt + + +def enter_axes(event): + print('enter_axes', event.inaxes) + event.inaxes.patch.set_facecolor('yellow') + event.canvas.draw() + + +def leave_axes(event): + print('leave_axes', event.inaxes) + event.inaxes.patch.set_facecolor('white') + event.canvas.draw() + + +def enter_figure(event): + print('enter_figure', event.canvas.figure) + event.canvas.figure.patch.set_facecolor('red') + event.canvas.draw() + + +def leave_figure(event): + print('leave_figure', event.canvas.figure) + event.canvas.figure.patch.set_facecolor('grey') + event.canvas.draw() +``` + +```python +fig1, (ax, ax2) = plt.subplots(2, 1) +fig1.suptitle('mouse hover over figure or axes to trigger events') + +fig1.canvas.mpl_connect('figure_enter_event', enter_figure) +fig1.canvas.mpl_connect('figure_leave_event', leave_figure) +fig1.canvas.mpl_connect('axes_enter_event', enter_axes) +fig1.canvas.mpl_connect('axes_leave_event', leave_axes) +``` + +![图轴的进入和离开示例](https://matplotlib.org/_images/sphx_glr_figure_axes_enter_leave_001.png) + +```python +fig2, (ax, ax2) = plt.subplots(2, 1) +fig2.suptitle('mouse hover over figure or axes to trigger events') + +fig2.canvas.mpl_connect('figure_enter_event', enter_figure) +fig2.canvas.mpl_connect('figure_leave_event', leave_figure) +fig2.canvas.mpl_connect('axes_enter_event', enter_axes) +fig2.canvas.mpl_connect('axes_leave_event', leave_axes) + +plt.show() +``` + +![图轴的进入和离开示例2](https://matplotlib.org/_images/sphx_glr_figure_axes_enter_leave_002.png) + 
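上面的例子都通过 `mpl_connect` 绑定事件处理函数。`mpl_connect` 会返回一个连接 id,之后可以把它传给 `mpl_disconnect` 来解除绑定。下面是一个最小示意(假设仍然使用上文的 enter_figure 处理函数;这段代码只演示连接与断开的写法,并非本示例的一部分):

```python
import matplotlib.pyplot as plt


def enter_figure(event):
    print('enter_figure', event.canvas.figure)


fig, ax = plt.subplots()

# mpl_connect 返回一个整数连接 id
cid = fig.canvas.mpl_connect('figure_enter_event', enter_figure)

# 不再需要时,用这个 id 解除事件绑定
fig.canvas.mpl_disconnect(cid)
```

解除绑定后,鼠标再进入图形区域就不会再触发 enter_figure 了。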
+## 下载这个示例 + +- [下载python源码: figure_axes_enter_leave.py](https://matplotlib.org/_downloads/figure_axes_enter_leave.py) +- [下载Jupyter notebook: figure_axes_enter_leave.ipynb](https://matplotlib.org/_downloads/figure_axes_enter_leave.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/ginput_demo_sgskip.md b/Python/matplotlab/gallery/event_handling/ginput_demo_sgskip.md new file mode 100644 index 00000000..5c568a35 --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/ginput_demo_sgskip.md @@ -0,0 +1,19 @@ +# Ginput 演示 + +这提供了交互功能的使用示例,例如ginput。 + +```python +import matplotlib.pyplot as plt +import numpy as np +t = np.arange(10) +plt.plot(t, np.sin(t)) +print("Please click") +x = plt.ginput(3) +print("clicked", x) +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: ginput_demo_sgskip.py](https://matplotlib.org/_downloads/ginput_demo_sgskip.py) +- [下载Jupyter notebook: ginput_demo_sgskip.ipynb](https://matplotlib.org/_downloads/ginput_demo_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/ginput_manual_clabel_sgskip.md b/Python/matplotlab/gallery/event_handling/ginput_manual_clabel_sgskip.md new file mode 100644 index 00000000..8e172242 --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/ginput_manual_clabel_sgskip.md @@ -0,0 +1,97 @@ +# 交互功能 + +这提供了交互功能的使用示例,例如ginput,waitforbuttonpress和手动clabel放置。 + +必须使用具有图形用户界面的后端以交互方式运行此脚本(例如,使用GTK3Agg后端,而不是PS后端)。 + +另见: ginput_demo.py + +```python +import time + +import numpy as np +import matplotlib.pyplot as plt + + +def tellme(s): + print(s) + plt.title(s, fontsize=16) + plt.draw() +``` + +单击三个点定义三角形 + +```python +plt.clf() +plt.axis([-1., 1., -1., 1.]) +plt.setp(plt.gca(), autoscale_on=False) + +tellme('You will define a triangle, click to begin') + +plt.waitforbuttonpress() + +while True: + pts = [] + while len(pts) < 3: + tellme('Select 3 corners with mouse') + pts = np.asarray(plt.ginput(3, timeout=-1)) + if len(pts) < 3: + 
tellme('Too few points, starting over') + time.sleep(1) # Wait a second + + ph = plt.fill(pts[:, 0], pts[:, 1], 'r', lw=2) + + tellme('Happy? Key click for yes, mouse click for no') + + if plt.waitforbuttonpress(): + break + + # Get rid of fill + for p in ph: + p.remove() +``` + +现在轮廓根据三角形角的距离 - 只是一个例子 + +```python +# Define a nice function of distance from individual pts +def f(x, y, pts): + z = np.zeros_like(x) + for p in pts: + z = z + 1/(np.sqrt((x - p[0])**2 + (y - p[1])**2)) + return 1/z + + +X, Y = np.meshgrid(np.linspace(-1, 1, 51), np.linspace(-1, 1, 51)) +Z = f(X, Y, pts) + +CS = plt.contour(X, Y, Z, 20) + +tellme('Use mouse to select contour label locations, middle button to finish') +CL = plt.clabel(CS, manual=True) +``` + +现在做一个缩放 + +```python +tellme('Now do a nested zoom, click to begin') +plt.waitforbuttonpress() + +while True: + tellme('Select two corners of zoom, middle mouse button to finish') + pts = np.asarray(plt.ginput(2, timeout=-1)) + + if len(pts) < 2: + break + + pts = np.sort(pts, axis=0) + plt.axis(pts.T.ravel()) + +tellme('All Done!') +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: ginput_manual_clabel_sgskip.py](https://matplotlib.org/_downloads/ginput_manual_clabel_sgskip.py) +- [下载Jupyter notebook: ginput_manual_clabel_sgskip.ipynb](https://matplotlib.org/_downloads/ginput_manual_clabel_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/image_slices_viewer.md b/Python/matplotlab/gallery/event_handling/image_slices_viewer.md new file mode 100644 index 00000000..528feaaa --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/image_slices_viewer.md @@ -0,0 +1,52 @@ +# 图像切片查看器 + +滚动三维阵列的二维图像切片。 + +![图像切片查看器](https://matplotlib.org/_images/sphx_glr_image_slices_viewer_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + + +class IndexTracker(object): + def __init__(self, ax, X): + self.ax = ax + ax.set_title('use scroll wheel to navigate images') + + self.X = X + 
rows, cols, self.slices = X.shape + self.ind = self.slices//2 + + self.im = ax.imshow(self.X[:, :, self.ind]) + self.update() + + def onscroll(self, event): + print("%s %s" % (event.button, event.step)) + if event.button == 'up': + self.ind = (self.ind + 1) % self.slices + else: + self.ind = (self.ind - 1) % self.slices + self.update() + + def update(self): + self.im.set_data(self.X[:, :, self.ind]) + ax.set_ylabel('slice %s' % self.ind) + self.im.axes.figure.canvas.draw() + + +fig, ax = plt.subplots(1, 1) + +X = np.random.rand(20, 20, 40) + +tracker = IndexTracker(ax, X) + + +fig.canvas.mpl_connect('scroll_event', tracker.onscroll) +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: image_slices_viewer.py](https://matplotlib.org/_downloads/image_slices_viewer.py) +- [下载Jupyter notebook: image_slices_viewer.ipynb](https://matplotlib.org/_downloads/image_slices_viewer.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/index.md b/Python/matplotlab/gallery/event_handling/index.md new file mode 100644 index 00000000..44fb1d81 --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/index.md @@ -0,0 +1,3 @@ +# 事件处理 + +Matplotlib支持使用GUI中立事件模型进行[事件处理](https://matplotlib.org/users/event_handling.html),因此您可以连接到Matplotlib事件,而无需了解Matplotlib最终将插入哪个用户界面。 这有两个好处:你编写的代码将更加可移植,Matplotlib事件就像数据坐标空间和事件发生在哪些轴之类的东西,所以你不必混淆低级转换细节来自画布空间到数据空间。还包括对象拾取示例。 \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/keypress_demo.md b/Python/matplotlab/gallery/event_handling/keypress_demo.md new file mode 100644 index 00000000..ccbe7f36 --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/keypress_demo.md @@ -0,0 +1,38 @@ +# 按键演示 + +显示如何连接到按键事件 + +![按键演示](https://matplotlib.org/_images/sphx_glr_keypress_demo_001.png) + +```python +import sys +import numpy as np +import matplotlib.pyplot as plt + + +def press(event): + print('press', event.key) + sys.stdout.flush() + if event.key == 'x': + visible = xl.get_visible() + 
xl.set_visible(not visible) + fig.canvas.draw() + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +fig, ax = plt.subplots() + +fig.canvas.mpl_connect('key_press_event', press) + +ax.plot(np.random.rand(12), np.random.rand(12), 'go') +xl = ax.set_xlabel('easy come, easy go') +ax.set_title('Press a key') +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: keypress_demo.py](https://matplotlib.org/_downloads/keypress_demo.py) +- [下载Jupyter notebook: keypress_demo.ipynb](https://matplotlib.org/_downloads/keypress_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/lasso_demo.md b/Python/matplotlab/gallery/event_handling/lasso_demo.md new file mode 100644 index 00000000..12dae610 --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/lasso_demo.md @@ -0,0 +1,92 @@ +# 套索演示 + +演示如何使用套索选择一组点并获取所选点的索引。回调用于更改所选点的颜色。 + +这是一个概念验证实现(尽管它可以按原样使用)。将对API进行一些改进。 + +![套索演示](https://matplotlib.org/_images/sphx_glr_lasso_demo_001.png) + +```python +from matplotlib import colors as mcolors, path +from matplotlib.collections import RegularPolyCollection +import matplotlib.pyplot as plt +from matplotlib.widgets import Lasso +import numpy as np + + +class Datum(object): + colorin = mcolors.to_rgba("red") + colorout = mcolors.to_rgba("blue") + + def __init__(self, x, y, include=False): + self.x = x + self.y = y + if include: + self.color = self.colorin + else: + self.color = self.colorout + + +class LassoManager(object): + def __init__(self, ax, data): + self.axes = ax + self.canvas = ax.figure.canvas + self.data = data + + self.Nxy = len(data) + + facecolors = [d.color for d in data] + self.xys = [(d.x, d.y) for d in data] + self.collection = RegularPolyCollection( + 6, sizes=(100,), + facecolors=facecolors, + offsets=self.xys, + transOffset=ax.transData) + + ax.add_collection(self.collection) + + self.cid = self.canvas.mpl_connect('button_press_event', self.onpress) + + def callback(self, verts): + facecolors = 
self.collection.get_facecolors() + p = path.Path(verts) + ind = p.contains_points(self.xys) + for i in range(len(self.xys)): + if ind[i]: + facecolors[i] = Datum.colorin + else: + facecolors[i] = Datum.colorout + + self.canvas.draw_idle() + self.canvas.widgetlock.release(self.lasso) + del self.lasso + + def onpress(self, event): + if self.canvas.widgetlock.locked(): + return + if event.inaxes is None: + return + self.lasso = Lasso(event.inaxes, + (event.xdata, event.ydata), + self.callback) + # acquire a lock on the widget drawing + self.canvas.widgetlock(self.lasso) + + +if __name__ == '__main__': + + np.random.seed(19680801) + + data = [Datum(*xy) for xy in np.random.rand(100, 2)] + ax = plt.axes(xlim=(0, 1), ylim=(0, 1), autoscale_on=False) + ax.set_title('Lasso points using left mouse button') + + lman = LassoManager(ax, data) + + plt.show() +``` + +## 下载这个示例 + +- [下载python源码: lasso_demo.py](https://matplotlib.org/_downloads/lasso_demo.py) +- [下载Jupyter notebook: lasso_demo.ipynb](https://matplotlib.org/_downloads/lasso_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/legend_picking.md b/Python/matplotlab/gallery/event_handling/legend_picking.md new file mode 100644 index 00000000..e38aa6b5 --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/legend_picking.md @@ -0,0 +1,55 @@ +# 图例选择 + +启用图例上的拾取以打开和关闭原始线。 + +![图例选择](https://matplotlib.org/_images/sphx_glr_legend_picking_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + +t = np.arange(0.0, 0.2, 0.1) +y1 = 2*np.sin(2*np.pi*t) +y2 = 4*np.sin(2*np.pi*2*t) + +fig, ax = plt.subplots() +ax.set_title('Click on legend line to toggle line on/off') +line1, = ax.plot(t, y1, lw=2, color='red', label='1 HZ') +line2, = ax.plot(t, y2, lw=2, color='blue', label='2 HZ') +leg = ax.legend(loc='upper left', fancybox=True, shadow=True) +leg.get_frame().set_alpha(0.4) + + +# we will set up a dict mapping legend line to orig line, and enable +# picking on 
the legend line +lines = [line1, line2] +lined = dict() +for legline, origline in zip(leg.get_lines(), lines): + legline.set_picker(5) # 5 pts tolerance + lined[legline] = origline + + +def onpick(event): + # on the pick event, find the orig line corresponding to the + # legend proxy line, and toggle the visibility + legline = event.artist + origline = lined[legline] + vis = not origline.get_visible() + origline.set_visible(vis) + # Change the alpha on the line in the legend so we can see what lines + # have been toggled + if vis: + legline.set_alpha(1.0) + else: + legline.set_alpha(0.2) + fig.canvas.draw() + +fig.canvas.mpl_connect('pick_event', onpick) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: legend_picking.py](https://matplotlib.org/_downloads/legend_picking.py) +- [下载Jupyter notebook: legend_picking.ipynb](https://matplotlib.org/_downloads/legend_picking.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/looking_glass.md b/Python/matplotlab/gallery/event_handling/looking_glass.md new file mode 100644 index 00000000..63192c53 --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/looking_glass.md @@ -0,0 +1,65 @@ +# 镜子 + +例如,使用鼠标事件模拟用于检查数据的镜子。 + +![镜子示例](https://matplotlib.org/_images/sphx_glr_looking_glass_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.patches as patches + +# Fixing random state for reproducibility +np.random.seed(19680801) + +x, y = np.random.rand(2, 200) + +fig, ax = plt.subplots() +circ = patches.Circle((0.5, 0.5), 0.25, alpha=0.8, fc='yellow') +ax.add_patch(circ) + + +ax.plot(x, y, alpha=0.2) +line, = ax.plot(x, y, alpha=1.0, clip_path=circ) +ax.set_title("Left click and drag to move looking glass") + + +class EventHandler(object): + def __init__(self): + fig.canvas.mpl_connect('button_press_event', self.onpress) + fig.canvas.mpl_connect('button_release_event', self.onrelease) + fig.canvas.mpl_connect('motion_notify_event', self.onmove) + 
self.x0, self.y0 = circ.center + self.pressevent = None + + def onpress(self, event): + if event.inaxes != ax: + return + + if not circ.contains(event)[0]: + return + + self.pressevent = event + + def onrelease(self, event): + self.pressevent = None + self.x0, self.y0 = circ.center + + def onmove(self, event): + if self.pressevent is None or event.inaxes != self.pressevent.inaxes: + return + + dx = event.xdata - self.pressevent.xdata + dy = event.ydata - self.pressevent.ydata + circ.center = self.x0 + dx, self.y0 + dy + line.set_clip_path(circ) + fig.canvas.draw() + +handler = EventHandler() +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: looking_glass.py](https://matplotlib.org/_downloads/looking_glass.py) +- [下载Jupyter notebook: looking_glass.ipynb](https://matplotlib.org/_downloads/looking_glass.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/path_editor.md b/Python/matplotlab/gallery/event_handling/path_editor.md new file mode 100644 index 00000000..874d08fb --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/path_editor.md @@ -0,0 +1,164 @@ +# 路径编辑器 + +跨GUI共享事件。 + +此示例演示了使用Matplotlib事件处理与画布上的对象进行交互和修改对象的跨GUI应用程序。 + +![路径编辑器示例](https://matplotlib.org/_images/sphx_glr_path_editor_001.png) + +```python +import numpy as np +import matplotlib.path as mpath +import matplotlib.patches as mpatches +import matplotlib.pyplot as plt + +Path = mpath.Path + +fig, ax = plt.subplots() + +pathdata = [ + (Path.MOVETO, (1.58, -2.57)), + (Path.CURVE4, (0.35, -1.1)), + (Path.CURVE4, (-1.75, 2.0)), + (Path.CURVE4, (0.375, 2.0)), + (Path.LINETO, (0.85, 1.15)), + (Path.CURVE4, (2.2, 3.2)), + (Path.CURVE4, (3, 0.05)), + (Path.CURVE4, (2.0, -0.5)), + (Path.CLOSEPOLY, (1.58, -2.57)), + ] + +codes, verts = zip(*pathdata) +path = mpath.Path(verts, codes) +patch = mpatches.PathPatch(path, facecolor='green', edgecolor='yellow', alpha=0.5) +ax.add_patch(patch) + + +class PathInteractor(object): + """ + An path editor. 
+ + Key-bindings + + 't' toggle vertex markers on and off. When vertex markers are on, + you can move them, delete them + + + """ + + showverts = True + epsilon = 5 # max pixel distance to count as a vertex hit + + def __init__(self, pathpatch): + + self.ax = pathpatch.axes + canvas = self.ax.figure.canvas + self.pathpatch = pathpatch + self.pathpatch.set_animated(True) + + x, y = zip(*self.pathpatch.get_path().vertices) + + self.line, = ax.plot(x, y, marker='o', markerfacecolor='r', animated=True) + + self._ind = None # the active vert + + canvas.mpl_connect('draw_event', self.draw_callback) + canvas.mpl_connect('button_press_event', self.button_press_callback) + canvas.mpl_connect('key_press_event', self.key_press_callback) + canvas.mpl_connect('button_release_event', self.button_release_callback) + canvas.mpl_connect('motion_notify_event', self.motion_notify_callback) + self.canvas = canvas + + def draw_callback(self, event): + self.background = self.canvas.copy_from_bbox(self.ax.bbox) + self.ax.draw_artist(self.pathpatch) + self.ax.draw_artist(self.line) + self.canvas.blit(self.ax.bbox) + + def pathpatch_changed(self, pathpatch): + 'this method is called whenever the pathpatchgon object is called' + # only copy the artist props to the line (except visibility) + vis = self.line.get_visible() + plt.Artist.update_from(self.line, pathpatch) + self.line.set_visible(vis) # don't use the pathpatch visibility state + + def get_ind_under_point(self, event): + 'get the index of the vertex under point if within epsilon tolerance' + + # display coords + xy = np.asarray(self.pathpatch.get_path().vertices) + xyt = self.pathpatch.get_transform().transform(xy) + xt, yt = xyt[:, 0], xyt[:, 1] + d = np.sqrt((xt - event.x)**2 + (yt - event.y)**2) + ind = d.argmin() + + if d[ind] >= self.epsilon: + ind = None + + return ind + + def button_press_callback(self, event): + 'whenever a mouse button is pressed' + if not self.showverts: + return + if event.inaxes is None: + return + if 
event.button != 1: + return + self._ind = self.get_ind_under_point(event) + + def button_release_callback(self, event): + 'whenever a mouse button is released' + if not self.showverts: + return + if event.button != 1: + return + self._ind = None + + def key_press_callback(self, event): + 'whenever a key is pressed' + if not event.inaxes: + return + if event.key == 't': + self.showverts = not self.showverts + self.line.set_visible(self.showverts) + if not self.showverts: + self._ind = None + + self.canvas.draw() + + def motion_notify_callback(self, event): + 'on mouse movement' + if not self.showverts: + return + if self._ind is None: + return + if event.inaxes is None: + return + if event.button != 1: + return + x, y = event.xdata, event.ydata + + vertices = self.pathpatch.get_path().vertices + + vertices[self._ind] = x, y + self.line.set_data(zip(*vertices)) + + self.canvas.restore_region(self.background) + self.ax.draw_artist(self.pathpatch) + self.ax.draw_artist(self.line) + self.canvas.blit(self.ax.bbox) + + +interactor = PathInteractor(patch) +ax.set_title('drag vertices to update path') +ax.set_xlim(-3, 4) +ax.set_ylim(-3, 4) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: path_editor.py](https://matplotlib.org/_downloads/path_editor.py) +- [下载Jupyter notebook: path_editor.ipynb](https://matplotlib.org/_downloads/path_editor.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/pick_event_demo.md b/Python/matplotlab/gallery/event_handling/pick_event_demo.md new file mode 100644 index 00000000..22e19088 --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/pick_event_demo.md @@ -0,0 +1,154 @@ +# 选择事件演示 + +您可以通过设置艺术家的“选择器”属性来启用拾取(例如,matplotlib Line2D,Text,Patch,Polygon,AxesImage等...) 
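
The wiring just described can be sketched in a minimal, self-contained form (an illustrative fragment with invented names such as ``pick_handler`` — it is not part of the demo code below):

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# A numeric picker enables picking with that tolerance (in points).
line, = ax.plot(np.random.rand(20), 'o', picker=5)

def pick_handler(event):
    # event.artist is the picked artist; for Line2D picks, event.ind
    # holds the indices of the data points within the pick tolerance.
    print('picked', event.artist, 'indices', event.ind)

fig.canvas.mpl_connect('pick_event', pick_handler)
plt.show()
```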
+ +选择器属性有多种含义 + +- None - 此艺术家对象的选择功能已停用(默认) +- boolean - 如果为True,则启用拾取,如果鼠标事件在艺术家上方,艺术家将触发拾取事件 +- float - 如果选择器是一个数字,则它被解释为以点为单位的epsilon容差,如果事件的数据在鼠标事件的epsilon内,则艺术家将触发事件。 对于某些艺术家(如线条和补丁集合),艺术家可能会为生成的挑选事件提供其他数据,例如,挑选事件的epsilon中的数据索引 +- function - 如果选择器是可调用的,则它是用户提供的函数,用于确定艺术家是否被鼠标事件命中。 + + hit, props = picker(artist, mouseevent) + + 确定命中测试。 如果鼠标事件在艺术家上方,则返回hit = True,props是要添加到PickEvent属性的属性字典 + +通过设置“选取器”属性启用艺术家进行拾取后,您需要连接到图形画布pick_event以获取鼠标按下事件的拾取回调。 例如, + + def pick_handler(event): + + mouseevent = event.mouseevent artist = event.artist # now do something with this... + +传递给回调的pick事件(matplotlib.backend_bases.PickEvent)始终使用两个属性触发: + +- mouseevent - 生成拾取事件的鼠标事件。 鼠标事件又具有x和y(显示空间中的坐标,如左下角的像素)和xdata,ydata(数据空间中的坐标)等属性。 此外,您可以获取有关按下哪些按钮,按下哪些键,鼠标所在的轴等的信息。有关详细信息,请参阅matplotlib.backend_bases.MouseEvent。 + +- artist - 生成pick事件的matplotlib.artist。 + +此外,某些艺术家(如Line2D和PatchCollection)可能会将其他元数据(如索引)附加到符合选择器条件的数据中(例如,行中指定的epsilon容差范围内的所有点) + +以下示例说明了这些方法中的每一种。 + +![选择事件示例](https://matplotlib.org/_images/sphx_glr_pick_event_demo_001.png) + +![选择事件示例2](https://matplotlib.org/_images/sphx_glr_pick_event_demo_002.png) + +![选择事件示例3](https://matplotlib.org/_images/sphx_glr_pick_event_demo_003.png) + +![选择事件示例4](https://matplotlib.org/_images/sphx_glr_pick_event_demo_004.png) + +```python +import matplotlib.pyplot as plt +from matplotlib.lines import Line2D +from matplotlib.patches import Rectangle +from matplotlib.text import Text +from matplotlib.image import AxesImage +import numpy as np +from numpy.random import rand + +if 1: # simple picking, lines, rectangles and text + fig, (ax1, ax2) = plt.subplots(2, 1) + ax1.set_title('click on points, rectangles or text', picker=True) + ax1.set_ylabel('ylabel', picker=True, bbox=dict(facecolor='red')) + line, = ax1.plot(rand(100), 'o', picker=5) # 5 points tolerance + + # pick the rectangle + bars = ax2.bar(range(10), rand(10), picker=True) + for label in ax2.get_xticklabels(): # make the xtick labels pickable + label.set_picker(True) + + 
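    # NOTE (added; not in the original demo): newer Matplotlib releases
    # deprecate passing a plain number as *picker* to mean a pick radius.
    # The modern spelling of ``picker=5`` above would be, e.g.:
    #     line.set_picker(True)
    #     line.set_pickradius(5)   # tolerance in points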
def onpick1(event): + if isinstance(event.artist, Line2D): + thisline = event.artist + xdata = thisline.get_xdata() + ydata = thisline.get_ydata() + ind = event.ind + print('onpick1 line:', zip(np.take(xdata, ind), np.take(ydata, ind))) + elif isinstance(event.artist, Rectangle): + patch = event.artist + print('onpick1 patch:', patch.get_path()) + elif isinstance(event.artist, Text): + text = event.artist + print('onpick1 text:', text.get_text()) + + fig.canvas.mpl_connect('pick_event', onpick1) + +if 1: # picking with a custom hit test function + # you can define custom pickers by setting picker to a callable + # function. The function has the signature + # + # hit, props = func(artist, mouseevent) + # + # to determine the hit test. if the mouse event is over the artist, + # return hit=True and props is a dictionary of + # properties you want added to the PickEvent attributes + + def line_picker(line, mouseevent): + """ + find the points within a certain distance from the mouseclick in + data coords and attach some extra attributes, pickx and picky + which are the data points that were picked + """ + if mouseevent.xdata is None: + return False, dict() + xdata = line.get_xdata() + ydata = line.get_ydata() + maxd = 0.05 + d = np.sqrt((xdata - mouseevent.xdata)**2. + (ydata - mouseevent.ydata)**2.) 
+ + ind = np.nonzero(np.less_equal(d, maxd)) + if len(ind): + pickx = np.take(xdata, ind) + picky = np.take(ydata, ind) + props = dict(ind=ind, pickx=pickx, picky=picky) + return True, props + else: + return False, dict() + + def onpick2(event): + print('onpick2 line:', event.pickx, event.picky) + + fig, ax = plt.subplots() + ax.set_title('custom picker for line data') + line, = ax.plot(rand(100), rand(100), 'o', picker=line_picker) + fig.canvas.mpl_connect('pick_event', onpick2) + + +if 1: # picking on a scatter plot (matplotlib.collections.RegularPolyCollection) + + x, y, c, s = rand(4, 100) + + def onpick3(event): + ind = event.ind + print('onpick3 scatter:', ind, np.take(x, ind), np.take(y, ind)) + + fig, ax = plt.subplots() + col = ax.scatter(x, y, 100*s, c, picker=True) + #fig.savefig('pscoll.eps') + fig.canvas.mpl_connect('pick_event', onpick3) + +if 1: # picking images (matplotlib.image.AxesImage) + fig, ax = plt.subplots() + im1 = ax.imshow(rand(10, 5), extent=(1, 2, 1, 2), picker=True) + im2 = ax.imshow(rand(5, 10), extent=(3, 4, 1, 2), picker=True) + im3 = ax.imshow(rand(20, 25), extent=(1, 2, 3, 4), picker=True) + im4 = ax.imshow(rand(30, 12), extent=(3, 4, 3, 4), picker=True) + ax.axis([0, 5, 0, 5]) + + def onpick4(event): + artist = event.artist + if isinstance(artist, AxesImage): + im = artist + A = im.get_array() + print('onpick4 image', A.shape) + + fig.canvas.mpl_connect('pick_event', onpick4) + + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: pick_event_demo.py](https://matplotlib.org/_downloads/pick_event_demo.py) +- [下载Jupyter notebook: pick_event_demo.ipynb](https://matplotlib.org/_downloads/pick_event_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/pick_event_demo2.md b/Python/matplotlab/gallery/event_handling/pick_event_demo2.md new file mode 100644 index 00000000..8710db65 --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/pick_event_demo2.md @@ -0,0 +1,47 @@ +# 选择事件演示2 + 
+计算100个数据集的平均值和标准差(stddev),并绘制平均值vs stddev。单击其中一个mu,sigma点时,绘制生成均值和stddev的数据集中的原始数据。 + +![选择事件演示2](https://matplotlib.org/_images/sphx_glr_pick_event_demo2_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + + +X = np.random.rand(100, 1000) +xs = np.mean(X, axis=1) +ys = np.std(X, axis=1) + +fig, ax = plt.subplots() +ax.set_title('click on point to plot time series') +line, = ax.plot(xs, ys, 'o', picker=5) # 5 points tolerance + + +def onpick(event): + + if event.artist != line: + return True + + N = len(event.ind) + if not N: + return True + + figi, axs = plt.subplots(N, squeeze=False) + for ax, dataind in zip(axs.flat, event.ind): + ax.plot(X[dataind]) + ax.text(.05, .9, 'mu=%1.3f\nsigma=%1.3f' % (xs[dataind], ys[dataind]), + transform=ax.transAxes, va='top') + ax.set_ylim(-0.5, 1.5) + figi.show() + return True + +fig.canvas.mpl_connect('pick_event', onpick) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: pick_event_demo2.py](https://matplotlib.org/_downloads/pick_event_demo2.py) +- [下载Jupyter notebook: pick_event_demo2.ipynb](https://matplotlib.org/_downloads/pick_event_demo2.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/pipong.md b/Python/matplotlab/gallery/event_handling/pipong.md new file mode 100644 index 00000000..397a401b --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/pipong.md @@ -0,0 +1,291 @@ +# Pipong + +一个基于Matplotlib的Pong游戏,说明了一种编写交互动画的方法,它很容易移植到多个后端pipong.py由Paul Ivanov撰写 + +```python +import numpy as np +import matplotlib.pyplot as plt +from numpy.random import randn, randint +from matplotlib.font_manager import FontProperties + +instructions = """ +Player A: Player B: + 'e' up 'i' + 'd' down 'k' + +press 't' -- close these instructions + (animation will be much faster) +press 'a' -- add a puck +press 'A' -- remove a puck +press '1' -- slow down all pucks +press '2' -- speed up all pucks +press '3' -- slow down distractors +press '4' -- speed up distractors +press ' 
' -- reset the first puck +press 'n' -- toggle distractors on/off +press 'g' -- toggle the game on/off + + """ + + +class Pad(object): + def __init__(self, disp, x, y, type='l'): + self.disp = disp + self.x = x + self.y = y + self.w = .3 + self.score = 0 + self.xoffset = 0.3 + self.yoffset = 0.1 + if type == 'r': + self.xoffset *= -1.0 + + if type == 'l' or type == 'r': + self.signx = -1.0 + self.signy = 1.0 + else: + self.signx = 1.0 + self.signy = -1.0 + + def contains(self, loc): + return self.disp.get_bbox().contains(loc.x, loc.y) + + +class Puck(object): + def __init__(self, disp, pad, field): + self.vmax = .2 + self.disp = disp + self.field = field + self._reset(pad) + + def _reset(self, pad): + self.x = pad.x + pad.xoffset + if pad.y < 0: + self.y = pad.y + pad.yoffset + else: + self.y = pad.y - pad.yoffset + self.vx = pad.x - self.x + self.vy = pad.y + pad.w/2 - self.y + self._speedlimit() + self._slower() + self._slower() + + def update(self, pads): + self.x += self.vx + self.y += self.vy + for pad in pads: + if pad.contains(self): + self.vx *= 1.2 * pad.signx + self.vy *= 1.2 * pad.signy + fudge = .001 + # probably cleaner with something like... 
+ if self.x < fudge: + pads[1].score += 1 + self._reset(pads[0]) + return True + if self.x > 7 - fudge: + pads[0].score += 1 + self._reset(pads[1]) + return True + if self.y < -1 + fudge or self.y > 1 - fudge: + self.vy *= -1.0 + # add some randomness, just to make it interesting + self.vy -= (randn()/300.0 + 1/300.0) * np.sign(self.vy) + self._speedlimit() + return False + + def _slower(self): + self.vx /= 5.0 + self.vy /= 5.0 + + def _faster(self): + self.vx *= 5.0 + self.vy *= 5.0 + + def _speedlimit(self): + if self.vx > self.vmax: + self.vx = self.vmax + if self.vx < -self.vmax: + self.vx = -self.vmax + + if self.vy > self.vmax: + self.vy = self.vmax + if self.vy < -self.vmax: + self.vy = -self.vmax + + +class Game(object): + def __init__(self, ax): + # create the initial line + self.ax = ax + ax.set_ylim([-1, 1]) + ax.set_xlim([0, 7]) + padAx = 0 + padBx = .50 + padAy = padBy = .30 + padBx += 6.3 + + # pads + pA, = self.ax.barh(padAy, .2, + height=.3, color='k', alpha=.5, edgecolor='b', + lw=2, label="Player B", + animated=True) + pB, = self.ax.barh(padBy, .2, + height=.3, left=padBx, color='k', alpha=.5, + edgecolor='r', lw=2, label="Player A", + animated=True) + + # distractors + self.x = np.arange(0, 2.22*np.pi, 0.01) + self.line, = self.ax.plot(self.x, np.sin(self.x), "r", + animated=True, lw=4) + self.line2, = self.ax.plot(self.x, np.cos(self.x), "g", + animated=True, lw=4) + self.line3, = self.ax.plot(self.x, np.cos(self.x), "g", + animated=True, lw=4) + self.line4, = self.ax.plot(self.x, np.cos(self.x), "r", + animated=True, lw=4) + + # center line + self.centerline, = self.ax.plot([3.5, 3.5], [1, -1], 'k', + alpha=.5, animated=True, lw=8) + + # puck (s) + self.puckdisp = self.ax.scatter([1], [1], label='_nolegend_', + s=200, c='g', + alpha=.9, animated=True) + + self.canvas = self.ax.figure.canvas + self.background = None + self.cnt = 0 + self.distract = True + self.res = 100.0 + self.on = False + self.inst = True # show instructions from the 
beginning + self.background = None + self.pads = [] + self.pads.append(Pad(pA, padAx, padAy)) + self.pads.append(Pad(pB, padBx, padBy, 'r')) + self.pucks = [] + self.i = self.ax.annotate(instructions, (.5, 0.5), + name='monospace', + verticalalignment='center', + horizontalalignment='center', + multialignment='left', + textcoords='axes fraction', + animated=False) + self.canvas.mpl_connect('key_press_event', self.key_press) + + def draw(self, evt): + draw_artist = self.ax.draw_artist + if self.background is None: + self.background = self.canvas.copy_from_bbox(self.ax.bbox) + + # restore the clean slate background + self.canvas.restore_region(self.background) + + # show the distractors + if self.distract: + self.line.set_ydata(np.sin(self.x + self.cnt/self.res)) + self.line2.set_ydata(np.cos(self.x - self.cnt/self.res)) + self.line3.set_ydata(np.tan(self.x + self.cnt/self.res)) + self.line4.set_ydata(np.tan(self.x - self.cnt/self.res)) + draw_artist(self.line) + draw_artist(self.line2) + draw_artist(self.line3) + draw_artist(self.line4) + + # pucks and pads + if self.on: + self.ax.draw_artist(self.centerline) + for pad in self.pads: + pad.disp.set_y(pad.y) + pad.disp.set_x(pad.x) + self.ax.draw_artist(pad.disp) + + for puck in self.pucks: + if puck.update(self.pads): + # we only get here if someone scored + self.pads[0].disp.set_label( + " " + str(self.pads[0].score)) + self.pads[1].disp.set_label( + " " + str(self.pads[1].score)) + self.ax.legend(loc='center', framealpha=.2, + facecolor='0.5', + prop=FontProperties(size='xx-large', + weight='bold')) + + self.background = None + self.ax.figure.canvas.draw_idle() + return True + puck.disp.set_offsets([[puck.x, puck.y]]) + self.ax.draw_artist(puck.disp) + + # just redraw the axes rectangle + self.canvas.blit(self.ax.bbox) + self.canvas.flush_events() + if self.cnt == 50000: + # just so we don't get carried away + print("...and you've been playing for too long!!!") + plt.close() + + self.cnt += 1 + return True + + def 
key_press(self, event): + if event.key == '3': + self.res *= 5.0 + if event.key == '4': + self.res /= 5.0 + + if event.key == 'e': + self.pads[0].y += .1 + if self.pads[0].y > 1 - .3: + self.pads[0].y = 1 - .3 + if event.key == 'd': + self.pads[0].y -= .1 + if self.pads[0].y < -1: + self.pads[0].y = -1 + + if event.key == 'i': + self.pads[1].y += .1 + if self.pads[1].y > 1 - .3: + self.pads[1].y = 1 - .3 + if event.key == 'k': + self.pads[1].y -= .1 + if self.pads[1].y < -1: + self.pads[1].y = -1 + + if event.key == 'a': + self.pucks.append(Puck(self.puckdisp, + self.pads[randint(2)], + self.ax.bbox)) + if event.key == 'A' and len(self.pucks): + self.pucks.pop() + if event.key == ' ' and len(self.pucks): + self.pucks[0]._reset(self.pads[randint(2)]) + if event.key == '1': + for p in self.pucks: + p._slower() + if event.key == '2': + for p in self.pucks: + p._faster() + + if event.key == 'n': + self.distract = not self.distract + + if event.key == 'g': + self.on = not self.on + if event.key == 't': + self.inst = not self.inst + self.i.set_visible(not self.i.get_visible()) + self.background = None + self.canvas.draw_idle() + if event.key == 'q': + plt.close() +``` + +## 下载这个示例 + +- [下载python源码: pipong.py](https://matplotlib.org/_downloads/pipong.py) +- [下载Jupyter notebook: pipong.ipynb](https://matplotlib.org/_downloads/pipong.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/poly_editor.md b/Python/matplotlab/gallery/event_handling/poly_editor.md new file mode 100644 index 00000000..694f8fb2 --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/poly_editor.md @@ -0,0 +1,187 @@ +# 综合编辑器 + +这是一个示例,展示如何使用Matplotlib事件处理来构建跨GUI应用程序,以与画布上的对象进行交互。 + +![综合编辑器示例](https://matplotlib.org/_images/sphx_glr_poly_editor_001.png) + +```python +import numpy as np +from matplotlib.lines import Line2D +from matplotlib.artist import Artist +from matplotlib.mlab import dist_point_to_segment + + +class PolygonInteractor(object): + """ + A 
polygon editor. + + Key-bindings + + 't' toggle vertex markers on and off. When vertex markers are on, + you can move them, delete them + + 'd' delete the vertex under point + + 'i' insert a vertex at point. You must be within epsilon of the + line connecting two existing vertices + + """ + + showverts = True + epsilon = 5 # max pixel distance to count as a vertex hit + + def __init__(self, ax, poly): + if poly.figure is None: + raise RuntimeError('You must first add the polygon to a figure ' + 'or canvas before defining the interactor') + self.ax = ax + canvas = poly.figure.canvas + self.poly = poly + + x, y = zip(*self.poly.xy) + self.line = Line2D(x, y, + marker='o', markerfacecolor='r', + animated=True) + self.ax.add_line(self.line) + + self.cid = self.poly.add_callback(self.poly_changed) + self._ind = None # the active vert + + canvas.mpl_connect('draw_event', self.draw_callback) + canvas.mpl_connect('button_press_event', self.button_press_callback) + canvas.mpl_connect('key_press_event', self.key_press_callback) + canvas.mpl_connect('button_release_event', self.button_release_callback) + canvas.mpl_connect('motion_notify_event', self.motion_notify_callback) + self.canvas = canvas + + def draw_callback(self, event): + self.background = self.canvas.copy_from_bbox(self.ax.bbox) + self.ax.draw_artist(self.poly) + self.ax.draw_artist(self.line) + # do not need to blit here, this will fire before the screen is + # updated + + def poly_changed(self, poly): + 'this method is called whenever the polygon object is called' + # only copy the artist props to the line (except visibility) + vis = self.line.get_visible() + Artist.update_from(self.line, poly) + self.line.set_visible(vis) # don't use the poly visibility state + + def get_ind_under_point(self, event): + 'get the index of the vertex under point if within epsilon tolerance' + + # display coords + xy = np.asarray(self.poly.xy) + xyt = self.poly.get_transform().transform(xy) + xt, yt = xyt[:, 0], xyt[:, 1] + d = 
np.hypot(xt - event.x, yt - event.y) + indseq, = np.nonzero(d == d.min()) + ind = indseq[0] + + if d[ind] >= self.epsilon: + ind = None + + return ind + + def button_press_callback(self, event): + 'whenever a mouse button is pressed' + if not self.showverts: + return + if event.inaxes is None: + return + if event.button != 1: + return + self._ind = self.get_ind_under_point(event) + + def button_release_callback(self, event): + 'whenever a mouse button is released' + if not self.showverts: + return + if event.button != 1: + return + self._ind = None + + def key_press_callback(self, event): + 'whenever a key is pressed' + if not event.inaxes: + return + if event.key == 't': + self.showverts = not self.showverts + self.line.set_visible(self.showverts) + if not self.showverts: + self._ind = None + elif event.key == 'd': + ind = self.get_ind_under_point(event) + if ind is not None: + self.poly.xy = np.delete(self.poly.xy, + ind, axis=0) + self.line.set_data(zip(*self.poly.xy)) + elif event.key == 'i': + xys = self.poly.get_transform().transform(self.poly.xy) + p = event.x, event.y # display coords + for i in range(len(xys) - 1): + s0 = xys[i] + s1 = xys[i + 1] + d = dist_point_to_segment(p, s0, s1) + if d <= self.epsilon: + self.poly.xy = np.insert( + self.poly.xy, i+1, + [event.xdata, event.ydata], + axis=0) + self.line.set_data(zip(*self.poly.xy)) + break + if self.line.stale: + self.canvas.draw_idle() + + def motion_notify_callback(self, event): + 'on mouse movement' + if not self.showverts: + return + if self._ind is None: + return + if event.inaxes is None: + return + if event.button != 1: + return + x, y = event.xdata, event.ydata + + self.poly.xy[self._ind] = x, y + if self._ind == 0: + self.poly.xy[-1] = x, y + elif self._ind == len(self.poly.xy) - 1: + self.poly.xy[0] = x, y + self.line.set_data(zip(*self.poly.xy)) + + self.canvas.restore_region(self.background) + self.ax.draw_artist(self.poly) + self.ax.draw_artist(self.line) + self.canvas.blit(self.ax.bbox) + 
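
# NOTE (added; not part of the original example): recent Matplotlib releases
# removed ``matplotlib.mlab.dist_point_to_segment``, so the import at the top
# of this script can fail.  If it does, drop that import; this stand-alone
# equivalent (same semantics, here shadowing the library version) can be used:
def dist_point_to_segment(p, s0, s1):
    """Return the distance from the point *p* to the segment *s0*--*s1*."""
    p, s0, s1 = (np.asarray(v, dtype=float) for v in (p, s0, s1))
    seg = s1 - s0
    if not seg.any():                 # degenerate segment: just a point
        return np.hypot(*(p - s0))
    # parameter of the orthogonal projection, clamped onto the segment
    t = np.clip(np.dot(p - s0, seg) / np.dot(seg, seg), 0.0, 1.0)
    return np.hypot(*(p - (s0 + t * seg)))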
+ +if __name__ == '__main__': + import matplotlib.pyplot as plt + from matplotlib.patches import Polygon + + theta = np.arange(0, 2*np.pi, 0.1) + r = 1.5 + + xs = r * np.cos(theta) + ys = r * np.sin(theta) + + poly = Polygon(np.column_stack([xs, ys]), animated=True) + + fig, ax = plt.subplots() + ax.add_patch(poly) + p = PolygonInteractor(ax, poly) + + ax.set_title('Click and drag a point to move it') + ax.set_xlim((-2, 2)) + ax.set_ylim((-2, 2)) + plt.show() +``` + +## 下载这个示例 + +- [下载python源码: poly_editor.py](https://matplotlib.org/_downloads/poly_editor.py) +- [下载Jupyter notebook: poly_editor.ipynb](https://matplotlib.org/_downloads/poly_editor.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/pong_sgskip.md b/Python/matplotlab/gallery/event_handling/pong_sgskip.md new file mode 100644 index 00000000..e591bbf4 --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/pong_sgskip.md @@ -0,0 +1,54 @@ +# Pong + +一个使用Matplotlib的小游戏演示。 + +此示例需要[pipong.py](https://matplotlib.org/_downloads/9a2d2c527d869cd1b03d9560d75d6a71/pipong.py) + +```python +import time + + +import matplotlib.pyplot as plt +import pipong + + +fig, ax = plt.subplots() +canvas = ax.figure.canvas +animation = pipong.Game(ax) + +# disable the default key bindings +if fig.canvas.manager.key_press_handler_id is not None: + canvas.mpl_disconnect(fig.canvas.manager.key_press_handler_id) + + +# reset the blitting background on redraw +def handle_redraw(event): + animation.background = None + + +# bootstrap after the first draw +def start_anim(event): + canvas.mpl_disconnect(start_anim.cid) + + def local_draw(): + if animation.ax._cachedRenderer: + animation.draw(None) + start_anim.timer.add_callback(local_draw) + start_anim.timer.start() + canvas.mpl_connect('draw_event', handle_redraw) + + +start_anim.cid = canvas.mpl_connect('draw_event', start_anim) +start_anim.timer = animation.canvas.new_timer() +start_anim.timer.interval = 1 + +tstart = time.time() + 
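# NOTE (added): ``animation.ax._cachedRenderer`` (checked in ``local_draw``
# above) is a private attribute that recent Matplotlib releases have removed;
# on such versions test for a renderer differently, for example with
# ``getattr(canvas, 'renderer', None) is not None``.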
+plt.show() +print('FPS: %f' % (animation.cnt/(time.time() - tstart))) +``` + +## 下载这个示例 + +- [下载python源码: pong_sgskip.py](https://matplotlib.org/_downloads/pong_sgskip.py) +- [下载Jupyter notebook: pong_sgskip.ipynb](https://matplotlib.org/_downloads/pong_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/resample.md b/Python/matplotlab/gallery/event_handling/resample.md new file mode 100644 index 00000000..34b5914b --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/resample.md @@ -0,0 +1,73 @@ +# 重采样数据 + +下采样会降低信号的采样率或采样大小。在本教程中,当通过拖动和缩放调整打印时,将对信号进行缩减采样。 + +![重采样数据示例](https://matplotlib.org/_images/sphx_glr_resample_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + + +# A class that will downsample the data and recompute when zoomed. +class DataDisplayDownsampler(object): + def __init__(self, xdata, ydata): + self.origYData = ydata + self.origXData = xdata + self.max_points = 50 + self.delta = xdata[-1] - xdata[0] + + def downsample(self, xstart, xend): + # get the points in the view range + mask = (self.origXData > xstart) & (self.origXData < xend) + # dilate the mask by one to catch the points just outside + # of the view range to not truncate the line + mask = np.convolve([1, 1], mask, mode='same').astype(bool) + # sort out how many points to drop + ratio = max(np.sum(mask) // self.max_points, 1) + + # mask data + xdata = self.origXData[mask] + ydata = self.origYData[mask] + + # downsample data + xdata = xdata[::ratio] + ydata = ydata[::ratio] + + print("using {} of {} visible points".format( + len(ydata), np.sum(mask))) + + return xdata, ydata + + def update(self, ax): + # Update the line + lims = ax.viewLim + if np.abs(lims.width - self.delta) > 1e-8: + self.delta = lims.width + xstart, xend = lims.intervalx + self.line.set_data(*self.downsample(xstart, xend)) + ax.figure.canvas.draw_idle() + + +# Create a signal +xdata = np.linspace(16, 365, (365-16)*4) +ydata = 
np.sin(2*np.pi*xdata/153) + np.cos(2*np.pi*xdata/127) + +d = DataDisplayDownsampler(xdata, ydata) + +fig, ax = plt.subplots() + +# Hook up the line +d.line, = ax.plot(xdata, ydata, 'o-') +ax.set_autoscale_on(False) # Otherwise, infinite loop + +# Connect for changing the view limits +ax.callbacks.connect('xlim_changed', d.update) +ax.set_xlim(16, 365) +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: resample.py](https://matplotlib.org/_downloads/resample.py) +- [下载Jupyter notebook: resample.ipynb](https://matplotlib.org/_downloads/resample.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/timers.md b/Python/matplotlab/gallery/event_handling/timers.md new file mode 100644 index 00000000..8c049500 --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/timers.md @@ -0,0 +1,40 @@ +# 计时器 + +使用通用计时器对象的简单示例。这用于更新图中标题的时间。 + +![计时器示例](https://matplotlib.org/_images/sphx_glr_timers_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np +from datetime import datetime + + +def update_title(axes): + axes.set_title(datetime.now()) + axes.figure.canvas.draw() + +fig, ax = plt.subplots() + +x = np.linspace(-3, 3) +ax.plot(x, x ** 2) + +# Create a new timer object. Set the interval to 100 milliseconds +# (1000 is default) and tell the timer what function should be called. 
+timer = fig.canvas.new_timer(interval=100) +timer.add_callback(update_title, ax) +timer.start() + +# Or could start the timer on first figure draw +#def start_timer(evt): +# timer.start() +# fig.canvas.mpl_disconnect(drawid) +#drawid = fig.canvas.mpl_connect('draw_event', start_timer) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: timers.py](https://matplotlib.org/_downloads/timers.py) +- [下载Jupyter notebook: timers.ipynb](https://matplotlib.org/_downloads/timers.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/trifinder_event_demo.md b/Python/matplotlab/gallery/event_handling/trifinder_event_demo.md new file mode 100644 index 00000000..633546a0 --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/trifinder_event_demo.md @@ -0,0 +1,65 @@ +# Trifinder 事件演示 + +显示使用TriFinder对象的示例。当鼠标在三角测量上移动时,光标下方的三角形将突出显示,三角形的索引将显示在图表标题中。 + +![Trifinder 事件演示](https://matplotlib.org/_images/sphx_glr_trifinder_event_demo_001.png) + +```python +import matplotlib.pyplot as plt +from matplotlib.tri import Triangulation +from matplotlib.patches import Polygon +import numpy as np + + +def update_polygon(tri): + if tri == -1: + points = [0, 0, 0] + else: + points = triang.triangles[tri] + xs = triang.x[points] + ys = triang.y[points] + polygon.set_xy(np.column_stack([xs, ys])) + + +def motion_notify(event): + if event.inaxes is None: + tri = -1 + else: + tri = trifinder(event.xdata, event.ydata) + update_polygon(tri) + plt.title('In triangle %i' % tri) + event.canvas.draw() + + +# Create a Triangulation. 
+n_angles = 16 +n_radii = 5 +min_radius = 0.25 +radii = np.linspace(min_radius, 0.95, n_radii) +angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False) +angles = np.repeat(angles[..., np.newaxis], n_radii, axis=1) +angles[:, 1::2] += np.pi / n_angles +x = (radii*np.cos(angles)).flatten() +y = (radii*np.sin(angles)).flatten() +triang = Triangulation(x, y) +triang.set_mask(np.hypot(x[triang.triangles].mean(axis=1), + y[triang.triangles].mean(axis=1)) + < min_radius) + +# Use the triangulation's default TriFinder object. +trifinder = triang.get_trifinder() + +# Setup plot and callbacks. +plt.subplot(111, aspect='equal') +plt.triplot(triang, 'bo-') +polygon = Polygon([[0, 0], [0, 0]], facecolor='y') # dummy data for xs,ys +update_polygon(-1) +plt.gca().add_patch(polygon) +plt.gcf().canvas.mpl_connect('motion_notify_event', motion_notify) +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: trifinder_event_demo.py](https://matplotlib.org/_downloads/trifinder_event_demo.py) +- [下载Jupyter notebook: trifinder_event_demo.ipynb](https://matplotlib.org/_downloads/trifinder_event_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/event_handling/viewlims.md b/Python/matplotlab/gallery/event_handling/viewlims.md new file mode 100644 index 00000000..45bf0c6e --- /dev/null +++ b/Python/matplotlab/gallery/event_handling/viewlims.md @@ -0,0 +1,90 @@ +# Viewlims + +创建两个相同的面板。在右侧面板上放大将在第一个面板中显示一个矩形,表示缩放的区域。 + +![Viewlims](https://matplotlib.org/_images/sphx_glr_viewlims_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.patches import Rectangle + + +# We just subclass Rectangle so that it can be called with an Axes +# instance, causing the rectangle to update its shape to match the +# bounds of the Axes +class UpdatingRect(Rectangle): + def __call__(self, ax): + self.set_bounds(*ax.viewLim.bounds) + ax.figure.canvas.draw_idle() + + +# A class that will regenerate a fractal set as we zoom in, so that you +# can 
actually see the increasing detail. A box in the left panel will show
+# the area to which we are zoomed.
+class MandelbrotDisplay(object):
+    def __init__(self, h=500, w=500, niter=50, radius=2., power=2):
+        self.height = h
+        self.width = w
+        self.niter = niter
+        self.radius = radius
+        self.power = power
+
+    def __call__(self, xstart, xend, ystart, yend):
+        self.x = np.linspace(xstart, xend, self.width)
+        self.y = np.linspace(ystart, yend, self.height).reshape(-1, 1)
+        c = self.x + 1.0j * self.y
+        threshold_time = np.zeros((self.height, self.width))
+        z = np.zeros(threshold_time.shape, dtype=complex)
+        mask = np.ones(threshold_time.shape, dtype=bool)
+        for i in range(self.niter):
+            z[mask] = z[mask]**self.power + c[mask]
+            mask = (np.abs(z) < self.radius)
+            threshold_time += mask
+        return threshold_time
+
+    def ax_update(self, ax):
+        ax.set_autoscale_on(False)  # Otherwise, infinite loop
+
+        # Get the number of points from the number of pixels in the window
+        dims = ax.patch.get_window_extent().bounds
+        self.width = int(dims[2] + 0.5)
+        self.height = int(dims[3] + 0.5)  # bounds = (x0, y0, width, height)
+
+        # Get the range for the new area
+        xstart, ystart, xdelta, ydelta = ax.viewLim.bounds
+        xend = xstart + xdelta
+        yend = ystart + ydelta
+
+        # Update the image object with our new data and extent
+        im = ax.images[-1]
+        im.set_data(self.__call__(xstart, xend, ystart, yend))
+        im.set_extent((xstart, xend, ystart, yend))
+        ax.figure.canvas.draw_idle()
+
+md = MandelbrotDisplay()
+Z = md(-2., 0.5, -1.25, 1.25)
+
+fig1, (ax1, ax2) = plt.subplots(1, 2)
+ax1.imshow(Z, origin='lower', extent=(md.x.min(), md.x.max(), md.y.min(), md.y.max()))
+ax2.imshow(Z, origin='lower', extent=(md.x.min(), md.x.max(), md.y.min(), md.y.max()))
+
+rect = UpdatingRect([0, 0], 0, 0, facecolor='None', edgecolor='black', linewidth=1.0)
+rect.set_bounds(*ax2.viewLim.bounds)
+ax1.add_patch(rect)
+
+# Connect for changing the view limits
+ax2.callbacks.connect('xlim_changed', rect)
+ax2.callbacks.connect('ylim_changed', rect)
+
+ax2.callbacks.connect('xlim_changed', md.ax_update)
+ax2.callbacks.connect('ylim_changed', md.ax_update)
+ax2.set_title("Zoom here")
+
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: viewlims.py](https://matplotlib.org/_downloads/viewlims.py)
+- [下载Jupyter notebook: viewlims.ipynb](https://matplotlib.org/_downloads/viewlims.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/event_handling/zoom_window.md b/Python/matplotlab/gallery/event_handling/zoom_window.md
new file mode 100644
index 00000000..ba13afba
--- /dev/null
+++ b/Python/matplotlab/gallery/event_handling/zoom_window.md
@@ -0,0 +1,44 @@
+# 缩放窗口
+
+此示例显示如何将一个窗口中的事件(例如鼠标按键)连接到另一个图形窗口。
+
+如果单击第一个窗口中的某个点,将调整第二个窗口的x和y限制,使第二个窗口中缩放区域的中心位于所单击点的(x, y)坐标处。
+
+请注意,散点图中圆的面积以 points**2 为单位定义,因此其大小与缩放无关。
+
+![缩放窗口示例](https://matplotlib.org/_images/sphx_glr_zoom_window_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+
+figsrc, axsrc = plt.subplots()
+figzoom, axzoom = plt.subplots()
+axsrc.set(xlim=(0, 1), ylim=(0, 1), autoscale_on=False,
+          title='Click to zoom')
+axzoom.set(xlim=(0.45, 0.55), ylim=(0.4, 0.6), autoscale_on=False,
+           title='Zoom window')
+
+x, y, s, c = np.random.rand(4, 200)
+s *= 200
+
+axsrc.scatter(x, y, s, c)
+axzoom.scatter(x, y, s, c)
+
+
+def onpress(event):
+    if event.button != 1:
+        return
+    x, y = event.xdata, event.ydata
+    axzoom.set_xlim(x - 0.1, x + 0.1)
+    axzoom.set_ylim(y - 0.1, y + 0.1)
+    figzoom.canvas.draw()
+
+figsrc.canvas.mpl_connect('button_press_event', onpress)
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: zoom_window.py](https://matplotlib.org/_downloads/zoom_window.py)
+- [下载Jupyter notebook: zoom_window.ipynb](https://matplotlib.org/_downloads/zoom_window.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/frontpage/3D.md b/Python/matplotlab/gallery/frontpage/3D.md
new file mode 100644
index 00000000..25d31fc5
--- /dev/null
+++ b/Python/matplotlab/gallery/frontpage/3D.md
@@ -0,0 +1,45 @@
+# Frontpage 3D示例
+
+此示例再现Frontpage 3D示例。 + +![Frontpage 3D示例](https://matplotlib.org/_images/sphx_glr_3D_001.png) + +```python +# This import registers the 3D projection, but is otherwise unused. +from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + +from matplotlib import cbook +from matplotlib import cm +from matplotlib.colors import LightSource +import matplotlib.pyplot as plt +import numpy as np + +filename = cbook.get_sample_data('jacksboro_fault_dem.npz', asfileobj=False) +with np.load(filename) as dem: + z = dem['elevation'] + nrows, ncols = z.shape + x = np.linspace(dem['xmin'], dem['xmax'], ncols) + y = np.linspace(dem['ymin'], dem['ymax'], nrows) + x, y = np.meshgrid(x, y) + +region = np.s_[5:50, 5:50] +x, y, z = x[region], y[region], z[region] + +fig, ax = plt.subplots(subplot_kw=dict(projection='3d')) + +ls = LightSource(270, 45) +# To use a custom hillshading mode, override the built-in shading and pass +# in the rgb colors of the shaded surface calculated from "shade". +rgb = ls.shade(z, cmap=cm.gist_earth, vert_exag=0.1, blend_mode='soft') +surf = ax.plot_surface(x, y, z, rstride=1, cstride=1, facecolors=rgb, + linewidth=0, antialiased=False, shade=False) +ax.set_xticks([]) +ax.set_yticks([]) +ax.set_zticks([]) +fig.savefig("surface3d_frontpage.png", dpi=25) # results in 160x120 px image +``` + +## 下载这个示例 + +- [下载python源码: 3D.py](https://matplotlib.org/_downloads/3D.py) +- [下载Jupyter notebook: 3D.ipynb](https://matplotlib.org/_downloads/3D.ipynb) diff --git a/Python/matplotlab/gallery/frontpage/contour.md b/Python/matplotlab/gallery/frontpage/contour.md new file mode 100644 index 00000000..38a3432a --- /dev/null +++ b/Python/matplotlab/gallery/frontpage/contour.md @@ -0,0 +1,39 @@ +# Frontpage 轮廓示例 + +此示例再现Frontpage 轮廓示例。 + +![Frontpage 轮廓示例](https://matplotlib.org/_images/sphx_glr_contour_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np +from matplotlib import cm + +extent = (-3, 3, -3, 3) + +delta = 0.5 +x = np.arange(-3.0, 
4.001, delta) +y = np.arange(-4.0, 3.001, delta) +X, Y = np.meshgrid(x, y) +Z1 = np.exp(-X**2 - Y**2) +Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2) +Z = Z1 - Z2 + +norm = cm.colors.Normalize(vmax=abs(Z).max(), vmin=-abs(Z).max()) + +fig, ax = plt.subplots() +cset1 = ax.contourf( + X, Y, Z, 40, + norm=norm) +ax.set_xlim(-2, 2) +ax.set_ylim(-2, 2) +ax.set_xticks([]) +ax.set_yticks([]) +fig.savefig("contour_frontpage.png", dpi=25) # results in 160x120 px image +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: contour.py](https://matplotlib.org/_downloads/contour.py) +- [下载Jupyter notebook: contour.ipynb](https://matplotlib.org/_downloads/contour.ipynb) diff --git a/Python/matplotlab/gallery/frontpage/histogram.md b/Python/matplotlab/gallery/frontpage/histogram.md new file mode 100644 index 00000000..96da4a29 --- /dev/null +++ b/Python/matplotlab/gallery/frontpage/histogram.md @@ -0,0 +1,27 @@ +# Frontpage 直方图示例 + +此示例再现Frontpage 直方图示例。 + +![Frontpage 直方图示例](https://matplotlib.org/_images/sphx_glr_histogram_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + + +random_state = np.random.RandomState(19680801) +X = random_state.randn(10000) + +fig, ax = plt.subplots() +ax.hist(X, bins=25, density=True) +x = np.linspace(-5, 5, 1000) +ax.plot(x, 1 / np.sqrt(2*np.pi) * np.exp(-(x**2)/2), linewidth=4) +ax.set_xticks([]) +ax.set_yticks([]) +fig.savefig("histogram_frontpage.png", dpi=25) # results in 160x120 px image +``` + +## 下载这个示例 + +- [下载python源码: histogram.py](https://matplotlib.org/_downloads/histogram.py) +- [下载Jupyter notebook: histogram.ipynb](https://matplotlib.org/_downloads/histogram.ipynb) diff --git a/Python/matplotlab/gallery/frontpage/membrane.md b/Python/matplotlab/gallery/frontpage/membrane.md new file mode 100644 index 00000000..4701b3cd --- /dev/null +++ b/Python/matplotlab/gallery/frontpage/membrane.md @@ -0,0 +1,29 @@ +# Frontpage 绘图示例 + +此示例再现Frontpage 绘图示例。 + +![Frontpage 绘图示例](https://matplotlib.org/_images/sphx_glr_membrane_001.png) + 
+```python +import matplotlib.pyplot as plt +import matplotlib.cbook as cbook +import numpy as np + + +with cbook.get_sample_data('membrane.dat') as datafile: + x = np.fromfile(datafile, np.float32) +# 0.0005 is the sample interval + +fig, ax = plt.subplots() +ax.plot(x, linewidth=4) +ax.set_xlim(5000, 6000) +ax.set_ylim(-0.6, 0.1) +ax.set_xticks([]) +ax.set_yticks([]) +fig.savefig("membrane_frontpage.png", dpi=25) # results in 160x120 px image +``` + +## 下载这个示例 + +- [下载python源码: membrane.py](https://matplotlib.org/_downloads/membrane.py) +- [下载Jupyter notebook: membrane.ipynb](https://matplotlib.org/_downloads/membrane.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/affine_image.md b/Python/matplotlab/gallery/images_contours_and_fields/affine_image.md new file mode 100644 index 00000000..774f1c02 --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/affine_image.md @@ -0,0 +1,75 @@ +# 图像的仿射变换 + +将仿射变换(Affine2D)预先添加到图像的数据变换允许操纵图像的形状和方向。这是变换链的概念的一个例子。 + +对于支持具有可选仿射变换的draw_image的后端(例如,agg,ps后端),输出的图像应该使其边界与虚线黄色矩形匹配。 + +```python +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.transforms as mtransforms + + +def get_image(): + delta = 0.25 + x = y = np.arange(-3.0, 3.0, delta) + X, Y = np.meshgrid(x, y) + Z1 = np.exp(-X**2 - Y**2) + Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2) + Z = (Z1 - Z2) + return Z + + +def do_plot(ax, Z, transform): + im = ax.imshow(Z, interpolation='none', + origin='lower', + extent=[-2, 4, -3, 2], clip_on=True) + + trans_data = transform + ax.transData + im.set_transform(trans_data) + + # display intended extent of the image + x1, x2, y1, y2 = im.get_extent() + ax.plot([x1, x2, x2, x1, x1], [y1, y1, y2, y2, y1], "y--", + transform=trans_data) + ax.set_xlim(-5, 5) + ax.set_ylim(-4, 4) + + +# prepare image and figure +fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2) +Z = get_image() + +# image rotation +do_plot(ax1, Z, mtransforms.Affine2D().rotate_deg(30)) 
+ +# image skew +do_plot(ax2, Z, mtransforms.Affine2D().skew_deg(30, 15)) + +# scale and reflection +do_plot(ax3, Z, mtransforms.Affine2D().scale(-1, .5)) + +# everything and a translation +do_plot(ax4, Z, mtransforms.Affine2D(). + rotate_deg(30).skew_deg(30, 15).scale(-1, .5).translate(.5, -1)) + +plt.show() +``` + +![图像的仿射变换图示](https://matplotlib.org/_images/sphx_glr_affine_image_001.png) + +## 参考 + +此示例中显示了以下函数,方法和类的使用: + +```python +import matplotlib +matplotlib.axes.Axes.imshow +matplotlib.pyplot.imshow +matplotlib.transforms.Affine2D +``` + +## 下载这个示例 + +- [下载python源码: affine_image.py](https://matplotlib.org/_downloads/affine_image.py) +- [下载Jupyter notebook: affine_image.ipynb](https://matplotlib.org/_downloads/affine_image.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/barb_demo.md b/Python/matplotlab/gallery/images_contours_and_fields/barb_demo.md new file mode 100644 index 00000000..361707f5 --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/barb_demo.md @@ -0,0 +1,71 @@ +# 倒勾图示例 + +倒勾图的示例: + +```python +import matplotlib.pyplot as plt +import numpy as np + +x = np.linspace(-5, 5, 5) +X, Y = np.meshgrid(x, x) +U, V = 12 * X, 12 * Y + +data = [(-1.5, .5, -6, -6), + (1, -1, -46, 46), + (-3, -1, 11, -11), + (1, 1.5, 80, 80), + (0.5, 0.25, 25, 15), + (-1.5, -0.5, -5, 40)] + +data = np.array(data, dtype=[('x', np.float32), ('y', np.float32), + ('u', np.float32), ('v', np.float32)]) + +fig1, axs1 = plt.subplots(nrows=2, ncols=2) +# Default parameters, uniform grid +axs1[0, 0].barbs(X, Y, U, V) + +# Arbitrary set of vectors, make them longer and change the pivot point +# (point around which they're rotated) to be the middle +axs1[0, 1].barbs(data['x'], data['y'], data['u'], data['v'], length=8, pivot='middle') + +# Showing colormapping with uniform grid. 
Fill the circle for an empty barb, +# don't round the values, and change some of the size parameters +axs1[1, 0].barbs(X, Y, U, V, np.sqrt(U * U + V * V), fill_empty=True, rounding=False, + sizes=dict(emptybarb=0.25, spacing=0.2, height=0.3)) + +# Change colors as well as the increments for parts of the barbs +axs1[1, 1].barbs(data['x'], data['y'], data['u'], data['v'], flagcolor='r', + barbcolor=['b', 'g'], flip_barb=True, + barb_increments=dict(half=10, full=20, flag=100)) + +# Masked arrays are also supported +masked_u = np.ma.masked_array(data['u']) +masked_u[4] = 1000 # Bad value that should not be plotted when masked +masked_u[4] = np.ma.masked + +# Identical plot to panel 2 in the first figure, but with the point at +# (0.5, 0.25) missing (masked) +fig2, ax2 = plt.subplots() +ax2.barbs(data['x'], data['y'], masked_u, data['v'], length=8, pivot='middle') + +plt.show() +``` + +![倒勾图示例](https://matplotlib.org/_images/sphx_glr_barb_demo_001.png) + +![倒勾图示例2](https://matplotlib.org/_images/sphx_glr_barb_demo_002.png) + +## 参考 + +此示例中显示了以下函数,方法和类的使用: + +```python +import matplotlib +matplotlib.axes.Axes.barbs +matplotlib.pyplot.barbs +``` + +## 下载这个示例 + +- [下载python源码: barb_demo.py](https://matplotlib.org/_downloads/barb_demo.py) +- [下载Jupyter notebook: barb_demo.ipynb](https://matplotlib.org/_downloads/barb_demo.ipynb) diff --git a/Python/matplotlab/gallery/images_contours_and_fields/barcode_demo.md b/Python/matplotlab/gallery/images_contours_and_fields/barcode_demo.md new file mode 100644 index 00000000..a045d757 --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/barcode_demo.md @@ -0,0 +1,48 @@ +# 条形码示例 + +该演示展示了如何生成一维图像或“条形码”。 + +```python +import matplotlib.pyplot as plt +import numpy as np + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +# the bar +x = np.where(np.random.rand(500) > 0.7, 1.0, 0.0) + +axprops = dict(xticks=[], yticks=[]) +barprops = dict(aspect='auto', cmap=plt.cm.binary, 
interpolation='nearest')
+
+fig = plt.figure()
+
+# a vertical barcode
+ax1 = fig.add_axes([0.1, 0.3, 0.1, 0.6], **axprops)
+ax1.imshow(x.reshape((-1, 1)), **barprops)
+
+# a horizontal barcode
+ax2 = fig.add_axes([0.3, 0.1, 0.6, 0.1], **axprops)
+ax2.imshow(x.reshape((1, -1)), **barprops)
+
+
+plt.show()
+```
+
+![条形码示例](https://matplotlib.org/_images/sphx_glr_barcode_demo_001.png)
+
+## 参考
+
+此示例中显示了以下函数,方法和类的使用:
+
+```python
+import matplotlib
+matplotlib.axes.Axes.imshow
+matplotlib.pyplot.imshow
+```
+
+## 下载这个示例
+
+- [下载python源码: barcode_demo.py](https://matplotlib.org/_downloads/barcode_demo.py)
+- [下载Jupyter notebook: barcode_demo.ipynb](https://matplotlib.org/_downloads/barcode_demo.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/images_contours_and_fields/contour_corner_mask.md b/Python/matplotlab/gallery/images_contours_and_fields/contour_corner_mask.md
new file mode 100644
index 00000000..b0193637
--- /dev/null
+++ b/Python/matplotlab/gallery/images_contours_and_fields/contour_corner_mask.md
@@ -0,0 +1,55 @@
+# 等高线角遮盖
+
+演示在遮盖的等高线图中 corner_mask = False 与 corner_mask = True 的区别。
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+
+# Data to plot.
+x, y = np.meshgrid(np.arange(7), np.arange(10))
+z = np.sin(0.5 * x) * np.cos(0.52 * y)
+
+# Mask various z values.
+mask = np.zeros_like(z, dtype=bool)
+mask[2, 3:5] = True
+mask[3:5, 4] = True
+mask[7, 2] = True
+mask[5, 0] = True
+mask[0, 6] = True
+z = np.ma.array(z, mask=mask)
+
+corner_masks = [False, True]
+fig, axs = plt.subplots(ncols=2)
+for ax, corner_mask in zip(axs, corner_masks):
+    cs = ax.contourf(x, y, z, corner_mask=corner_mask)
+    ax.contour(cs, colors='k')
+    ax.set_title('corner_mask = {0}'.format(corner_mask))
+
+    # Plot grid.
+    ax.grid(c='k', ls='-', alpha=0.3)
+
+    # Indicate masked points with red circles.
+
+    ax.plot(np.ma.array(x, mask=~mask), y, 'ro')
+
+plt.show()
+```
+
+![等高线角遮盖示例](https://matplotlib.org/_images/sphx_glr_contour_corner_mask_001.png)
+
+## 参考
+
+此示例中显示了以下函数和方法的用法:
+
+```python
+import matplotlib
+matplotlib.axes.Axes.contour
+matplotlib.pyplot.contour
+matplotlib.axes.Axes.contourf
+matplotlib.pyplot.contourf
+```
+
+## 下载这个示例
+
+- [下载python源码: contour_corner_mask.py](https://matplotlib.org/_downloads/contour_corner_mask.py)
+- [下载Jupyter notebook: contour_corner_mask.ipynb](https://matplotlib.org/_downloads/contour_corner_mask.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/images_contours_and_fields/contour_demo.md b/Python/matplotlab/gallery/images_contours_and_fields/contour_demo.md
new file mode 100644
index 00000000..f11e417f
--- /dev/null
+++ b/Python/matplotlab/gallery/images_contours_and_fields/contour_demo.md
@@ -0,0 +1,143 @@
+# 等高线演示
+
+演示简单的等高线绘制,图像上的等高线带有等高线的颜色条,并标出等高线。
+
+另见[轮廓图像示例](https://matplotlib.org/gallery/images_contours_and_fields/contour_image.html)。
+
+```python
+import matplotlib
+import numpy as np
+import matplotlib.cm as cm
+import matplotlib.pyplot as plt
+
+
+delta = 0.025
+x = np.arange(-3.0, 3.0, delta)
+y = np.arange(-2.0, 2.0, delta)
+X, Y = np.meshgrid(x, y)
+Z1 = np.exp(-X**2 - Y**2)
+Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2)
+Z = (Z1 - Z2) * 2
+```

+使用默认颜色创建带有标签的简单等高线图。clabel 的 inline 参数控制是否将标签画在等高线的线段上,并移除标签下方的线。
+
+```python
+fig, ax = plt.subplots()
+CS = ax.contour(X, Y, Z)
+ax.clabel(CS, inline=1, fontsize=10)
+ax.set_title('Simplest default with labels')
+```
+
+![等高线演示示例](https://matplotlib.org/_images/sphx_glr_contour_demo_001.png)
+
+等高线的标签可以通过提供位置列表(数据坐标)手动放置。有关交互式放置,请参见 ginput_manual_clabel.py。
+
+```python
+fig, ax = plt.subplots()
+CS = ax.contour(X, Y, Z)
+manual_locations = [(-1, -1.4), (-0.62, -0.7), (-2, 0.5), (1.7, 1.2), (2.0, 1.4), (2.4, 1.7)]
+ax.clabel(CS, inline=1, fontsize=10, manual=manual_locations)
+ax.set_title('labels at selected locations')
+```
+
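``clabel`` 会返回它创建的 ``Text`` 对象,因此在放置标签后还可以检查或微调标签的位置。下面是一个补充的独立示意(并非原示例代码;其中无界面的 Agg 后端与所选的三个位置均为演示用的假设):

```python
import matplotlib
matplotlib.use('Agg')  # 无界面后端,便于在脚本中运行(演示用假设)
import matplotlib.pyplot as plt
import numpy as np

delta = 0.025
x = np.arange(-3.0, 3.0, delta)
y = np.arange(-2.0, 2.0, delta)
X, Y = np.meshgrid(x, y)
Z = (np.exp(-X**2 - Y**2) - np.exp(-(X - 1)**2 - (Y - 1)**2)) * 2

fig, ax = plt.subplots()
CS = ax.contour(X, Y, Z)
# manual 接受 (x, y) 位置的可迭代对象;每个位置会吸附到最近的等高线上
labels = ax.clabel(CS, inline=1, fontsize=10,
                   manual=[(-1, -1.4), (-0.62, -0.7), (1.7, 1.2)])
# 返回的 Text 对象可用于检查每个标签最终落在哪里
for t in labels:
    print(t.get_position(), t.get_text())
```

在交互式后端下,也可以改为传入 ``manual=True``,用鼠标逐个点击来放置标签。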
+![等高线演示示例2](https://matplotlib.org/_images/sphx_glr_contour_demo_002.png) + +你可以强制所有的等高线是相同的颜色。 + +```python +fig, ax = plt.subplots() +CS = ax.contour(X, Y, Z, 6, + colors='k', # negative contours will be dashed by default + ) +ax.clabel(CS, fontsize=9, inline=1) +ax.set_title('Single color - negative contours dashed') +``` + +![等高线演示示例3](https://matplotlib.org/_images/sphx_glr_contour_demo_003.png) + +你可以将负轮廓设置为实线而不是虚线: + +```python +matplotlib.rcParams['contour.negative_linestyle'] = 'solid' +fig, ax = plt.subplots() +CS = ax.contour(X, Y, Z, 6, + colors='k', # negative contours will be dashed by default + ) +ax.clabel(CS, fontsize=9, inline=1) +ax.set_title('Single color - negative contours solid') +``` + +![等高线演示示例4](https://matplotlib.org/_images/sphx_glr_contour_demo_004.png) + +并且你可以手动指定轮廓的颜色。 + +```python +fig, ax = plt.subplots() +CS = ax.contour(X, Y, Z, 6, + linewidths=np.arange(.5, 4, .5), + colors=('r', 'green', 'blue', (1, 1, 0), '#afeeee', '0.5') + ) +ax.clabel(CS, fontsize=9, inline=1) +ax.set_title('Crazy lines') +``` + +![等高线演示示例5](https://matplotlib.org/_images/sphx_glr_contour_demo_005.png) + +也可以使用颜色图来指定颜色;默认的颜色图将用于等高线。 + +```python +fig, ax = plt.subplots() +im = ax.imshow(Z, interpolation='bilinear', origin='lower', + cmap=cm.gray, extent=(-3, 3, -2, 2)) +levels = np.arange(-1.2, 1.6, 0.2) +CS = ax.contour(Z, levels, origin='lower', cmap='flag', + linewidths=2, extent=(-3, 3, -2, 2)) + +# Thicken the zero contour. +zc = CS.collections[6] +plt.setp(zc, linewidth=4) + +ax.clabel(CS, levels[1::2], # label every second level + inline=1, fmt='%1.1f', fontsize=14) + +# make a colorbar for the contour lines +CB = fig.colorbar(CS, shrink=0.8, extend='both') + +ax.set_title('Lines with colorbar') + +# We can still add a colorbar for the image, too. +CBI = fig.colorbar(im, orientation='horizontal', shrink=0.8) + +# This makes the original colorbar look a bit out of place, +# so let's improve its position. 
+ +l, b, w, h = ax.get_position().bounds +ll, bb, ww, hh = CB.ax.get_position().bounds +CB.ax.set_position([ll, b + 0.1*h, ww, h*0.8]) + +plt.show() +``` + +![等高线演示示例6](https://matplotlib.org/_images/sphx_glr_contour_demo_006.png) + +## 参考 + +下面的示例演示了以下函数和方法的使用: + +```python +import matplotlib +matplotlib.axes.Axes.contour +matplotlib.pyplot.contour +matplotlib.figure.Figure.colorbar +matplotlib.pyplot.colorbar +matplotlib.axes.Axes.clabel +matplotlib.pyplot.clabel +matplotlib.axes.Axes.set_position +matplotlib.axes.Axes.get_position +``` + +## 下载这个示例 + +- [下载python源码: contour_demo.py](https://matplotlib.org/_downloads/contour_demo.py) +- [下载Jupyter notebook: contour_demo.ipynb](https://matplotlib.org/_downloads/contour_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/contour_image.md b/Python/matplotlab/gallery/images_contours_and_fields/contour_image.md new file mode 100644 index 00000000..35fbf900 --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/contour_image.md @@ -0,0 +1,110 @@ +# 等高线图像 + +等高线,填充等高线和图像绘制的测试组合。 有关等高线标记,另请参见等高线演示示例。 + +本演示的重点是展示如何在图像上正确展示等高线,以及如何使两者按照需要定向。 特别要注意 [“origin”和“extent”](https://matplotlib.org/tutorials/intermediate/imshow_extent.html) 关键字参数在imshow和contour中的用法。 + +```python +import matplotlib.pyplot as plt +import numpy as np +from matplotlib import cm + +# Default delta is large because that makes it fast, and it illustrates +# the correct registration between image and contours. +delta = 0.5 + +extent = (-3, 4, -4, 3) + +x = np.arange(-3.0, 4.001, delta) +y = np.arange(-4.0, 3.001, delta) +X, Y = np.meshgrid(x, y) +Z1 = np.exp(-X**2 - Y**2) +Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2) +Z = (Z1 - Z2) * 2 + +# Boost the upper limit to avoid truncation errors. 
+levels = np.arange(-2.0, 1.601, 0.4) + +norm = cm.colors.Normalize(vmax=abs(Z).max(), vmin=-abs(Z).max()) +cmap = cm.PRGn + +fig, _axs = plt.subplots(nrows=2, ncols=2) +fig.subplots_adjust(hspace=0.3) +axs = _axs.flatten() + +cset1 = axs[0].contourf(X, Y, Z, levels, norm=norm, + cmap=cm.get_cmap(cmap, len(levels) - 1)) +# It is not necessary, but for the colormap, we need only the +# number of levels minus 1. To avoid discretization error, use +# either this number or a large number such as the default (256). + +# If we want lines as well as filled regions, we need to call +# contour separately; don't try to change the edgecolor or edgewidth +# of the polygons in the collections returned by contourf. +# Use levels output from previous call to guarantee they are the same. + +cset2 = axs[0].contour(X, Y, Z, cset1.levels, colors='k') + +# We don't really need dashed contour lines to indicate negative +# regions, so let's turn them off. + +for c in cset2.collections: + c.set_linestyle('solid') + +# It is easier here to make a separate call to contour than +# to set up an array of colors and linewidths. +# We are making a thick green line as a zero contour. +# Specify the zero level as a tuple with only 0 in it. + +cset3 = axs[0].contour(X, Y, Z, (0,), colors='g', linewidths=2) +axs[0].set_title('Filled contours') +fig.colorbar(cset1, ax=axs[0]) + + +axs[1].imshow(Z, extent=extent, cmap=cmap, norm=norm) +axs[1].contour(Z, levels, colors='k', origin='upper', extent=extent) +axs[1].set_title("Image, origin 'upper'") + +axs[2].imshow(Z, origin='lower', extent=extent, cmap=cmap, norm=norm) +axs[2].contour(Z, levels, colors='k', origin='lower', extent=extent) +axs[2].set_title("Image, origin 'lower'") + +# We will use the interpolation "nearest" here to show the actual +# image pixels. +# Note that the contour lines don't extend to the edge of the box. +# This is intentional. 
The Z values are defined at the center of each
+# image pixel (each color block on the following subplot), so the
+# domain that is contoured does not extend beyond these pixel centers.
+im = axs[3].imshow(Z, interpolation='nearest', extent=extent,
+                   cmap=cmap, norm=norm)
+axs[3].contour(Z, levels, colors='k', origin='image', extent=extent)
+ylim = axs[3].get_ylim()
+axs[3].set_ylim(ylim[::-1])
+axs[3].set_title("Origin from rc, reversed y-axis")
+fig.colorbar(im, ax=axs[3])
+
+fig.tight_layout()
+plt.show()
+```
+
+![等高线图像示例](https://matplotlib.org/_images/sphx_glr_contour_image_001.png)
+
+## 参考
+
+本例中显示了以下函数、方法和类的使用:
+
+```python
+import matplotlib
+matplotlib.axes.Axes.contour
+matplotlib.pyplot.contour
+matplotlib.axes.Axes.imshow
+matplotlib.pyplot.imshow
+matplotlib.figure.Figure.colorbar
+matplotlib.pyplot.colorbar
+matplotlib.colors.Normalize
+```
+
+## 下载这个示例
+
+- [下载python源码: contour_image.py](https://matplotlib.org/_downloads/contour_image.py)
+- [下载Jupyter notebook: contour_image.ipynb](https://matplotlib.org/_downloads/contour_image.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/images_contours_and_fields/contour_label_demo.md b/Python/matplotlab/gallery/images_contours_and_fields/contour_label_demo.md
new file mode 100644
index 00000000..4450c74c
--- /dev/null
+++ b/Python/matplotlab/gallery/images_contours_and_fields/contour_label_demo.md
@@ -0,0 +1,111 @@
+# 等高线标签演示
+
+说明一些可以用等高线的标签做的更高级的东西。
+
+另请参见[轮廓演示示例](https://matplotlib.org/gallery/images_contours_and_fields/contour_demo.html)。
+
+```python
+import matplotlib
+import numpy as np
+import matplotlib.ticker as ticker
+import matplotlib.pyplot as plt
+```
+
+定义我们的曲面
+
+```python
+delta = 0.025
+x = np.arange(-3.0, 3.0, delta)
+y = np.arange(-2.0, 2.0, delta)
+X, Y = np.meshgrid(x, y)
+Z1 = np.exp(-X**2 - Y**2)
+Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2)
+Z = (Z1 - Z2) * 2
+```
+
+使用自定义的浮点数类制作等高线的标签,遵循 Manuel Metz 的建议。
+
+```python
+# Define a class that forces representation of
float to look a certain way
+# This removes the trailing zero so '1.0' becomes '1'
+
+
+class nf(float):
+    def __repr__(self):
+        str = '%.1f' % (self.__float__(),)
+        if str[-1] == '0':
+            return '%.0f' % self.__float__()
+        else:
+            return '%.1f' % self.__float__()
+
+
+# Basic contour plot
+fig, ax = plt.subplots()
+CS = ax.contour(X, Y, Z)
+
+# Recast levels to new class
+CS.levels = [nf(val) for val in CS.levels]
+
+# Label levels with specially formatted floats
+if plt.rcParams["text.usetex"]:
+    fmt = r'%r \%%'
+else:
+    fmt = '%r %%'
+
+ax.clabel(CS, CS.levels, inline=True, fmt=fmt, fontsize=10)
+```
+
+![等高线标签演示示例](https://matplotlib.org/_images/sphx_glr_contour_label_demo_001.png)
+
+使用字典用任意字符串标记等高线
+
+```python
+fig1, ax1 = plt.subplots()
+
+# Basic contour plot
+CS1 = ax1.contour(X, Y, Z)
+
+fmt = {}
+strs = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth', 'seventh']
+for l, s in zip(CS1.levels, strs):
+    fmt[l] = s
+
+# Label every other level using strings
+ax1.clabel(CS1, CS1.levels[::2], inline=True, fmt=fmt, fontsize=10)
+```
+
+![等高线标签演示示例2](https://matplotlib.org/_images/sphx_glr_contour_label_demo_002.png)
+
+使用Formatter来格式化
+
+```python
+fig2, ax2 = plt.subplots()
+
+CS2 = ax2.contour(X, Y, 100**Z, locator=plt.LogLocator())
+fmt = ticker.LogFormatterMathtext()
+fmt.create_dummy_axis()
+ax2.clabel(CS2, CS2.levels, fmt=fmt)
+ax2.set_title("$100^Z$")
+
+plt.show()
+```
+
+![等高线标签演示示例3](https://matplotlib.org/_images/sphx_glr_contour_label_demo_003.png)
+
+## 参考
+
+本例中显示了以下函数、方法和类的使用:
+
+```python
+matplotlib.axes.Axes.contour
+matplotlib.pyplot.contour
+matplotlib.axes.Axes.clabel
+matplotlib.pyplot.clabel
+matplotlib.ticker.LogFormatterMathtext
+matplotlib.ticker.TickHelper.create_dummy_axis
+```
+
+## 下载这个示例
+
+- [下载python源码: contour_label_demo.py](https://matplotlib.org/_downloads/contour_label_demo.py)
+- [下载Jupyter notebook: contour_label_demo.ipynb](https://matplotlib.org/_downloads/contour_label_demo.ipynb)
\ No newline at end of file
diff --git
a/Python/matplotlab/gallery/images_contours_and_fields/contourf_demo.md b/Python/matplotlab/gallery/images_contours_and_fields/contourf_demo.md new file mode 100644 index 00000000..804c8eee --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/contourf_demo.md @@ -0,0 +1,136 @@ +# Contourf演示 + +如何使用 [axes.Axes.contourf()](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.contourf.html#matplotlib.axes.Axes.contourf) 方法创建填充的等高线图。 + +```python +import numpy as np +import matplotlib.pyplot as plt + +origin = 'lower' + +delta = 0.025 + +x = y = np.arange(-3.0, 3.01, delta) +X, Y = np.meshgrid(x, y) +Z1 = np.exp(-X**2 - Y**2) +Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2) +Z = (Z1 - Z2) * 2 + +nr, nc = Z.shape + +# put NaNs in one corner: +Z[-nr // 6:, -nc // 6:] = np.nan +# contourf will convert these to masked + + +Z = np.ma.array(Z) +# mask another corner: +Z[:nr // 6, :nc // 6] = np.ma.masked + +# mask a circle in the middle: +interior = np.sqrt((X**2) + (Y**2)) < 0.5 +Z[interior] = np.ma.masked + +# We are using automatic selection of contour levels; +# this is usually not such a good idea, because they don't +# occur on nice boundaries, but we do it here for purposes +# of illustration. + +fig1, ax2 = plt.subplots(constrained_layout=True) +CS = ax2.contourf(X, Y, Z, 10, cmap=plt.cm.bone, origin=origin) + +# Note that in the following, we explicitly pass in a subset of +# the contour levels used for the filled contours. Alternatively, +# We could pass in additional levels to provide extra resolution, +# or leave out the levels kwarg to use all of the original levels. + +CS2 = ax2.contour(CS, levels=CS.levels[::2], colors='r', origin=origin) + +ax2.set_title('Nonsense (3 masked regions)') +ax2.set_xlabel('word length anomaly') +ax2.set_ylabel('sentence length anomaly') + +# Make a colorbar for the ContourSet returned by the contourf call. 
+cbar = fig1.colorbar(CS) +cbar.ax.set_ylabel('verbosity coefficient') +# Add the contour line levels to the colorbar +cbar.add_lines(CS2) + +fig2, ax2 = plt.subplots(constrained_layout=True) +# Now make a contour plot with the levels specified, +# and with the colormap generated automatically from a list +# of colors. +levels = [-1.5, -1, -0.5, 0, 0.5, 1] +CS3 = ax2.contourf(X, Y, Z, levels, + colors=('r', 'g', 'b'), + origin=origin, + extend='both') +# Our data range extends outside the range of levels; make +# data below the lowest contour level yellow, and above the +# highest level cyan: +CS3.cmap.set_under('yellow') +CS3.cmap.set_over('cyan') + +CS4 = ax2.contour(X, Y, Z, levels, + colors=('k',), + linewidths=(3,), + origin=origin) +ax2.set_title('Listed colors (3 masked regions)') +ax2.clabel(CS4, fmt='%2.1f', colors='w', fontsize=14) + +# Notice that the colorbar command gets all the information it +# needs from the ContourSet object, CS3. +fig2.colorbar(CS3) + +# Illustrate all 4 possible "extend" settings: +extends = ["neither", "both", "min", "max"] +cmap = plt.cm.get_cmap("winter") +cmap.set_under("magenta") +cmap.set_over("yellow") +# Note: contouring simply excludes masked or nan regions, so +# instead of using the "bad" colormap value for them, it draws +# nothing at all in them. 
Therefore the following would have +# no effect: +# cmap.set_bad("red") + +fig, axs = plt.subplots(2, 2, constrained_layout=True) + +for ax, extend in zip(axs.ravel(), extends): + cs = ax.contourf(X, Y, Z, levels, cmap=cmap, extend=extend, origin=origin) + fig.colorbar(cs, ax=ax, shrink=0.9) + ax.set_title("extend = %s" % extend) + ax.locator_params(nbins=4) + +plt.show() +``` + +![Contourf演示](https://matplotlib.org/_images/sphx_glr_contourf_demo_001.png) + +![Contourf演示2](https://matplotlib.org/_images/sphx_glr_contourf_demo_002.png) + +![Contourf演示3](https://matplotlib.org/_images/sphx_glr_contourf_demo_003.png) + +## 参考 + +此示例中显示了以下函数,方法和类的使用: + +```python +import matplotlib +matplotlib.axes.Axes.contour +matplotlib.pyplot.contour +matplotlib.axes.Axes.contourf +matplotlib.pyplot.contourf +matplotlib.axes.Axes.clabel +matplotlib.pyplot.clabel +matplotlib.figure.Figure.colorbar +matplotlib.pyplot.colorbar +matplotlib.colors.Colormap +matplotlib.colors.Colormap.set_bad +matplotlib.colors.Colormap.set_under +matplotlib.colors.Colormap.set_over +``` + +## 下载这个示例 + +- [下载python源码: contourf_demo.py](https://matplotlib.org/_downloads/contourf_demo.py) +- [下载Jupyter notebook: contourf_demo.ipynb](https://matplotlib.org/_downloads/contourf_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/contourf_hatching.md b/Python/matplotlab/gallery/images_contours_and_fields/contourf_hatching.md new file mode 100644 index 00000000..dd3c3684 --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/contourf_hatching.md @@ -0,0 +1,69 @@ +# Contourf 影线法 + +演示填充轮廓图形与阴影模式。 + +```python +import matplotlib.pyplot as plt +import numpy as np + +# invent some numbers, turning the x and y arrays into simple +# 2d arrays, which make combining them together easier. 
+x = np.linspace(-3, 5, 150).reshape(1, -1)
+y = np.linspace(-3, 5, 120).reshape(-1, 1)
+z = np.cos(x) + np.sin(y)
+
+# we no longer need x and y to be 2 dimensional, so flatten them.
+x, y = x.flatten(), y.flatten()
+```
+
+图1:最简单的带颜色条的阴影图
+
+```python
+fig1, ax1 = plt.subplots()
+cs = ax1.contourf(x, y, z, hatches=['-', '/', '\\', '//'],
+                  cmap='gray', extend='both', alpha=0.5)
+fig1.colorbar(cs)
+```
+
+![Contourf 影线法](https://matplotlib.org/_images/sphx_glr_contourf_hatching_001.png)
+
+图2:不带颜色填充、但带图例的阴影图
+
+```python
+fig2, ax2 = plt.subplots()
+n_levels = 6
+ax2.contour(x, y, z, n_levels, colors='black', linestyles='-')
+cs = ax2.contourf(x, y, z, n_levels, colors='none',
+                  hatches=['.', '/', '\\', None, '\\\\', '*'],
+                  extend='min')
+
+# create a legend for the contour set
+artists, labels = cs.legend_elements()
+ax2.legend(artists, labels, handleheight=2)
+plt.show()
+```
+
+![Contourf 影线法2](https://matplotlib.org/_images/sphx_glr_contourf_hatching_002.png)
+
+## 参考
+
+本例中显示了以下函数、方法和类的使用:
+
+```python
+import matplotlib
+matplotlib.axes.Axes.contour
+matplotlib.pyplot.contour
+matplotlib.axes.Axes.contourf
+matplotlib.pyplot.contourf
+matplotlib.figure.Figure.colorbar
+matplotlib.pyplot.colorbar
+matplotlib.axes.Axes.legend
+matplotlib.pyplot.legend
+matplotlib.contour.ContourSet
+matplotlib.contour.ContourSet.legend_elements
+```
+
+## 下载这个示例
+
+- [下载python源码: contourf_hatching.py](https://matplotlib.org/_downloads/contourf_hatching.py)
+- [下载Jupyter notebook: contourf_hatching.ipynb](https://matplotlib.org/_downloads/contourf_hatching.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/images_contours_and_fields/contourf_log.md b/Python/matplotlab/gallery/images_contours_and_fields/contourf_log.md
new file mode 100644
index 00000000..bd09dcec
--- /dev/null
+++ b/Python/matplotlab/gallery/images_contours_and_fields/contourf_log.md
@@ -0,0 +1,69 @@
+# Contourf 与对数颜色刻度
+
+演示在 contourf 中使用对数颜色刻度。
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+from numpy import ma
+from matplotlib import ticker, cm
+
+N = 100
+x = np.linspace(-3.0, 3.0, N)
+y = np.linspace(-2.0, 2.0, N)
+
+X, Y = np.meshgrid(x, y)
+
+# A low hump with a spike coming out.
+# Needs to have z/colour axis on a log scale so we see both hump and spike.
+# linear scale only shows the spike.
+Z1 = np.exp(-(X)**2 - (Y)**2)
+Z2 = np.exp(-(X * 10)**2 - (Y * 10)**2)
+z = Z1 + 50 * Z2
+
+# Put in some negative values (lower left corner) to cause trouble with logs:
+z[:5, :5] = -1
+
+# The following is not strictly essential, but it will eliminate
+# a warning. Comment it out to see the warning.
+z = ma.masked_where(z <= 0, z)
+
+
+# Automatic selection of levels works; setting the
+# log locator tells contourf to use a log scale:
+fig, ax = plt.subplots()
+cs = ax.contourf(X, Y, z, locator=ticker.LogLocator(), cmap=cm.PuBu_r)
+
+# Alternatively, you can manually set the levels
+# and the norm:
+# lev_exp = np.arange(np.floor(np.log10(z.min())-1),
+#                     np.ceil(np.log10(z.max())+1))
+# levs = np.power(10, lev_exp)
+# cs = ax.contourf(X, Y, z, levs, norm=colors.LogNorm())
+
+cbar = fig.colorbar(cs)
+
+plt.show()
+```
+
+![Contourf 与对数颜色刻度示例](https://matplotlib.org/_images/sphx_glr_contourf_log_001.png)
+
+## 参考
+
+本例中显示了以下函数、方法和类的使用:
+
+```python
+import matplotlib
+matplotlib.axes.Axes.contourf
+matplotlib.pyplot.contourf
+matplotlib.figure.Figure.colorbar
+matplotlib.pyplot.colorbar
+matplotlib.axes.Axes.legend
+matplotlib.pyplot.legend
+matplotlib.ticker.LogLocator
+```
+
+## 下载这个示例
+
+- [下载python源码: contourf_log.py](https://matplotlib.org/_downloads/contourf_log.py)
- [下载Jupyter notebook: contourf_log.ipynb](https://matplotlib.org/_downloads/contourf_log.ipynb)
diff --git a/Python/matplotlab/gallery/images_contours_and_fields/custom_cmap.md b/Python/matplotlab/gallery/images_contours_and_fields/custom_cmap.md
new file mode 100644
index 00000000..d858db8f
--- /dev/null
+++
b/Python/matplotlab/gallery/images_contours_and_fields/custom_cmap.md @@ -0,0 +1,2 @@ +# 从颜色列表创建颜色图 + diff --git a/Python/matplotlab/gallery/images_contours_and_fields/demo_bboximage.md b/Python/matplotlab/gallery/images_contours_and_fields/demo_bboximage.md new file mode 100644 index 00000000..4669fe9b --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/demo_bboximage.md @@ -0,0 +1,86 @@ +# BboxImage 演示 + +BboxImage可用于根据边界框定位图像。此演示演示如何在text.Text的边界框内显示图像以及如何手动为图像创建边界框。 + +```python +import matplotlib.pyplot as plt +import numpy as np +from matplotlib.image import BboxImage +from matplotlib.transforms import Bbox, TransformedBbox + + +fig, (ax1, ax2) = plt.subplots(ncols=2) + +# ---------------------------- +# Create a BboxImage with Text +# ---------------------------- +txt = ax1.text(0.5, 0.5, "test", size=30, ha="center", color="w") +kwargs = dict() + +bbox_image = BboxImage(txt.get_window_extent, + norm=None, + origin=None, + clip_on=False, + **kwargs + ) +a = np.arange(256).reshape(1, 256)/256. +bbox_image.set_data(a) +ax1.add_artist(bbox_image) + +# ------------------------------------ +# Create a BboxImage for each colormap +# ------------------------------------ +a = np.linspace(0, 1, 256).reshape(1, -1) +a = np.vstack((a, a)) + +# List of all colormaps; skip reversed colormaps. +maps = sorted(m for m in plt.cm.cmap_d if not m.endswith("_r")) + +ncol = 2 +nrow = len(maps)//ncol + 1 + +xpad_fraction = 0.3 +dx = 1./(ncol + xpad_fraction*(ncol - 1)) + +ypad_fraction = 0.3 +dy = 1./(nrow + ypad_fraction*(nrow - 1)) + +for i, m in enumerate(maps): + ix, iy = divmod(i, nrow) + + bbox0 = Bbox.from_bounds(ix*dx*(1 + xpad_fraction), + 1. 
- iy*dy*(1 + ypad_fraction) - dy, + dx, dy) + bbox = TransformedBbox(bbox0, ax2.transAxes) + + bbox_image = BboxImage(bbox, + cmap=plt.get_cmap(m), + norm=None, + origin=None, + **kwargs + ) + + bbox_image.set_data(a) + ax2.add_artist(bbox_image) + +plt.show() +``` + +![BboxImage 演示](https://matplotlib.org/_images/sphx_glr_demo_bboximage_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.image.BboxImage +matplotlib.transforms.Bbox +matplotlib.transforms.TransformedBbox +matplotlib.text.Text +``` + +## 下载这个示例 + +- [下载python源码: demo_bboximage.py](https://matplotlib.org/_downloads/demo_bboximage.py) +- [下载Jupyter notebook: demo_bboximage.ipynb](https://matplotlib.org/_downloads/demo_bboximage.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/figimage_demo.md b/Python/matplotlab/gallery/images_contours_and_fields/figimage_demo.md new file mode 100644 index 00000000..356f2ba1 --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/figimage_demo.md @@ -0,0 +1,36 @@ +# Figimage 演示 + +这说明了在没有轴对象的情况下,直接将图像放置在图形中。 + +```python +import numpy as np +import matplotlib +import matplotlib.pyplot as plt + + +fig = plt.figure() +Z = np.arange(10000).reshape((100, 100)) +Z[:, 50:] = 1 + +im1 = fig.figimage(Z, xo=50, yo=0, origin='lower') +im2 = fig.figimage(Z, xo=100, yo=100, alpha=.8, origin='lower') + +plt.show() +``` + +![Figimage 演示](https://matplotlib.org/_images/sphx_glr_figimage_demo_001.png) + +## 参考 + +本例中显示了下列函数、方法、类和模块的使用: + +```python +matplotlib.figure.Figure +matplotlib.figure.Figure.figimage +matplotlib.pyplot.figimage +``` + +## 下载这个示例 + +- [下载python源码: figimage_demo.py](https://matplotlib.org/_downloads/figimage_demo.py) +- [下载Jupyter notebook: figimage_demo.ipynb](https://matplotlib.org/_downloads/figimage_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/image_annotated_heatmap.md 
b/Python/matplotlab/gallery/images_contours_and_fields/image_annotated_heatmap.md
new file mode 100644
index 00000000..3a3e0573
--- /dev/null
+++ b/Python/matplotlab/gallery/images_contours_and_fields/image_annotated_heatmap.md
@@ -0,0 +1,275 @@
+# 创建带注释的热度图
+
+通常希望将依赖于两个独立变量的数据显示为彩色编码的图像,这通常被称为热度图。如果数据是分类的,则称为分类热度图。Matplotlib 的 [imshow](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.imshow.html#matplotlib.axes.Axes.imshow) 函数使得这类图的制作特别容易。
+
+以下示例显示如何创建带注释的热度图。我们将从一个简单的示例开始,并将其扩展为可复用的通用函数。
+
+## 一种简单的分类热度图
+
+我们可以从定义一些数据开始。我们需要的是一个二维列表或数组,它定义了要进行颜色编码的数据。然后,我们还需要两个表示类别的列表或数组;当然,这些列表中的元素数量需要与数据在各自轴上的长度一致。热度图本身是一个 [imshow](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.imshow.html#matplotlib.axes.Axes.imshow) 图,其标签设置为我们所拥有的类别。请注意,重要的是同时设置刻度位置([set_xticks](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_xticks.html#matplotlib.axes.Axes.set_xticks))和刻度标签([set_xticklabels](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_xticklabels.html#matplotlib.axes.Axes.set_xticklabels)),否则它们将变得不同步。刻度位置只是升序整数,而刻度标签则是要显示的文本。最后,我们可以在每个单元格内创建一个 [Text](https://matplotlib.org/api/text_api.html#matplotlib.text.Text) 来标注数据本身,显示该单元格的值。
+
+```python
+import numpy as np
+import matplotlib
+import matplotlib.pyplot as plt
+# sphinx_gallery_thumbnail_number = 2
+
+vegetables = ["cucumber", "tomato", "lettuce", "asparagus",
+              "potato", "wheat", "barley"]
+farmers = ["Farmer Joe", "Upland Bros.", "Smith Gardening",
+           "Agrifun", "Organiculture", "BioGoods Ltd.", "Cornylee Corp."]
+
+harvest = np.array([[0.8, 2.4, 2.5, 3.9, 0.0, 4.0, 0.0],
+                    [2.4, 0.0, 4.0, 1.0, 2.7, 0.0, 0.0],
+                    [1.1, 2.4, 0.8, 4.3, 1.9, 4.4, 0.0],
+                    [0.6, 0.0, 0.3, 0.0, 3.1, 0.0, 0.0],
+                    [0.7, 1.7, 0.6, 2.6, 2.2, 6.2, 0.0],
+                    [1.3, 1.2, 0.0, 0.0, 0.0, 3.2, 5.1],
+                    [0.1, 2.0, 0.0, 1.4, 0.0, 1.9, 6.3]])
+
+
+fig, ax = plt.subplots()
+im = ax.imshow(harvest)
+
+# We want to show all ticks...
+ax.set_xticks(np.arange(len(farmers)))
+ax.set_yticks(np.arange(len(vegetables)))
+# ...
and label them with the respective list entries
+ax.set_xticklabels(farmers)
+ax.set_yticklabels(vegetables)
+
+# Rotate the tick labels and set their alignment.
+plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
+         rotation_mode="anchor")
+
+# Loop over data dimensions and create text annotations.
+for i in range(len(vegetables)):
+    for j in range(len(farmers)):
+        text = ax.text(j, i, harvest[i, j],
+                       ha="center", va="center", color="w")
+
+ax.set_title("Harvest of local farmers (in tons/year)")
+fig.tight_layout()
+plt.show()
+```
+
+![热度图示例](https://matplotlib.org/_images/sphx_glr_image_annotated_heatmap_001.png)
+
+## 使用辅助函数的编码风格
+
+正如在[编码风格](https://matplotlib.org/tutorials/introductory/usage.html#coding-styles)中所讨论的,人们可能希望重用这样的代码,为不同的输入数据和/或不同的轴创建热度图。我们创建了一个函数,该函数接受数据以及行和列标签作为输入,并提供用于自定义绘图的参数。
+
+在这里,除了上面的内容之外,我们还想创建一个颜色条,并将标签放在热度图的上面而不是下面。注释应根据阈值获得不同的颜色,以便与像素颜色形成更好的对比度。最后,我们关闭周围的坐标轴脊线(spines),并创建白色网格线来分隔各个单元格。
+
+```python
+def heatmap(data, row_labels, col_labels, ax=None,
+            cbar_kw={}, cbarlabel="", **kwargs):
+    """
+    Create a heatmap from a numpy array and two lists of labels.
+
+    Arguments:
+        data       : A 2D numpy array of shape (N,M)
+        row_labels : A list or array of length N with the labels
+                     for the rows
+        col_labels : A list or array of length M with the labels
+                     for the columns
+    Optional arguments:
+        ax         : A matplotlib.axes.Axes instance to which the heatmap
+                     is plotted. If not provided, use current axes or
+                     create a new one.
+        cbar_kw    : A dictionary with arguments to
+                     :meth:`matplotlib.Figure.colorbar`.
+        cbarlabel  : The label for the colorbar
+    All other arguments are directly passed on to the imshow call.
+    """
+
+    if not ax:
+        ax = plt.gca()
+
+    # Plot the heatmap
+    im = ax.imshow(data, **kwargs)
+
+    # Create colorbar
+    cbar = ax.figure.colorbar(im, ax=ax, **cbar_kw)
+    cbar.ax.set_ylabel(cbarlabel, rotation=-90, va="bottom")
+
+    # We want to show all ticks...
+    ax.set_xticks(np.arange(data.shape[1]))
+    ax.set_yticks(np.arange(data.shape[0]))
+    # ...
and label them with the respective list entries. + ax.set_xticklabels(col_labels) + ax.set_yticklabels(row_labels) + + # Let the horizontal axes labeling appear on top. + ax.tick_params(top=True, bottom=False, + labeltop=True, labelbottom=False) + + # Rotate the tick labels and set their alignment. + plt.setp(ax.get_xticklabels(), rotation=-30, ha="right", + rotation_mode="anchor") + + # Turn spines off and create white grid. + for edge, spine in ax.spines.items(): + spine.set_visible(False) + + ax.set_xticks(np.arange(data.shape[1]+1)-.5, minor=True) + ax.set_yticks(np.arange(data.shape[0]+1)-.5, minor=True) + ax.grid(which="minor", color="w", linestyle='-', linewidth=3) + ax.tick_params(which="minor", bottom=False, left=False) + + return im, cbar + + +def annotate_heatmap(im, data=None, valfmt="{x:.2f}", + textcolors=["black", "white"], + threshold=None, **textkw): + """ + A function to annotate a heatmap. + + Arguments: + im : The AxesImage to be labeled. + Optional arguments: + data : Data used to annotate. If None, the image's data is used. + valfmt : The format of the annotations inside the heatmap. + This should either use the string format method, e.g. + "$ {x:.2f}", or be a :class:`matplotlib.ticker.Formatter`. + textcolors : A list or array of two color specifications. The first is + used for values below a threshold, the second for those + above. + threshold : Value in data units according to which the colors from + textcolors are applied. If None (the default) uses the + middle of the colormap as separation. + + Further arguments are passed on to the created text labels. + """ + + if not isinstance(data, (list, np.ndarray)): + data = im.get_array() + + # Normalize the threshold to the images color range. + if threshold is not None: + threshold = im.norm(threshold) + else: + threshold = im.norm(data.max())/2. + + # Set default alignment to center, but allow it to be + # overwritten by textkw. 
+ kw = dict(horizontalalignment="center", + verticalalignment="center") + kw.update(textkw) + + # Get the formatter in case a string is supplied + if isinstance(valfmt, str): + valfmt = matplotlib.ticker.StrMethodFormatter(valfmt) + + # Loop over the data and create a `Text` for each "pixel". + # Change the text's color depending on the data. + texts = [] + for i in range(data.shape[0]): + for j in range(data.shape[1]): + kw.update(color=textcolors[im.norm(data[i, j]) > threshold]) + text = im.axes.text(j, i, valfmt(data[i, j], None), **kw) + texts.append(text) + + return texts +``` + +以上所述使我们能够保持实际的绘制创作非常紧凑。 + +```python +fig, ax = plt.subplots() + +im, cbar = heatmap(harvest, vegetables, farmers, ax=ax, + cmap="YlGn", cbarlabel="harvest [t/year]") +texts = annotate_heatmap(im, valfmt="{x:.1f} t") + +fig.tight_layout() +plt.show() +``` + +![热度图示例2](https://matplotlib.org/_images/sphx_glr_image_annotated_heatmap_002.png) + +## 一些更复杂的热度图示例 + +在下面的文章中,我们将通过在不同的情况下使用不同的参数来展示前面创建的函数的多样性。 + +```python +np.random.seed(19680801) + +fig, ((ax, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(8, 6)) + +# Replicate the above example with a different font size and colormap. + +im, _ = heatmap(harvest, vegetables, farmers, ax=ax, + cmap="Wistia", cbarlabel="harvest [t/year]") +annotate_heatmap(im, valfmt="{x:.1f}", size=7) + +# Create some new data, give further arguments to imshow (vmin), +# use an integer format on the annotations and provide some colors. + +data = np.random.randint(2, 100, size=(7, 7)) +y = ["Book {}".format(i) for i in range(1, 8)] +x = ["Store {}".format(i) for i in list("ABCDEFG")] +im, _ = heatmap(data, y, x, ax=ax2, vmin=0, + cmap="magma_r", cbarlabel="weekly sold copies") +annotate_heatmap(im, valfmt="{x:d}", size=7, threshold=20, + textcolors=["red", "white"]) + +# Sometimes even the data itself is categorical. 
Here we use a +# :class:`matplotlib.colors.BoundaryNorm` to get the data into classes +# and use this to colorize the plot, but also to obtain the class +# labels from an array of classes. + +data = np.random.randn(6, 6) +y = ["Prod. {}".format(i) for i in range(10, 70, 10)] +x = ["Cycle {}".format(i) for i in range(1, 7)] + +qrates = np.array(list("ABCDEFG")) +norm = matplotlib.colors.BoundaryNorm(np.linspace(-3.5, 3.5, 8), 7) +fmt = matplotlib.ticker.FuncFormatter(lambda x, pos: qrates[::-1][norm(x)]) + +im, _ = heatmap(data, y, x, ax=ax3, + cmap=plt.get_cmap("PiYG", 7), norm=norm, + cbar_kw=dict(ticks=np.arange(-3, 4), format=fmt), + cbarlabel="Quality Rating") + +annotate_heatmap(im, valfmt=fmt, size=9, fontweight="bold", threshold=-1, + textcolors=["red", "black"]) + +# We can nicely plot a correlation matrix. Since this is bound by -1 and 1, +# we use those as vmin and vmax. We may also remove leading zeros and hide +# the diagonal elements (which are all 1) by using a +# :class:`matplotlib.ticker.FuncFormatter`. 
+ +corr_matrix = np.corrcoef(np.random.rand(6, 5)) +im, _ = heatmap(corr_matrix, vegetables, vegetables, ax=ax4, + cmap="PuOr", vmin=-1, vmax=1, + cbarlabel="correlation coeff.") + + +def func(x, pos): + return "{:.2f}".format(x).replace("0.", ".").replace("1.00", "") + +annotate_heatmap(im, valfmt=matplotlib.ticker.FuncFormatter(func), size=7) + +plt.tight_layout() +plt.show() +``` + +![热度图示例3](https://matplotlib.org/_images/sphx_glr_image_annotated_heatmap_003.png) + +## 参考 + +下面的示例显示了以下函数和方法的用法: + +```python +matplotlib.axes.Axes.imshow +matplotlib.pyplot.imshow +matplotlib.figure.Figure.colorbar +matplotlib.pyplot.colorbar +``` + +## 下载这个示例 + +- [下载python源码: image_annotated_heatmap.py](https://matplotlib.org/_downloads/image_annotated_heatmap.py) +- [下载Jupyter notebook: image_annotated_heatmap.ipynb](https://matplotlib.org/_downloads/image_annotated_heatmap.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/image_clip_path.md b/Python/matplotlab/gallery/images_contours_and_fields/image_clip_path.md new file mode 100644 index 00000000..8f8db30c --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/image_clip_path.md @@ -0,0 +1,39 @@ +# 使用补丁剪切图像 + +演示的图像,已被一个圆形补丁裁剪。 + +```python +import matplotlib.pyplot as plt +import matplotlib.patches as patches +import matplotlib.cbook as cbook + + +with cbook.get_sample_data('grace_hopper.png') as image_file: + image = plt.imread(image_file) + +fig, ax = plt.subplots() +im = ax.imshow(image) +patch = patches.Circle((260, 200), radius=200, transform=ax.transData) +im.set_clip_path(patch) + +ax.axis('off') +plt.show() +``` + +![使用补丁剪切图像示例](https://matplotlib.org/_images/sphx_glr_image_clip_path_001.png) + +## 参考 + +下面的示例演示了以下函数和方法的使用: + +```python +import matplotlib +matplotlib.axes.Axes.imshow +matplotlib.pyplot.imshow +matplotlib.artist.Artist.set_clip_path +``` + +## 下载这个示例 + +- [下载python源码: 
image_clip_path.py](https://matplotlib.org/_downloads/image_clip_path.py) +- [下载Jupyter notebook: image_clip_path.ipynb](https://matplotlib.org/_downloads/image_clip_path.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/image_demo.md b/Python/matplotlab/gallery/images_contours_and_fields/image_demo.md new file mode 100644 index 00000000..7f7116c6 --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/image_demo.md @@ -0,0 +1,176 @@ +# 图像演示 + +在Matplotlib中绘制图像的许多方法。 + +在Matplotlib中绘制图像最常见的方法是使用 [imShow()](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.imshow.html#matplotlib.axes.Axes.imshow)。下面的示例演示了imShow的许多功能以及您可以创建的许多图像。 + +```python +import numpy as np +import matplotlib.cm as cm +import matplotlib.pyplot as plt +import matplotlib.cbook as cbook +from matplotlib.path import Path +from matplotlib.patches import PathPatch +``` + +首先,我们将生成一个简单的二元正态分布。 + +```python +delta = 0.025 +x = y = np.arange(-3.0, 3.0, delta) +X, Y = np.meshgrid(x, y) +Z1 = np.exp(-X**2 - Y**2) +Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2) +Z = (Z1 - Z2) * 2 + +fig, ax = plt.subplots() +im = ax.imshow(Z, interpolation='bilinear', cmap=cm.RdYlGn, + origin='lower', extent=[-3, 3, -3, 3], + vmax=abs(Z).max(), vmin=-abs(Z).max()) + +plt.show() +``` + +![图像演示示例](https://matplotlib.org/_images/sphx_glr_image_demo_001.png) + +还可以显示图片的图像。 + +```python +# A sample image +with cbook.get_sample_data('ada.png') as image_file: + image = plt.imread(image_file) + +fig, ax = plt.subplots() +ax.imshow(image) +ax.axis('off') # clear x-axis and y-axis + + +# And another image + +w, h = 512, 512 + +with cbook.get_sample_data('ct.raw.gz', asfileobj=True) as datafile: + s = datafile.read() +A = np.fromstring(s, np.uint16).astype(float).reshape((w, h)) +A /= A.max() + +fig, ax = plt.subplots() +extent = (0, 25, 0, 25) +im = ax.imshow(A, cmap=plt.cm.hot, origin='upper', extent=extent) + +markers = [(15.9, 14.5), (16.8, 15)] +x, y = zip(*markers) 
+ax.plot(x, y, 'o')
+
+ax.set_title('CT density')
+
+plt.show()
+```
+
+![图像演示示例2](https://matplotlib.org/_images/sphx_glr_image_demo_002.png)
+
+![图像演示示例3](https://matplotlib.org/_images/sphx_glr_image_demo_003.png)
+
+## 插值图像
+
+也可以在显示图像之前对其进行插值。请注意,这可能会改变数据的观感,但有助于获得您想要的视觉效果。下面我们用三种不同的插值方法显示同一个(小)数组。
+
+A[i, j] 处像素的中心绘制在 (i + 0.5, i + 0.5) 处。如果使用 interpolation='nearest',则由 (i, j) 和 (i + 1, j + 1) 限定的区域将具有相同的颜色;如果使用其他插值方式,像素中心的颜色与 'nearest' 时相同,但其余像素将在相邻像素之间进行插值。
+
+早期版本的 matplotlib(<0.63)试图通过设置视图范围来隐藏边缘效应,使其不可见。antigrain 中近期的一个错误修复,以及利用该修复的 matplotlib._image 模块新实现,使这不再必要。为了防止边缘效应,matplotlib._image 模块现在在插值时用与边缘相同的像素填充输入数组。例如,如果你有一个颜色为 a-y 的 5x5 数组:
+
+```
+a b c d e
+f g h i j
+k l m n o
+p q r s t
+u v w x y
+```
+
+_image模块创建填充数组:
+
+```
+a a b c d e e
+a a b c d e e
+f f g h i j j
+k k l m n o o
+p p q r s t t
+u u v w x y y
+u u v w x y y
+```
+
+然后对填充后的数组进行插值/缩放,并提取中心区域。这样既可以绘制没有边缘效应的完整数组范围,也可以用不同的插值方法将多个不同大小的图像叠加在一起,请参阅[图层图像](https://matplotlib.org/gallery/images_contours_and_fields/layer_images.html)。这也带来一定的性能开销,因为必须创建临时的填充数组;复杂的插值同样会增加开销,因此如果您追求最高性能或图像非常大,建议使用 interpolation='nearest'。
+
+```python
+A = np.random.rand(5, 5)
+
+fig, axs = plt.subplots(1, 3, figsize=(10, 3))
+for ax, interp in zip(axs, ['nearest', 'bilinear', 'bicubic']):
+    ax.imshow(A, interpolation=interp)
+    ax.set_title(interp.capitalize())
+    ax.grid(True)
+
+plt.show()
+```
+
+![图像演示示例4](https://matplotlib.org/_images/sphx_glr_image_demo_004.png)
+
+可以使用 origin 参数指定绘制图像时数组原点 x[0, 0] 位于左上角还是左下角。您还可以在 matplotlibrc 文件中通过 image.origin 设置默认值。有关此主题的更多信息,请参阅关于 origin 和 extent 的完整指南。
+
+```python
+x = np.arange(120).reshape((10, 12))
+
+interp = 'bilinear'
+fig, axs = plt.subplots(nrows=2, sharex=True, figsize=(3, 5))
+axs[0].set_title('blue should be up')
+axs[0].imshow(x, origin='upper', interpolation=interp)
+
+axs[1].set_title('blue should be down')
+axs[1].imshow(x, origin='lower', interpolation=interp)
+plt.show()
+```
+
+![图像演示示例5](https://matplotlib.org/_images/sphx_glr_image_demo_005.png)
+
+最后,我们将使用剪辑路径显示图像。
+
+```python
+delta = 0.025 +x = y = np.arange(-3.0, 3.0, delta) +X, Y = np.meshgrid(x, y) +Z1 = np.exp(-X**2 - Y**2) +Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2) +Z = (Z1 - Z2) * 2 + +path = Path([[0, 1], [1, 0], [0, -1], [-1, 0], [0, 1]]) +patch = PathPatch(path, facecolor='none') + +fig, ax = plt.subplots() +ax.add_patch(patch) + +im = ax.imshow(Z, interpolation='bilinear', cmap=cm.gray, + origin='lower', extent=[-3, 3, -3, 3], + clip_path=patch, clip_on=True) +im.set_clip_path(patch) + +plt.show() +``` + +![图像演示示例6](https://matplotlib.org/_images/sphx_glr_image_demo_006.png) + +## 参考 + +下面的示例演示了以下函数和方法的使用: + +```python +import matplotlib +matplotlib.axes.Axes.imshow +matplotlib.pyplot.imshow +matplotlib.artist.Artist.set_clip_path +matplotlib.patches.PathPatch +``` + +## 下载这个示例 + +- [下载python源码: image_demo.py](https://matplotlib.org/_downloads/image_demo.py) +- [下载Jupyter notebook: image_demo.ipynb](https://matplotlib.org/_downloads/image_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/image_masked.md b/Python/matplotlab/gallery/images_contours_and_fields/image_masked.md new file mode 100644 index 00000000..f504c010 --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/image_masked.md @@ -0,0 +1,94 @@ +# 图像掩码 + +显示与掩码数组输入和范围以外的颜色。 + +第二个子图说明了如何使用边界规范来获得填充轮廓效果。 + +```python +from copy import copy + +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.colors as colors + +# compute some interesting data +x0, x1 = -5, 5 +y0, y1 = -3, 3 +x = np.linspace(x0, x1, 500) +y = np.linspace(y0, y1, 500) +X, Y = np.meshgrid(x, y) +Z1 = np.exp(-X**2 - Y**2) +Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2) +Z = (Z1 - Z2) * 2 + +# Set up a colormap: +# use copy so that we do not mutate the global colormap instance +palette = copy(plt.cm.gray) +palette.set_over('r', 1.0) +palette.set_under('g', 1.0) +palette.set_bad('b', 1.0) +# Alternatively, we could use +# palette.set_bad(alpha = 0.0) +# to make the bad region 
transparent. This is the default.
+# If you comment out all the palette.set* lines, you will see
+# all the defaults; under and over will be colored with the
+# first and last colors in the palette, respectively.
+Zm = np.ma.masked_where(Z > 1.2, Z)
+
+# By setting vmin and vmax in the norm, we establish the
+# range to which the regular palette color scale is applied.
+# Anything above that range is colored based on palette.set_over, etc.
+
+# set up the Axes objects
+fig, (ax1, ax2) = plt.subplots(nrows=2, figsize=(6, 5.4))
+
+# plot using 'continuous' color map
+im = ax1.imshow(Zm, interpolation='bilinear',
+                cmap=palette,
+                norm=colors.Normalize(vmin=-1.0, vmax=1.0),
+                aspect='auto',
+                origin='lower',
+                extent=[x0, x1, y0, y1])
+ax1.set_title('Green=low, Red=high, Blue=masked')
+cbar = fig.colorbar(im, extend='both', shrink=0.9, ax=ax1)
+cbar.set_label('uniform')
+for ticklabel in ax1.xaxis.get_ticklabels():
+    ticklabel.set_visible(False)
+
+# Plot using a small number of colors, with unevenly spaced boundaries.
+im = ax2.imshow(Zm, interpolation='nearest',
+                cmap=palette,
+                norm=colors.BoundaryNorm([-1, -0.5, -0.2, 0, 0.2, 0.5, 1],
+                                         ncolors=palette.N),
+                aspect='auto',
+                origin='lower',
+                extent=[x0, x1, y0, y1])
+ax2.set_title('With BoundaryNorm')
+cbar = fig.colorbar(im, extend='both', spacing='proportional',
+                    shrink=0.9, ax=ax2)
+cbar.set_label('proportional')
+
+fig.suptitle('imshow, with out-of-range and masked data')
+plt.show()
+```
+
+![图像掩码示例](https://matplotlib.org/_images/sphx_glr_image_masked_001.png)
+
+## 参考
+
+下面的示例演示了以下函数和方法的使用:
+
+```python
+import matplotlib
+matplotlib.axes.Axes.imshow
+matplotlib.pyplot.imshow
+matplotlib.figure.Figure.colorbar
+matplotlib.pyplot.colorbar
+matplotlib.colors.BoundaryNorm
+matplotlib.colorbar.ColorbarBase.set_label
+```
+
+## 下载这个示例
+
+- [下载python源码: image_masked.py](https://matplotlib.org/_downloads/image_masked.py)
+- [下载Jupyter notebook: image_masked.ipynb](https://matplotlib.org/_downloads/image_masked.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/images_contours_and_fields/image_nonuniform.md b/Python/matplotlab/gallery/images_contours_and_fields/image_nonuniform.md
new file mode 100644
index 00000000..7e5b9d24
--- /dev/null
+++ b/Python/matplotlab/gallery/images_contours_and_fields/image_nonuniform.md
@@ -0,0 +1,71 @@
+# 不均匀分布图像
+
+本例演示 NonUniformImage 类。它没有对应的 Axes 辅助方法,但可以像下面这样方便地添加到 Axes 实例中。
+
+![不均匀分布图像示例](https://matplotlib.org/_images/sphx_glr_image_nonuniform_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+from matplotlib.image import NonUniformImage
+from matplotlib import cm
+
+interp = 'nearest'
+
+# Linear x array for cell centers:
+x = np.linspace(-4, 4, 9)
+
+# Highly nonlinear x array:
+x2 = x**3
+
+y = np.linspace(-4, 4, 9)
+
+z = np.sqrt(x[np.newaxis, :]**2 + y[:, np.newaxis]**2)
+
+fig, axs = plt.subplots(nrows=2, ncols=2, constrained_layout=True)
+fig.suptitle('NonUniformImage class', fontsize='large')
+ax = axs[0, 0]
+im = NonUniformImage(ax,
interpolation=interp, extent=(-4, 4, -4, 4),
+                     cmap=cm.Purples)
+im.set_data(x, y, z)
+ax.images.append(im)
+ax.set_xlim(-4, 4)
+ax.set_ylim(-4, 4)
+ax.set_title(interp)
+
+ax = axs[0, 1]
+im = NonUniformImage(ax, interpolation=interp, extent=(-64, 64, -4, 4),
+                     cmap=cm.Purples)
+im.set_data(x2, y, z)
+ax.images.append(im)
+ax.set_xlim(-64, 64)
+ax.set_ylim(-4, 4)
+ax.set_title(interp)
+
+interp = 'bilinear'
+
+ax = axs[1, 0]
+im = NonUniformImage(ax, interpolation=interp, extent=(-4, 4, -4, 4),
+                     cmap=cm.Purples)
+im.set_data(x, y, z)
+ax.images.append(im)
+ax.set_xlim(-4, 4)
+ax.set_ylim(-4, 4)
+ax.set_title(interp)
+
+ax = axs[1, 1]
+im = NonUniformImage(ax, interpolation=interp, extent=(-64, 64, -4, 4),
+                     cmap=cm.Purples)
+im.set_data(x2, y, z)
+ax.images.append(im)
+ax.set_xlim(-64, 64)
+ax.set_ylim(-4, 4)
+ax.set_title(interp)
+
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: image_nonuniform.py](https://matplotlib.org/_downloads/image_nonuniform.py)
+- [下载Jupyter notebook: image_nonuniform.ipynb](https://matplotlib.org/_downloads/image_nonuniform.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/images_contours_and_fields/image_transparency_blend.md b/Python/matplotlab/gallery/images_contours_and_fields/image_transparency_blend.md
new file mode 100644
index 00000000..e6760058
--- /dev/null
+++ b/Python/matplotlab/gallery/images_contours_and_fields/image_transparency_blend.md
@@ -0,0 +1,136 @@
+# 在二维图像中混合透明和颜色
+
+混合透明度与颜色,以便用 imshow 突出显示数据的特定部分。
+
+[matplotlib.pyplot.imshow()](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html#matplotlib.pyplot.imshow)的一个常见用途是绘制二维统计图。虽然 imshow 可以很容易地将二维矩阵可视化为图像,但它并不方便为输出添加透明度。例如,可以绘制统计量(例如t统计量),并根据其p值设置每个像素的透明度。此示例演示如何使用 [matplotlib.colors.Normalize](https://matplotlib.org/api/_as_gen/matplotlib.colors.Normalize.html#matplotlib.colors.Normalize) 实现此效果。请注意,无法直接将 alpha 值数组传递给 [matplotlib.pyplot.imshow()](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html#matplotlib.pyplot.imshow)。
+
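在进入完整示例之前,下面先给出该思路的一个最小示意(其中的数据、颜色图和归一化范围都是演示用的假设,并非本例正文的数据):先用 Normalize 把二维数组映射到 [0, 1],经颜色图得到 MxNx4 的 RGBA 数组,再单独覆写它的 alpha 通道,最后把 RGBA 数组直接交给 imshow。

```python
# 最小示意:手动构造 RGBA 数组,为 imshow 提供逐像素透明度
# (数据与归一化范围均为演示假设)
import numpy as np
import matplotlib
matplotlib.use("Agg")  # 便于在无显示环境下运行
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

data = np.random.RandomState(0).randn(8, 8)              # 任意二维统计量
rgba = plt.cm.RdYlBu(Normalize(-2, 2, clip=True)(data))  # 形状为 (8, 8, 4)
rgba[..., -1] = np.abs(data) / np.abs(data).max()        # 透明度随 |值| 增大

fig, ax = plt.subplots()
ax.imshow(rgba)  # imshow 可以直接接受 MxNx4 的 RGBA 数组
```

这正是下文示例采用的做法:alpha 不作为参数传入,而是作为 RGBA 数组的第四个通道。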
+首先我们将生成一些数据,在这种情况下,我们将在二维网格中创建两个2维“blob”。一个blob是正面的,另一个是负面的。 + +```python +# sphinx_gallery_thumbnail_number = 3 +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.colors import Normalize + + +def normal_pdf(x, mean, var): + return np.exp(-(x - mean)**2 / (2*var)) + + +# Generate the space in which the blobs will live +xmin, xmax, ymin, ymax = (0, 100, 0, 100) +n_bins = 100 +xx = np.linspace(xmin, xmax, n_bins) +yy = np.linspace(ymin, ymax, n_bins) + +# Generate the blobs. The range of the values is roughly -.0002 to .0002 +means_high = [20, 50] +means_low = [50, 60] +var = [150, 200] + +gauss_x_high = normal_pdf(xx, means_high[0], var[0]) +gauss_y_high = normal_pdf(yy, means_high[1], var[0]) + +gauss_x_low = normal_pdf(xx, means_low[0], var[1]) +gauss_y_low = normal_pdf(yy, means_low[1], var[1]) + +weights_high = np.array(np.meshgrid(gauss_x_high, gauss_y_high)).prod(0) +weights_low = -1 * np.array(np.meshgrid(gauss_x_low, gauss_y_low)).prod(0) +weights = weights_high + weights_low + +# We'll also create a grey background into which the pixels will fade +greys = np.empty(weights.shape + (3,), dtype=np.uint8) +greys.fill(70) + +# First we'll plot these blobs using only ``imshow``. +vmax = np.abs(weights).max() +vmin = -vmax +cmap = plt.cm.RdYlBu + +fig, ax = plt.subplots() +ax.imshow(greys) +ax.imshow(weights, extent=(xmin, xmax, ymin, ymax), cmap=cmap) +ax.set_axis_off() +``` + +![在二维图像中混合透明和颜色](https://matplotlib.org/_images/sphx_glr_image_transparency_blend_001.png) + +## 混合透明度 + +在使用[matplotlib.pyplot.imshow()](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html#matplotlib.pyplot.imshow)绘制数据时,包含透明度的最简单方法是将二维数据数组转换为RGBA值的三维图像数组。这可以用[matplotlib.colors.Normalize](https://matplotlib.org/api/_as_gen/matplotlib.colors.Normalize.html#matplotlib.colors.Normalize)来实现。例如,我们将创建一个从左向右移动的渐变。 + +```python +# Create an alpha channel of linearly increasing values moving to the right. 
+alphas = np.ones(weights.shape)
+alphas[:, 30:] = np.linspace(1, 0, 70)
+
+# Normalize the colors b/w 0 and 1, we'll then pass an MxNx4 array to imshow
+colors = Normalize(vmin, vmax, clip=True)(weights)
+colors = cmap(colors)
+
+# Now set the alpha channel to the one we created above
+colors[..., -1] = alphas
+
+# Create the figure and image
+# Note that the absolute values may be slightly different
+fig, ax = plt.subplots()
+ax.imshow(greys)
+ax.imshow(colors, extent=(xmin, xmax, ymin, ymax))
+ax.set_axis_off()
+```
+
+![在二维图像中混合透明和颜色2](https://matplotlib.org/_images/sphx_glr_image_transparency_blend_002.png)
+
+## 使用透明度高亮显示高振幅值
+
+最后,我们将重新创建相同的图,但这一次将使用透明度来突出显示数据中的极值。这种做法常用于突出显示p值较小的数据点。我们还将添加等高线,以突出显示图像的不同取值水平。
+
+```python
+# Create an alpha channel based on weight values
+# Any value whose absolute value is > .0001 will have zero transparency
+alphas = Normalize(0, .3, clip=True)(np.abs(weights))
+alphas = np.clip(alphas, .4, 1)  # alpha value clipped at the bottom at .4
+
+# Normalize the colors b/w 0 and 1, we'll then pass an MxNx4 array to imshow
+colors = Normalize(vmin, vmax)(weights)
+colors = cmap(colors)
+
+# Now set the alpha channel to the one we created above
+colors[..., -1] = alphas
+
+# Create the figure and image
+# Note that the absolute values may be slightly different
+fig, ax = plt.subplots()
+ax.imshow(greys)
+ax.imshow(colors, extent=(xmin, xmax, ymin, ymax))
+
+# Add contour lines to further highlight different levels.
+ax.contour(weights[::-1], levels=[-.1, .1], colors='k', linestyles='-')
+ax.set_axis_off()
+plt.show()
+```
+
+![Blend transparency with color in 2-D images 3](https://matplotlib.org/_images/sphx_glr_image_transparency_blend_003.png)
+
+## References
+
+The use of the following functions, methods and classes is shown in this example:
+
+```python
+import matplotlib
+matplotlib.axes.Axes.imshow
+matplotlib.pyplot.imshow
+matplotlib.axes.Axes.contour
+matplotlib.pyplot.contour
+matplotlib.colors.Normalize
+matplotlib.axes.Axes.set_axis_off
+```
+
+## Download this example
+
+- [Download Python source code: image_transparency_blend.py](https://matplotlib.org/_downloads/image_transparency_blend.py)
+- [Download Jupyter notebook: image_transparency_blend.ipynb](https://matplotlib.org/_downloads/image_transparency_blend.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/images_contours_and_fields/image_zcoord.md b/Python/matplotlab/gallery/images_contours_and_fields/image_zcoord.md
new file mode 100644
index 00000000..927a9d74
--- /dev/null
+++ b/Python/matplotlab/gallery/images_contours_and_fields/image_zcoord.md
@@ -0,0 +1,50 @@
+# Modifying the coordinate formatter
+
+Modify the coordinate formatter to report the image "z" value of the nearest pixel given x and y. This functionality is built in by default, but it is still useful to show how to customize the `format_coord` function.
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+# Fixing random state for reproducibility
+np.random.seed(19680801)
+
+
+X = 10*np.random.rand(5, 3)
+
+fig, ax = plt.subplots()
+ax.imshow(X, interpolation='nearest')
+
+numrows, numcols = X.shape
+
+
+def format_coord(x, y):
+    col = int(x + 0.5)
+    row = int(y + 0.5)
+    if col >= 0 and col < numcols and row >= 0 and row < numrows:
+        z = X[row, col]
+        return 'x=%1.4f, y=%1.4f, z=%1.4f' % (x, y, z)
+    else:
+        return 'x=%1.4f, y=%1.4f' % (x, y)
+
+ax.format_coord = format_coord
+plt.show()
+```
+
+![Modifying the coordinate formatter](https://matplotlib.org/_images/sphx_glr_image_zcoord_001.png)
+
+
+## References
+
+The use of the following functions, methods, classes and modules is shown in this example:
+
+```python
+import matplotlib
+matplotlib.axes.Axes.format_coord
+matplotlib.axes.Axes.imshow
+```
+
+## Download this example
+
+- [Download Python source code: image_zcoord.py](https://matplotlib.org/_downloads/image_zcoord.py)
+- [Download Jupyter notebook: image_zcoord.ipynb](https://matplotlib.org/_downloads/image_zcoord.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/images_contours_and_fields/interpolation_methods.md b/Python/matplotlab/gallery/images_contours_and_fields/interpolation_methods.md
new file mode 100644
index 00000000..9c9db442
--- /dev/null
+++ b/Python/matplotlab/gallery/images_contours_and_fields/interpolation_methods.md
@@ -0,0 +1,50 @@
+# Interpolations for imshow/matshow
+
+This example displays the difference between interpolation methods for [imshow()](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.imshow.html#matplotlib.axes.Axes.imshow) and [matshow()](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.matshow.html#matplotlib.axes.Axes.matshow).
+
+If interpolation is None, it defaults to the image.interpolation rc parameter. If the interpolation is ``'none'``, then no interpolation is performed for the Agg, ps and pdf backends. Other backends will default to ``'nearest'``.
+
+For the Agg, ps and pdf backends, ``interpolation = 'none'`` works well when a big image is scaled down, while ``interpolation = 'nearest'`` works well when a small image is scaled up.
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+
+methods = [None, 'none', 'nearest', 'bilinear', 'bicubic', 'spline16',
+           'spline36', 'hanning', 'hamming', 'hermite', 'kaiser', 'quadric',
+           'catrom', 'gaussian', 'bessel', 'mitchell', 'sinc', 'lanczos']
+
+# Fixing random state for reproducibility
+np.random.seed(19680801)
+
+grid = np.random.rand(4, 4)
+
+fig, axs = plt.subplots(nrows=3, ncols=6, figsize=(9.3, 6),
+                        subplot_kw={'xticks': [], 'yticks': []})
+
+fig.subplots_adjust(left=0.03, right=0.97, hspace=0.3, wspace=0.05)
+
+for ax, interp_method in zip(axs.flat, methods):
+    ax.imshow(grid, interpolation=interp_method, cmap='viridis')
+    ax.set_title(str(interp_method))
+
+plt.tight_layout()
+plt.show()
+```
+
+![Interpolations for imshow/matshow](https://matplotlib.org/_images/sphx_glr_interpolation_methods_001.png)
+
+## References
+
+The use of the following functions and methods is shown in this example:
+
+```python
+import matplotlib
+matplotlib.axes.Axes.imshow
+matplotlib.pyplot.imshow
+```
+
+## Download this example
+
+- [Download Python source code: interpolation_methods.py](https://matplotlib.org/_downloads/interpolation_methods.py)
+- [Download Jupyter notebook: interpolation_methods.ipynb](https://matplotlib.org/_downloads/interpolation_methods.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/images_contours_and_fields/irregulardatagrid.md b/Python/matplotlab/gallery/images_contours_and_fields/irregulardatagrid.md
new file mode 100644
index 00000000..331995aa
--- /dev/null
+++ b/Python/matplotlab/gallery/images_contours_and_fields/irregulardatagrid.md
@@ -0,0 +1,99 @@
+# Contour plot of irregularly spaced data
+
+Comparison of a contour plot of irregularly spaced data interpolated on a regular grid versus a tricontour plot for an unstructured triangular grid.
+
+Since [contour](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.contour.html#matplotlib.axes.Axes.contour) and [contourf](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.contourf.html#matplotlib.axes.Axes.contourf) expect the data to live on a regular grid, plotting a contour plot of irregularly spaced data requires different approaches. The two options are:
+
+- Interpolate the data to a regular grid first. This can be done with Matplotlib's built-in means, e.g. via [LinearTriInterpolator](https://matplotlib.org/api/tri_api.html#matplotlib.tri.LinearTriInterpolator), or with external functionality, e.g. via scipy.interpolate.griddata. Then plot the interpolated data with the usual contour.
+- Directly use [tricontour](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.tricontour.html#matplotlib.axes.Axes.tricontour) or [tricontourf](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.tricontourf.html#matplotlib.axes.Axes.tricontourf), which will perform a triangulation internally.
+
+This example shows both methods in action.
+
+```python
+import matplotlib.pyplot as plt
+import matplotlib.tri as tri
+import numpy as np
+
+np.random.seed(19680801)
+npts = 200
+ngridx = 100
+ngridy = 200
+x = np.random.uniform(-2, 2, npts)
+y = np.random.uniform(-2, 2, npts)
+z = x * np.exp(-x**2 - y**2)
+
+fig, (ax1, ax2) = plt.subplots(nrows=2)
+
+# -----------------------
+# Interpolation on a grid
+# -----------------------
+# A contour plot of irregularly spaced data coordinates
+# via interpolation on a grid.
+
+# Create grid values first.
+xi = np.linspace(-2.1, 2.1, ngridx) +yi = np.linspace(-2.1, 2.1, ngridy) + +# Perform linear interpolation of the data (x,y) +# on a grid defined by (xi,yi) +triang = tri.Triangulation(x, y) +interpolator = tri.LinearTriInterpolator(triang, z) +Xi, Yi = np.meshgrid(xi, yi) +zi = interpolator(Xi, Yi) + +# Note that scipy.interpolate provides means to interpolate data on a grid +# as well. The following would be an alternative to the four lines above: +#from scipy.interpolate import griddata +#zi = griddata((x, y), z, (xi[None,:], yi[:,None]), method='linear') + + +ax1.contour(xi, yi, zi, levels=14, linewidths=0.5, colors='k') +cntr1 = ax1.contourf(xi, yi, zi, levels=14, cmap="RdBu_r") + +fig.colorbar(cntr1, ax=ax1) +ax1.plot(x, y, 'ko', ms=3) +ax1.axis((-2, 2, -2, 2)) +ax1.set_title('grid and contour (%d points, %d grid points)' % + (npts, ngridx * ngridy)) + + +# ---------- +# Tricontour +# ---------- +# Directly supply the unordered, irregularly spaced coordinates +# to tricontour. + +ax2.tricontour(x, y, z, levels=14, linewidths=0.5, colors='k') +cntr2 = ax2.tricontourf(x, y, z, levels=14, cmap="RdBu_r") + +fig.colorbar(cntr2, ax=ax2) +ax2.plot(x, y, 'ko', ms=3) +ax2.axis((-2, 2, -2, 2)) +ax2.set_title('tricontour (%d points)' % npts) + +plt.subplots_adjust(hspace=0.5) +plt.show() +``` + +![不规则空间数据的等高线图例](https://matplotlib.org/_images/sphx_glr_irregulardatagrid_001.png) + +## 参考 + +此示例中显示了以下函数和方法的用法: + +```python +import matplotlib +matplotlib.axes.Axes.contour +matplotlib.pyplot.contour +matplotlib.axes.Axes.contourf +matplotlib.pyplot.contourf +matplotlib.axes.Axes.tricontour +matplotlib.pyplot.tricontour +matplotlib.axes.Axes.tricontourf +matplotlib.pyplot.tricontourf +``` + +## 下载这个示例 + +- [下载python源码: irregulardatagrid.py](https://matplotlib.org/_downloads/irregulardatagrid.py) +- [下载Jupyter notebook: irregulardatagrid.ipynb](https://matplotlib.org/_downloads/irregulardatagrid.ipynb) \ No newline at end of file diff --git 
a/Python/matplotlab/gallery/images_contours_and_fields/layer_images.md b/Python/matplotlab/gallery/images_contours_and_fields/layer_images.md new file mode 100644 index 00000000..e2ad6721 --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/layer_images.md @@ -0,0 +1,58 @@ +# 图层图像 + +使用Alpha混合将图像层叠在彼此之上 + +```python +import matplotlib.pyplot as plt +import numpy as np + + +def func3(x, y): + return (1 - x / 2 + x**5 + y**3) * np.exp(-(x**2 + y**2)) + + +# make these smaller to increase the resolution +dx, dy = 0.05, 0.05 + +x = np.arange(-3.0, 3.0, dx) +y = np.arange(-3.0, 3.0, dy) +X, Y = np.meshgrid(x, y) + +# when layering multiple images, the images need to have the same +# extent. This does not mean they need to have the same shape, but +# they both need to render to the same coordinate system determined by +# xmin, xmax, ymin, ymax. Note if you use different interpolations +# for the images their apparent extent could be different due to +# interpolation edge effects + +extent = np.min(x), np.max(x), np.min(y), np.max(y) +fig = plt.figure(frameon=False) + +Z1 = np.add.outer(range(8), range(8)) % 2 # chessboard +im1 = plt.imshow(Z1, cmap=plt.cm.gray, interpolation='nearest', + extent=extent) + +Z2 = func3(X, Y) + +im2 = plt.imshow(Z2, cmap=plt.cm.viridis, alpha=.9, interpolation='bilinear', + extent=extent) + +plt.show() +``` + +![图层图像示例](https://matplotlib.org/_images/sphx_glr_layer_images_001.png) + +## 参考 + +此示例中显示了以下函数和方法的用法: + +```python +import matplotlib +matplotlib.axes.Axes.imshow +matplotlib.pyplot.imshow +``` + +## 下载这个示例 + +- [下载python源码: layer_images.py](https://matplotlib.org/_downloads/layer_images.py) +- [下载Jupyter notebook: layer_images.ipynb](https://matplotlib.org/_downloads/layer_images.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/matshow.md b/Python/matplotlab/gallery/images_contours_and_fields/matshow.md new file mode 100644 index 00000000..aa957a5b --- /dev/null 
+++ b/Python/matplotlab/gallery/images_contours_and_fields/matshow.md @@ -0,0 +1,39 @@ +# Matshow + +简单的 matshow 例子。 + +```python +import matplotlib.pyplot as plt +import numpy as np + + +def samplemat(dims): + """Make a matrix with all zeros and increasing elements on the diagonal""" + aa = np.zeros(dims) + for i in range(min(dims)): + aa[i, i] = i + return aa + + +# Display matrix +plt.matshow(samplemat((15, 15))) + +plt.show() +``` + +![matshow示例](https://matplotlib.org/_images/sphx_glr_matshow_001.png) + +## 参考 + +此示例中显示了以下函数和方法的用法: + +```python +import matplotlib +matplotlib.axes.Axes.matshow +matplotlib.pyplot.matshow +``` + +## 下载这个示例 + +- [下载python源码: matshow.py](https://matplotlib.org/_downloads/matshow.py) +- [下载Jupyter notebook: matshow.ipynb](https://matplotlib.org/_downloads/matshow.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/multi_image.md b/Python/matplotlab/gallery/images_contours_and_fields/multi_image.md new file mode 100644 index 00000000..5f69d83e --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/multi_image.md @@ -0,0 +1,75 @@ +# 多重图像 + +用单一的彩色地图、标准和颜色条制作一组图像。 + +```python +from matplotlib import colors +import matplotlib.pyplot as plt +import numpy as np + +np.random.seed(19680801) +Nr = 3 +Nc = 2 +cmap = "cool" + +fig, axs = plt.subplots(Nr, Nc) +fig.suptitle('Multiple images') + +images = [] +for i in range(Nr): + for j in range(Nc): + # Generate data with a range that varies from one plot to the next. + data = ((1 + i + j) / 10) * np.random.rand(10, 20) * 1e-6 + images.append(axs[i, j].imshow(data, cmap=cmap)) + axs[i, j].label_outer() + +# Find the min and max of all colors for use in setting the color scale. 
+vmin = min(image.get_array().min() for image in images) +vmax = max(image.get_array().max() for image in images) +norm = colors.Normalize(vmin=vmin, vmax=vmax) +for im in images: + im.set_norm(norm) + +fig.colorbar(images[0], ax=axs, orientation='horizontal', fraction=.1) + + +# Make images respond to changes in the norm of other images (e.g. via the +# "edit axis, curves and images parameters" GUI on Qt), but be careful not to +# recurse infinitely! +def update(changed_image): + for im in images: + if (changed_image.get_cmap() != im.get_cmap() + or changed_image.get_clim() != im.get_clim()): + im.set_cmap(changed_image.get_cmap()) + im.set_clim(changed_image.get_clim()) + + +for im in images: + im.callbacksSM.connect('changed', update) + +plt.show() +``` + +![多重图像示例](https://matplotlib.org/_images/sphx_glr_multi_image_001.png) + +## 参考 + +本例中显示了以下函数、方法和类的使用: + +```python +import matplotlib +matplotlib.axes.Axes.imshow +matplotlib.pyplot.imshow +matplotlib.figure.Figure.colorbar +matplotlib.pyplot.colorbar +matplotlib.colors.Normalize +matplotlib.cm.ScalarMappable.set_cmap +matplotlib.cm.ScalarMappable.set_norm +matplotlib.cm.ScalarMappable.set_clim +matplotlib.cbook.CallbackRegistry.connect +``` + +## 下载这个示例 + +- [下载python源码: multi_image.py](https://matplotlib.org/_downloads/multi_image.py) +- [下载Jupyter notebook: multi_image.ipynb](https://matplotlib.org/_downloads/multi_image.ipynb) diff --git a/Python/matplotlab/gallery/images_contours_and_fields/pcolor_demo.md b/Python/matplotlab/gallery/images_contours_and_fields/pcolor_demo.md new file mode 100644 index 00000000..58fb4517 --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/pcolor_demo.md @@ -0,0 +1,133 @@ +# Pcolor 演示 + +使用[pcolor()](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.pcolor.html#matplotlib.axes.Axes.pcolor)生成图像。 + +Pcolor允许您生成二维图像样式图。 下面我们将在Matplotlib中展示如何做到这一点。 + +```python +import matplotlib.pyplot as plt +import numpy as np +from matplotlib.colors import LogNorm 
+``` + +## 一个简单的 pcolor 示例 + +```python +Z = np.random.rand(6, 10) + +fig, (ax0, ax1) = plt.subplots(2, 1) + +c = ax0.pcolor(Z) +ax0.set_title('default: no edges') + +c = ax1.pcolor(Z, edgecolors='k', linewidths=4) +ax1.set_title('thick edges') + +fig.tight_layout() +plt.show() +``` + +![Pcolor 演示](https://matplotlib.org/_images/sphx_glr_pcolor_demo_001.png) + +## 比较pcolor与类似的功能 + +Demonstrates similarities between pcolor(), pcolormesh(), imshow() and pcolorfast() for drawing quadrilateral grids. + +```python +# make these smaller to increase the resolution +dx, dy = 0.15, 0.05 + +# generate 2 2d grids for the x & y bounds +y, x = np.mgrid[slice(-3, 3 + dy, dy), + slice(-3, 3 + dx, dx)] +z = (1 - x / 2. + x ** 5 + y ** 3) * np.exp(-x ** 2 - y ** 2) +# x and y are bounds, so z should be the value *inside* those bounds. +# Therefore, remove the last value from the z array. +z = z[:-1, :-1] +z_min, z_max = -np.abs(z).max(), np.abs(z).max() + +fig, axs = plt.subplots(2, 2) + +ax = axs[0, 0] +c = ax.pcolor(x, y, z, cmap='RdBu', vmin=z_min, vmax=z_max) +ax.set_title('pcolor') +# set the limits of the plot to the limits of the data +ax.axis([x.min(), x.max(), y.min(), y.max()]) +fig.colorbar(c, ax=ax) + +ax = axs[0, 1] +c = ax.pcolormesh(x, y, z, cmap='RdBu', vmin=z_min, vmax=z_max) +ax.set_title('pcolormesh') +# set the limits of the plot to the limits of the data +ax.axis([x.min(), x.max(), y.min(), y.max()]) +fig.colorbar(c, ax=ax) + +ax = axs[1, 0] +c = ax.imshow(z, cmap='RdBu', vmin=z_min, vmax=z_max, + extent=[x.min(), x.max(), y.min(), y.max()], + interpolation='nearest', origin='lower') +ax.set_title('image (nearest)') +fig.colorbar(c, ax=ax) + +ax = axs[1, 1] +c = ax.pcolorfast(x, y, z, cmap='RdBu', vmin=z_min, vmax=z_max) +ax.set_title('pcolorfast') +fig.colorbar(c, ax=ax) + +fig.tight_layout() +plt.show() +``` + +![Pcolor 演示2](https://matplotlib.org/_images/sphx_glr_pcolor_demo_002.png) + +## Pcolor具有对数刻度 + +以下显示了具有对数刻度的pcolor图。 + +```python +N = 100 +X, Y = 
np.mgrid[-3:3:complex(0, N), -2:2:complex(0, N)] + +# A low hump with a spike coming out. +# Needs to have z/colour axis on a log scale so we see both hump and spike. +# linear scale only shows the spike. +Z1 = np.exp(-(X)**2 - (Y)**2) +Z2 = np.exp(-(X * 10)**2 - (Y * 10)**2) +Z = Z1 + 50 * Z2 + +fig, (ax0, ax1) = plt.subplots(2, 1) + +c = ax0.pcolor(X, Y, Z, + norm=LogNorm(vmin=Z.min(), vmax=Z.max()), cmap='PuBu_r') +fig.colorbar(c, ax=ax0) + +c = ax1.pcolor(X, Y, Z, cmap='PuBu_r') +fig.colorbar(c, ax=ax1) + +plt.show() +``` + +![Pcolor 演示3](https://matplotlib.org/_images/sphx_glr_pcolor_demo_003.png) + +## 参考 + +本例中显示了以下函数、方法和类的使用: + +```python +import matplotlib +matplotlib.axes.Axes.pcolor +matplotlib.pyplot.pcolor +matplotlib.axes.Axes.pcolormesh +matplotlib.pyplot.pcolormesh +matplotlib.axes.Axes.pcolorfast +matplotlib.axes.Axes.imshow +matplotlib.pyplot.imshow +matplotlib.figure.Figure.colorbar +matplotlib.pyplot.colorbar +matplotlib.colors.LogNorm +``` + +## 下载这个示例 + +- [下载python源码: pcolor_demo.py](https://matplotlib.org/_downloads/pcolor_demo.py) +- [下载Jupyter notebook: pcolor_demo.ipynb](https://matplotlib.org/_downloads/pcolor_demo.ipynb) diff --git a/Python/matplotlab/gallery/images_contours_and_fields/pcolormesh_levels.md b/Python/matplotlab/gallery/images_contours_and_fields/pcolormesh_levels.md new file mode 100644 index 00000000..57a5e79f --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/pcolormesh_levels.md @@ -0,0 +1,75 @@ +# pcolormesh + +演示如何组合Normalization和Colormap实例以在[pcolor()](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.pcolor.html#matplotlib.axes.Axes.pcolor),[pcolormesh()](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.pcolormesh.html#matplotlib.axes.Axes.pcolormesh)和[imshow()](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.imshow.html#matplotlib.axes.Axes.imshow)类型图中绘制“级别”,其方式与contour / contourf的levels关键字参数类似。 + +```python +import matplotlib +import matplotlib.pyplot as plt +from 
matplotlib.colors import BoundaryNorm +from matplotlib.ticker import MaxNLocator +import numpy as np + + +# make these smaller to increase the resolution +dx, dy = 0.05, 0.05 + +# generate 2 2d grids for the x & y bounds +y, x = np.mgrid[slice(1, 5 + dy, dy), + slice(1, 5 + dx, dx)] + +z = np.sin(x)**10 + np.cos(10 + y*x) * np.cos(x) + +# x and y are bounds, so z should be the value *inside* those bounds. +# Therefore, remove the last value from the z array. +z = z[:-1, :-1] +levels = MaxNLocator(nbins=15).tick_values(z.min(), z.max()) + + +# pick the desired colormap, sensible levels, and define a normalization +# instance which takes data values and translates those into levels. +cmap = plt.get_cmap('PiYG') +norm = BoundaryNorm(levels, ncolors=cmap.N, clip=True) + +fig, (ax0, ax1) = plt.subplots(nrows=2) + +im = ax0.pcolormesh(x, y, z, cmap=cmap, norm=norm) +fig.colorbar(im, ax=ax0) +ax0.set_title('pcolormesh with levels') + + +# contours are *point* based plots, so convert our bound into point +# centers +cf = ax1.contourf(x[:-1, :-1] + dx/2., + y[:-1, :-1] + dy/2., z, levels=levels, + cmap=cmap) +fig.colorbar(cf, ax=ax1) +ax1.set_title('contourf with levels') + +# adjust spacing between subplots so `ax1` title and `ax0` tick labels +# don't overlap +fig.tight_layout() + +plt.show() +``` + +![pcolormesh示例](https://matplotlib.org/_images/sphx_glr_pcolormesh_levels_001.png) + +## 参考 + +下面的示例演示了以下函数和方法的使用: + +```python +matplotlib.axes.Axes.pcolormesh +matplotlib.pyplot.pcolormesh +matplotlib.axes.Axes.contourf +matplotlib.pyplot.contourf +matplotlib.figure.Figure.colorbar +matplotlib.pyplot.colorbar +matplotlib.colors.BoundaryNorm +matplotlib.ticker.MaxNLocator +``` + +## 下载这个示例 + +- [下载python源码: pcolormesh_levels.py](https://matplotlib.org/_downloads/pcolormesh_levels.py) +- [下载Jupyter notebook: pcolormesh_levels.ipynb](https://matplotlib.org/_downloads/pcolormesh_levels.ipynb) \ No newline at end of file diff --git 
a/Python/matplotlab/gallery/images_contours_and_fields/plot_streamplot.md b/Python/matplotlab/gallery/images_contours_and_fields/plot_streamplot.md
new file mode 100644
index 00000000..083d0fda
--- /dev/null
+++ b/Python/matplotlab/gallery/images_contours_and_fields/plot_streamplot.md
@@ -0,0 +1,91 @@
+# Streamplot
+
+A stream plot, or streamline plot, is used to display 2D vector fields. This example shows a few features of the [streamplot()](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.streamplot.html#matplotlib.axes.Axes.streamplot) function:
+
+- Varying the color along a streamline.
+- Varying the density of streamlines.
+- Varying the line width along a streamline.
+- Controlling the starting points of streamlines.
+- Streamlines skipping masked regions and NaN values.
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+import matplotlib.gridspec as gridspec
+
+w = 3
+Y, X = np.mgrid[-w:w:100j, -w:w:100j]
+U = -1 - X**2 + Y
+V = 1 + X - Y**2
+speed = np.sqrt(U*U + V*V)
+
+fig = plt.figure(figsize=(7, 9))
+gs = gridspec.GridSpec(nrows=3, ncols=2, height_ratios=[1, 1, 2])
+
+# Varying density along a streamline
+ax0 = fig.add_subplot(gs[0, 0])
+ax0.streamplot(X, Y, U, V, density=[0.5, 1])
+ax0.set_title('Varying Density')
+
+# Varying color along a streamline
+ax1 = fig.add_subplot(gs[0, 1])
+strm = ax1.streamplot(X, Y, U, V, color=U, linewidth=2, cmap='autumn')
+fig.colorbar(strm.lines)
+ax1.set_title('Varying Color')
+
+# Varying line width along a streamline
+ax2 = fig.add_subplot(gs[1, 0])
+lw = 5*speed / speed.max()
+ax2.streamplot(X, Y, U, V, density=0.6, color='k', linewidth=lw)
+ax2.set_title('Varying Line Width')
+
+# Controlling the starting points of the streamlines
+seed_points = np.array([[-2, -1, 0, 1, 2, -1], [-2, -1, 0, 1, 2, 2]])
+
+ax3 = fig.add_subplot(gs[1, 1])
+strm = ax3.streamplot(X, Y, U, V, color=U, linewidth=2,
+                      cmap='autumn', start_points=seed_points.T)
+fig.colorbar(strm.lines)
+ax3.set_title('Controlling Starting Points')
+
+# Displaying the starting points with blue symbols.
+ax3.plot(seed_points[0], seed_points[1], 'bo')
+ax3.axis((-w, w, -w, w))
+
+# Create a mask
+mask = np.zeros(U.shape, dtype=bool)
+mask[40:60, 40:60] = True
+U[:20, :20] = np.nan
+U = np.ma.array(U, mask=mask)
+
+ax4 = fig.add_subplot(gs[2:, :])
+ax4.streamplot(X, Y, U, V, color='r')
+ax4.set_title('Streamplot with Masking')
+
+ax4.imshow(~mask, extent=(-w, w, -w, w), alpha=0.5,
+           interpolation='nearest', cmap='gray', aspect='auto')
+ax4.set_aspect('equal')
+
+plt.tight_layout()
+plt.show()
+```
+
+![Streamplot example](https://matplotlib.org/_images/sphx_glr_plot_streamplot_001.png)
+
+## References
+
+The use of the following functions and methods is shown in this example:
+
+```python
+import matplotlib
+matplotlib.axes.Axes.streamplot
+matplotlib.pyplot.streamplot
+matplotlib.gridspec
+matplotlib.gridspec.GridSpec
+```
+
+## Download this example
+
+- [Download Python source code: plot_streamplot.py](https://matplotlib.org/_downloads/plot_streamplot.py)
- [Download Jupyter notebook: plot_streamplot.ipynb](https://matplotlib.org/_downloads/plot_streamplot.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/images_contours_and_fields/quadmesh_demo.md b/Python/matplotlab/gallery/images_contours_and_fields/quadmesh_demo.md
new file mode 100644
index 00000000..72abd286
--- /dev/null
+++ b/Python/matplotlab/gallery/images_contours_and_fields/quadmesh_demo.md
@@ -0,0 +1,59 @@
+# QuadMesh Demo
+
+[pcolormesh](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.pcolormesh.html#matplotlib.axes.Axes.pcolormesh) uses a [QuadMesh](https://matplotlib.org/api/collections_api.html#matplotlib.collections.QuadMesh), a faster generalization of [pcolor](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.pcolor.html#matplotlib.axes.Axes.pcolor), but with some restrictions.
+
+This demo illustrates a bug in quadmesh with masked data.
+
+```python
+import copy
+
+from matplotlib import cm, pyplot as plt
+import numpy as np
+
+n = 12
+x = np.linspace(-1.5, 1.5, n)
+y = np.linspace(-1.5, 1.5, n * 2)
+X, Y = np.meshgrid(x, y)
+Qx = np.cos(Y) - np.cos(X)
+Qz = np.sin(Y) + np.sin(X)
+Z = np.sqrt(X**2 + Y**2) / 5
+Z = (Z - Z.min()) / (Z.max() - Z.min())
+
+# The color array can include masked values.
+Zm = np.ma.masked_where(np.abs(Qz) < 0.5 * np.max(Qz), Z)
+
+fig, axs = plt.subplots(nrows=1, ncols=3)
+axs[0].pcolormesh(Qx, Qz, Z, shading='gouraud')
+axs[0].set_title('Without masked values')
+
+# You can control the color of the masked region. We copy the default colormap
+# before modifying it.
+cmap = copy.copy(cm.get_cmap(plt.rcParams['image.cmap']))
+cmap.set_bad('y', 1.0)
+axs[1].pcolormesh(Qx, Qz, Zm, shading='gouraud', cmap=cmap)
+axs[1].set_title('With masked values')
+
+# Or use the default, which is transparent.
+axs[2].pcolormesh(Qx, Qz, Zm, shading='gouraud')
+axs[2].set_title('With masked values')
+
+fig.tight_layout()
+plt.show()
```
+
+![QuadMesh Demo](https://matplotlib.org/_images/sphx_glr_quadmesh_demo_001.png)
+
+## References
+
+The use of the following functions and methods is shown in this example:
+
+```python
+import matplotlib
+matplotlib.axes.Axes.pcolormesh
+matplotlib.pyplot.pcolormesh
+```
+
+## Download this example
+
+- [Download Python source code: quadmesh_demo.py](https://matplotlib.org/_downloads/quadmesh_demo.py)
+- [Download Jupyter notebook: quadmesh_demo.ipynb](https://matplotlib.org/_downloads/quadmesh_demo.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/images_contours_and_fields/quiver_demo.md b/Python/matplotlab/gallery/images_contours_and_fields/quiver_demo.md
new file mode 100644
index 00000000..71ca6e73
--- /dev/null
+++ b/Python/matplotlab/gallery/images_contours_and_fields/quiver_demo.md
@@ -0,0 +1,65 @@
+# Demo of advanced quiver and quiverkey functions
+
+Demonstrates some more advanced options for [quiver](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.quiver.html#matplotlib.axes.Axes.quiver). For a simple example refer to [Quiver Simple Demo](https://matplotlib.org/gallery/images_contours_and_fields/quiver_simple_demo.html).
+
+Known problem: the plot autoscaling does not take into account the arrows, so those on the boundaries are often out of the picture. This is not easy to solve in a perfectly general way. The workaround is to manually expand the Axes objects.
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+
+X, Y = np.meshgrid(np.arange(0, 2 * np.pi, .2), np.arange(0, 2 * np.pi, .2))
+U = 
np.cos(X) +V = np.sin(Y) +fig1, ax1 = plt.subplots() +ax1.set_title('Arrows scale with plot width, not view') +Q = ax1.quiver(X, Y, U, V, units='width') +qk = ax1.quiverkey(Q, 0.9, 0.9, 2, r'$2 \frac{m}{s}$', labelpos='E', + coordinates='figure') +``` + +![箭图示例](https://matplotlib.org/_images/sphx_glr_quiver_demo_001.png) + +```python +fig2, ax2 = plt.subplots() +ax2.set_title("pivot='mid'; every third arrow; units='inches'") +Q = ax2.quiver(X[::3, ::3], Y[::3, ::3], U[::3, ::3], V[::3, ::3], + pivot='mid', units='inches') +qk = ax2.quiverkey(Q, 0.9, 0.9, 1, r'$1 \frac{m}{s}$', labelpos='E', + coordinates='figure') +ax2.scatter(X[::3, ::3], Y[::3, ::3], color='r', s=5) +``` + +![箭图示例2](https://matplotlib.org/_images/sphx_glr_quiver_demo_002.png) + +```python +fig3, ax3 = plt.subplots() +ax3.set_title("pivot='tip'; scales with x view") +M = np.hypot(U, V) +Q = ax3.quiver(X, Y, U, V, M, units='x', pivot='tip', width=0.022, + scale=1 / 0.15) +qk = ax3.quiverkey(Q, 0.9, 0.9, 1, r'$1 \frac{m}{s}$', labelpos='E', + coordinates='figure') +ax3.scatter(X, Y, color='k', s=5) + +plt.show() +``` + +![箭图示例3](https://matplotlib.org/_images/sphx_glr_quiver_demo_003.png) + +## 参考 + +此示例中显示了以下函数和方法的用法: + +```python +import matplotlib +matplotlib.axes.Axes.quiver +matplotlib.pyplot.quiver +matplotlib.axes.Axes.quiverkey +matplotlib.pyplot.quiverkey +``` + +## 下载这个示例 + +- [下载python源码: quiver_demo.py](https://matplotlib.org/_downloads/quiver_demo.py) +- [下载Jupyter notebook: quiver_demo.ipynb](https://matplotlib.org/_downloads/quiver_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/quiver_simple_demo.md b/Python/matplotlab/gallery/images_contours_and_fields/quiver_simple_demo.md new file mode 100644 index 00000000..6d4ef121 --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/quiver_simple_demo.md @@ -0,0 +1,40 @@ +# 箭图简单演示 + 
+带[quiverkey](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.quiver.html#matplotlib.axes.Axes.quiver)的[箭袋图](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.quiver.html#matplotlib.axes.Axes.quiver)的简单示例。 + +有关更高级的选项,请参阅演示[高级箭袋和quiverkey功能](https://matplotlib.org/gallery/images_contours_and_fields/quiver_demo.html)。 + +```python +import matplotlib.pyplot as plt +import numpy as np + +X = np.arange(-10, 10, 1) +Y = np.arange(-10, 10, 1) +U, V = np.meshgrid(X, Y) + +fig, ax = plt.subplots() +q = ax.quiver(X, Y, U, V) +ax.quiverkey(q, X=0.3, Y=1.1, U=10, + label='Quiver key, length = 10', labelpos='E') + +plt.show() +``` + +![箭图简单演示](https://matplotlib.org/_images/sphx_glr_quiver_simple_demo_001.png) + +## 参考 + +此示例中显示了以下函数和方法的用法: + +```python +import matplotlib +matplotlib.axes.Axes.quiver +matplotlib.pyplot.quiver +matplotlib.axes.Axes.quiverkey +matplotlib.pyplot.quiverkey +``` + +## 下载这个示例 + +- [下载python源码: quiver_simple_demo.py](https://matplotlib.org/_downloads/quiver_simple_demo.py) +- [下载Jupyter notebook: quiver_simple_demo.ipynb](https://matplotlib.org/_downloads/quiver_simple_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/shading_example.md b/Python/matplotlab/gallery/images_contours_and_fields/shading_example.md new file mode 100644 index 00000000..bbb96bd0 --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/shading_example.md @@ -0,0 +1,78 @@ +# 着色示例 + +显示如何制作阴影浮雕图的示例,如Mathematica (http://reference.wolfram.com/mathematica/ref/ReliefPlot.html)或通用映射工具(https://gmt.soest.hawaii.edu/) + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.colors import LightSource +from matplotlib.cbook import get_sample_data + + +def main(): + # Test data + x, y = np.mgrid[-5:5:0.05, -5:5:0.05] + z = 5 * (np.sqrt(x**2 + y**2) + np.sin(x**2 + y**2)) + + filename = get_sample_data('jacksboro_fault_dem.npz', asfileobj=False) + with np.load(filename) as dem: + 
elev = dem['elevation'] + + fig = compare(z, plt.cm.copper) + fig.suptitle('HSV Blending Looks Best with Smooth Surfaces', y=0.95) + + fig = compare(elev, plt.cm.gist_earth, ve=0.05) + fig.suptitle('Overlay Blending Looks Best with Rough Surfaces', y=0.95) + + plt.show() + + +def compare(z, cmap, ve=1): + # Create subplots and hide ticks + fig, axs = plt.subplots(ncols=2, nrows=2) + for ax in axs.flat: + ax.set(xticks=[], yticks=[]) + + # Illuminate the scene from the northwest + ls = LightSource(azdeg=315, altdeg=45) + + axs[0, 0].imshow(z, cmap=cmap) + axs[0, 0].set(xlabel='Colormapped Data') + + axs[0, 1].imshow(ls.hillshade(z, vert_exag=ve), cmap='gray') + axs[0, 1].set(xlabel='Illumination Intensity') + + rgb = ls.shade(z, cmap=cmap, vert_exag=ve, blend_mode='hsv') + axs[1, 0].imshow(rgb) + axs[1, 0].set(xlabel='Blend Mode: "hsv" (default)') + + rgb = ls.shade(z, cmap=cmap, vert_exag=ve, blend_mode='overlay') + axs[1, 1].imshow(rgb) + axs[1, 1].set(xlabel='Blend Mode: "overlay"') + + return fig + + +if __name__ == '__main__': + main() +``` + +![着色示例](https://matplotlib.org/_images/sphx_glr_shading_example_001.png) + +![着色示例2](https://matplotlib.org/_images/sphx_glr_shading_example_002.png) + +## 参考 + +本例中显示了以下函数、方法和类的使用: + +```python +import matplotlib +matplotlib.colors.LightSource +matplotlib.axes.Axes.imshow +matplotlib.pyplot.imshow +``` + +## 下载这个示例 + +- [下载python源码: shading_example.py](https://matplotlib.org/_downloads/shading_example.py) +- [下载Jupyter notebook: shading_example.ipynb](https://matplotlib.org/_downloads/shading_example.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/specgram_demo.md b/Python/matplotlab/gallery/images_contours_and_fields/specgram_demo.md new file mode 100644 index 00000000..53c98c55 --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/specgram_demo.md @@ -0,0 +1,54 @@ +# 频谱图演示 + +频谱图的演示 
([specgram()](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.specgram.html#matplotlib.axes.Axes.specgram))。 + +```python +import matplotlib.pyplot as plt +import numpy as np + +# Fixing random state for reproducibility +np.random.seed(19680801) + +dt = 0.0005 +t = np.arange(0.0, 20.0, dt) +s1 = np.sin(2 * np.pi * 100 * t) +s2 = 2 * np.sin(2 * np.pi * 400 * t) + +# create a transient "chirp" +mask = np.where(np.logical_and(t > 10, t < 12), 1.0, 0.0) +s2 = s2 * mask + +# add some noise into the mix +nse = 0.01 * np.random.random(size=len(t)) + +x = s1 + s2 + nse # the signal +NFFT = 1024 # the length of the windowing segments +Fs = int(1.0 / dt) # the sampling frequency + +fig, (ax1, ax2) = plt.subplots(nrows=2) +ax1.plot(t, x) +Pxx, freqs, bins, im = ax2.specgram(x, NFFT=NFFT, Fs=Fs, noverlap=900) +# The `specgram` method returns 4 objects. They are: +# - Pxx: the periodogram +# - freqs: the frequency vector +# - bins: the centers of the time bins +# - im: the matplotlib.image.AxesImage instance representing the data in the plot +plt.show() +``` + +![频谱图示例](https://matplotlib.org/_images/sphx_glr_specgram_demo_001.png) + +## 参考 + +此示例中显示了以下函数的使用方法: + +```python +import matplotlib +matplotlib.axes.Axes.specgram +matplotlib.pyplot.specgram +``` + +## 下载这个示例 + +- [下载python源码: specgram_demo.py](https://matplotlib.org/_downloads/specgram_demo.py) +- [下载Jupyter notebook: specgram_demo.ipynb](https://matplotlib.org/_downloads/specgram_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/spy_demos.md b/Python/matplotlab/gallery/images_contours_and_fields/spy_demos.md new file mode 100644 index 00000000..0e715ac1 --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/spy_demos.md @@ -0,0 +1,43 @@ +# Spy 演示 + +绘制数组的稀疏模式。 + +```python +import matplotlib.pyplot as plt +import numpy as np + +fig, axs = plt.subplots(2, 2) +ax1 = axs[0, 0] +ax2 = axs[0, 1] +ax3 = axs[1, 0] +ax4 = axs[1, 1] + +x = 
np.random.randn(20, 20) +x[5, :] = 0. +x[:, 12] = 0. + +ax1.spy(x, markersize=5) +ax2.spy(x, precision=0.1, markersize=5) + +ax3.spy(x) +ax4.spy(x, precision=0.1) + +plt.show() +``` + +![Spy demos](https://matplotlib.org/_images/sphx_glr_spy_demos_001.png) + +## References + +The use of the following functions, methods and classes is shown in this example: + +```python +import matplotlib +matplotlib.axes.Axes.spy +matplotlib.pyplot.spy +``` + +## Download this example + +- [Download Python source code: spy_demos.py](https://matplotlib.org/_downloads/spy_demos.py) +- [Download Jupyter notebook: spy_demos.ipynb](https://matplotlib.org/_downloads/spy_demos.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/tricontour_demo.md b/Python/matplotlab/gallery/images_contours_and_fields/tricontour_demo.md new file mode 100644 index 00000000..9d09bc7b --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/tricontour_demo.md @@ -0,0 +1,128 @@ +# Tricontour Demo + +Contour plots of unstructured triangular grids. + +```python +import matplotlib.pyplot as plt +import matplotlib.tri as tri +import numpy as np +``` + +Creating a Triangulation without specifying the triangles results in the Delaunay triangulation of the points. + +```python +# First create the x and y coordinates of the points. +n_angles = 48 +n_radii = 8 +min_radius = 0.25 +radii = np.linspace(min_radius, 0.95, n_radii) + +angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False) +angles = np.repeat(angles[..., np.newaxis], n_radii, axis=1) +angles[:, 1::2] += np.pi / n_angles + +x = (radii * np.cos(angles)).flatten() +y = (radii * np.sin(angles)).flatten() +z = (np.cos(radii) * np.cos(3 * angles)).flatten() + +# Create the Triangulation; no triangles so Delaunay triangulation created. +triang = tri.Triangulation(x, y) + +# Mask off unwanted triangles. 
+triang.set_mask(np.hypot(x[triang.triangles].mean(axis=1), + y[triang.triangles].mean(axis=1)) + < min_radius) +``` + +tricontourf plot. + +```python +fig1, ax1 = plt.subplots() +ax1.set_aspect('equal') +tcf = ax1.tricontourf(triang, z) +fig1.colorbar(tcf) +ax1.tricontour(triang, z, colors='k') +ax1.set_title('Contour plot of Delaunay triangulation') +``` + +![Tricontour demo](https://matplotlib.org/_images/sphx_glr_tricontour_demo_001.png) + +You can specify your own triangulation rather than perform a Delaunay triangulation of the points, where each triangle is given by the indices of the three points that make up the triangle, ordered in either a clockwise or anticlockwise manner. + +```python +xy = np.asarray([ + [-0.101, 0.872], [-0.080, 0.883], [-0.069, 0.888], [-0.054, 0.890], + [-0.045, 0.897], [-0.057, 0.895], [-0.073, 0.900], [-0.087, 0.898], + [-0.090, 0.904], [-0.069, 0.907], [-0.069, 0.921], [-0.080, 0.919], + [-0.073, 0.928], [-0.052, 0.930], [-0.048, 0.942], [-0.062, 0.949], + [-0.054, 0.958], [-0.069, 0.954], [-0.087, 0.952], [-0.087, 0.959], + [-0.080, 0.966], [-0.085, 0.973], [-0.087, 0.965], [-0.097, 0.965], + [-0.097, 0.975], [-0.092, 0.984], [-0.101, 0.980], [-0.108, 0.980], + [-0.104, 0.987], [-0.102, 0.993], [-0.115, 1.001], [-0.099, 0.996], + [-0.101, 1.007], [-0.090, 1.010], [-0.087, 1.021], [-0.069, 1.021], + [-0.052, 1.022], [-0.052, 1.017], [-0.069, 1.010], [-0.064, 1.005], + [-0.048, 1.005], [-0.031, 1.005], [-0.031, 0.996], [-0.040, 0.987], + [-0.045, 0.980], [-0.052, 0.975], [-0.040, 0.973], [-0.026, 0.968], + [-0.020, 0.954], [-0.006, 0.947], [ 0.003, 0.935], [ 0.006, 0.926], + [ 0.005, 0.921], [ 0.022, 0.923], [ 0.033, 0.912], [ 0.029, 0.905], + [ 0.017, 0.900], [ 0.012, 0.895], [ 0.027, 0.893], [ 0.019, 0.886], + [ 0.001, 0.883], [-0.012, 0.884], [-0.029, 0.883], [-0.038, 0.879], + [-0.057, 0.881], [-0.062, 0.876], [-0.078, 0.876], [-0.087, 0.872], + [-0.030, 0.907], [-0.007, 0.905], [-0.057, 0.916], [-0.025, 0.933], + [-0.077, 0.990], [-0.059, 0.993]]) +x = np.degrees(xy[:, 0]) +y = np.degrees(xy[:, 1]) +x0 = -5 +y0 = 52 +z = np.exp(-0.01 * ((x - x0) * (x - x0) + (y - y0) * (y - y0))) + +triangles = 
np.asarray([ + [67, 66, 1], [65, 2, 66], [ 1, 66, 2], [64, 2, 65], [63, 3, 64], + [60, 59, 57], [ 2, 64, 3], [ 3, 63, 4], [ 0, 67, 1], [62, 4, 63], + [57, 59, 56], [59, 58, 56], [61, 60, 69], [57, 69, 60], [ 4, 62, 68], + [ 6, 5, 9], [61, 68, 62], [69, 68, 61], [ 9, 5, 70], [ 6, 8, 7], + [ 4, 70, 5], [ 8, 6, 9], [56, 69, 57], [69, 56, 52], [70, 10, 9], + [54, 53, 55], [56, 55, 53], [68, 70, 4], [52, 56, 53], [11, 10, 12], + [69, 71, 68], [68, 13, 70], [10, 70, 13], [51, 50, 52], [13, 68, 71], + [52, 71, 69], [12, 10, 13], [71, 52, 50], [71, 14, 13], [50, 49, 71], + [49, 48, 71], [14, 16, 15], [14, 71, 48], [17, 19, 18], [17, 20, 19], + [48, 16, 14], [48, 47, 16], [47, 46, 16], [16, 46, 45], [23, 22, 24], + [21, 24, 22], [17, 16, 45], [20, 17, 45], [21, 25, 24], [27, 26, 28], + [20, 72, 21], [25, 21, 72], [45, 72, 20], [25, 28, 26], [44, 73, 45], + [72, 45, 73], [28, 25, 29], [29, 25, 31], [43, 73, 44], [73, 43, 40], + [72, 73, 39], [72, 31, 25], [42, 40, 43], [31, 30, 29], [39, 73, 40], + [42, 41, 40], [72, 33, 31], [32, 31, 33], [39, 38, 72], [33, 72, 38], + [33, 38, 34], [37, 35, 38], [34, 38, 35], [35, 37, 36]]) +``` + +Rather than creating a Triangulation object, you can pass the x, y and triangles arrays to tricontourf directly. It is better to use a Triangulation object if the same triangulation is to be used more than once, to save duplicated calculations. + +```python +fig2, ax2 = plt.subplots() +ax2.set_aspect('equal') +tcf = ax2.tricontourf(x, y, triangles, z) +fig2.colorbar(tcf) +ax2.set_title('Contour plot of user-specified triangulation') +ax2.set_xlabel('Longitude (degrees)') +ax2.set_ylabel('Latitude (degrees)') + +plt.show() +``` + +![Tricontour demo 2](https://matplotlib.org/_images/sphx_glr_tricontour_demo_002.png) + +## References + +The use of the following functions, methods and classes is shown in this example: + +```python +import matplotlib +matplotlib.axes.Axes.tricontourf +matplotlib.pyplot.tricontourf +matplotlib.tri.Triangulation +``` + +## Download this example + +- [Download Python source code: tricontour_demo.py](https://matplotlib.org/_downloads/tricontour_demo.py) +- [Download Jupyter notebook: tricontour_demo.ipynb](https://matplotlib.org/_downloads/tricontour_demo.ipynb) \ No newline at 
end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/tricontour_smooth_delaunay.md b/Python/matplotlab/gallery/images_contours_and_fields/tricontour_smooth_delaunay.md new file mode 100644 index 00000000..465ce059 --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/tricontour_smooth_delaunay.md @@ -0,0 +1,155 @@ +# Tricontour Smooth Delaunay + +Demonstrates high-resolution tricontouring of a random set of points; [matplotlib.tri.TriAnalyzer](https://matplotlib.org/api/tri_api.html#matplotlib.tri.TriAnalyzer) is used to improve the plot quality. + +The initial data points and triangular grid for this demo are: + +- A set of random points is instantiated inside the [-1, 1] x [-1, 1] square. +- A Delaunay triangulation of these points is then computed, of which a random subset of triangles is masked out by the user (based on the init_mask_frac parameter). This simulates invalid data. + +The proposed generic procedure to obtain high-resolution contours of such a data set is the following: + +1. Compute an extended mask with [matplotlib.tri.TriAnalyzer](https://matplotlib.org/api/tri_api.html#matplotlib.tri.TriAnalyzer), which will exclude badly shaped (flat) triangles from the border of the triangulation. Apply the mask to the triangulation (using set_mask). +1. Refine and interpolate the data with [matplotlib.tri.UniformTriRefiner](https://matplotlib.org/api/tri_api.html#matplotlib.tri.UniformTriRefiner). +1. Plot the refined data with [tricontour](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.tricontour.html#matplotlib.axes.Axes.tricontour). + +```python +from matplotlib.tri import Triangulation, TriAnalyzer, UniformTriRefiner +import matplotlib.pyplot as plt +import matplotlib.cm as cm +import numpy as np + + +#----------------------------------------------------------------------------- +# Analytical test function +#----------------------------------------------------------------------------- +def experiment_res(x, y): + """ An analytic function representing experiment results """ + x = 2. * x + r1 = np.sqrt((0.5 - x)**2 + (0.5 - y)**2) + theta1 = np.arctan2(0.5 - x, 0.5 - y) + r2 = np.sqrt((-x - 0.2)**2 + (-y - 0.2)**2) + theta2 = np.arctan2(-x - 0.2, -y - 0.2) + z = (4 * (np.exp((r1 / 10)**2) - 1) * 30. * np.cos(3 * theta1) + + (np.exp((r2 / 10)**2) - 1) * 30. 
* np.cos(5 * theta2) + + 2 * (x**2 + y**2)) + return (np.max(z) - z) / (np.max(z) - np.min(z)) + +#----------------------------------------------------------------------------- +# Generating the initial data test points and triangulation for the demo +#----------------------------------------------------------------------------- +# User parameters for data test points +n_test = 200 # Number of test data points, tested from 3 to 5000 for subdiv=3 + +subdiv = 3 # Number of recursive subdivisions of the initial mesh for smooth + # plots. Values >3 might result in a very high number of triangles + # for the refine mesh: new triangles numbering = (4**subdiv)*ntri + +init_mask_frac = 0.0 # Float > 0. adjusting the proportion of + # (invalid) initial triangles which will be masked + # out. Enter 0 for no mask. + +min_circle_ratio = .01 # Minimum circle ratio - border triangles with circle + # ratio below this will be masked if they touch a + # border. Suggested value 0.01; use -1 to keep + # all triangles. + +# Random points +random_gen = np.random.RandomState(seed=19680801) +x_test = random_gen.uniform(-1., 1., size=n_test) +y_test = random_gen.uniform(-1., 1., size=n_test) +z_test = experiment_res(x_test, y_test) + +# meshing with Delaunay triangulation +tri = Triangulation(x_test, y_test) +ntri = tri.triangles.shape[0] + +# Some invalid data are masked out +mask_init = np.zeros(ntri, dtype=bool) +masked_tri = random_gen.randint(0, ntri, int(ntri * init_mask_frac)) +mask_init[masked_tri] = True +tri.set_mask(mask_init) + + +#----------------------------------------------------------------------------- +# Improving the triangulation before high-res plots: removing flat triangles +#----------------------------------------------------------------------------- +# masking badly shaped triangles at the border of the triangular mesh. 
+mask = TriAnalyzer(tri).get_flat_tri_mask(min_circle_ratio) +tri.set_mask(mask) + +# refining the data +refiner = UniformTriRefiner(tri) +tri_refi, z_test_refi = refiner.refine_field(z_test, subdiv=subdiv) + +# analytical 'results' for comparison +z_expected = experiment_res(tri_refi.x, tri_refi.y) + +# for the demo: loading the 'flat' triangles for plot +flat_tri = Triangulation(x_test, y_test) +flat_tri.set_mask(~mask) + + +#----------------------------------------------------------------------------- +# Now the plots +#----------------------------------------------------------------------------- +# User options for plots +plot_tri = True # plot of base triangulation +plot_masked_tri = True # plot of excessively flat excluded triangles +plot_refi_tri = False # plot of refined triangulation +plot_expected = False # plot of analytical function values for comparison + + +# Graphical options for tricontouring +levels = np.arange(0., 1., 0.025) +cmap = cm.get_cmap(name='Blues', lut=None) + +fig, ax = plt.subplots() +ax.set_aspect('equal') +ax.set_title("Filtering a Delaunay mesh\n" + + "(application to high-resolution tricontouring)") + +# 1) plot of the refined (computed) data contours: +ax.tricontour(tri_refi, z_test_refi, levels=levels, cmap=cmap, + linewidths=[2.0, 0.5, 1.0, 0.5]) +# 2) plot of the expected (analytical) data contours (dashed): +if plot_expected: + ax.tricontour(tri_refi, z_expected, levels=levels, cmap=cmap, + linestyles='--') +# 3) plot of the fine mesh on which interpolation was done: +if plot_refi_tri: + ax.triplot(tri_refi, color='0.97') +# 4) plot of the initial 'coarse' mesh: +if plot_tri: + ax.triplot(tri, color='0.7') +# 5) plot of the unvalidated triangles from naive Delaunay Triangulation: +if plot_masked_tri: + ax.triplot(flat_tri, color='red') + +plt.show() +``` + +![Tricontour Smooth Delaunay](https://matplotlib.org/_images/sphx_glr_tricontour_smooth_delaunay_001.png) + +## References + +The use of the following functions, methods, classes and modules is shown in this example: + +```python +import matplotlib 
+matplotlib.axes.Axes.tricontour +matplotlib.pyplot.tricontour +matplotlib.axes.Axes.tricontourf +matplotlib.pyplot.tricontourf +matplotlib.axes.Axes.triplot +matplotlib.pyplot.triplot +matplotlib.tri +matplotlib.tri.Triangulation +matplotlib.tri.TriAnalyzer +matplotlib.tri.UniformTriRefiner +``` + +## Download this example + +- [Download Python source code: tricontour_smooth_delaunay.py](https://matplotlib.org/_downloads/tricontour_smooth_delaunay.py) +- [Download Jupyter notebook: tricontour_smooth_delaunay.ipynb](https://matplotlib.org/_downloads/tricontour_smooth_delaunay.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/tricontour_smooth_user.md b/Python/matplotlab/gallery/images_contours_and_fields/tricontour_smooth_user.md new file mode 100644 index 00000000..f6aff358 --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/tricontour_smooth_user.md @@ -0,0 +1,98 @@ +# Tricontour Smooth User + +Demonstrates high-resolution tricontouring on user-defined triangular grids with [matplotlib.tri.UniformTriRefiner](https://matplotlib.org/api/tri_api.html#matplotlib.tri.UniformTriRefiner). + +```python +import matplotlib.tri as tri +import matplotlib.pyplot as plt +import matplotlib.cm as cm +import numpy as np + + +#----------------------------------------------------------------------------- +# Analytical test function +#----------------------------------------------------------------------------- +def function_z(x, y): + """ A function of 2 variables """ + r1 = np.sqrt((0.5 - x)**2 + (0.5 - y)**2) + theta1 = np.arctan2(0.5 - x, 0.5 - y) + r2 = np.sqrt((-x - 0.2)**2 + (-y - 0.2)**2) + theta2 = np.arctan2(-x - 0.2, -y - 0.2) + z = -(2 * (np.exp((r1 / 10)**2) - 1) * 30. * np.cos(7. * theta1) + + (np.exp((r2 / 10)**2) - 1) * 30. * np.cos(11. 
* theta2) + + 0.7 * (x**2 + y**2)) + return (np.max(z) - z) / (np.max(z) - np.min(z)) + +#----------------------------------------------------------------------------- +# Creating a Triangulation +#----------------------------------------------------------------------------- +# First create the x and y coordinates of the points. +n_angles = 20 +n_radii = 10 +min_radius = 0.15 +radii = np.linspace(min_radius, 0.95, n_radii) + +angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False) +angles = np.repeat(angles[..., np.newaxis], n_radii, axis=1) +angles[:, 1::2] += np.pi / n_angles + +x = (radii * np.cos(angles)).flatten() +y = (radii * np.sin(angles)).flatten() +z = function_z(x, y) + +# Now create the Triangulation. +# (Creating a Triangulation without specifying the triangles results in the +# Delaunay triangulation of the points.) +triang = tri.Triangulation(x, y) + +# Mask off unwanted triangles. +triang.set_mask(np.hypot(x[triang.triangles].mean(axis=1), + y[triang.triangles].mean(axis=1)) + < min_radius) + +#----------------------------------------------------------------------------- +# Refine data +#----------------------------------------------------------------------------- +refiner = tri.UniformTriRefiner(triang) +tri_refi, z_test_refi = refiner.refine_field(z, subdiv=3) + +#----------------------------------------------------------------------------- +# Plot the triangulation and the high-res iso-contours +#----------------------------------------------------------------------------- +fig, ax = plt.subplots() +ax.set_aspect('equal') +ax.triplot(triang, lw=0.5, color='white') + +levels = np.arange(0., 1., 0.025) +cmap = cm.get_cmap(name='terrain', lut=None) +ax.tricontourf(tri_refi, z_test_refi, levels=levels, cmap=cmap) +ax.tricontour(tri_refi, z_test_refi, levels=levels, + colors=['0.25', '0.5', '0.5', '0.5', '0.5'], + linewidths=[1.0, 0.5, 0.5, 0.5, 0.5]) + +ax.set_title("High-resolution tricontouring") + +plt.show() +``` + +![Tricontour Smooth 
User](https://matplotlib.org/_images/sphx_glr_tricontour_smooth_user_001.png) + +## References + +The use of the following functions, methods, classes and modules is shown in this example: + +```python +import matplotlib +matplotlib.axes.Axes.tricontour +matplotlib.pyplot.tricontour +matplotlib.axes.Axes.tricontourf +matplotlib.pyplot.tricontourf +matplotlib.tri +matplotlib.tri.Triangulation +matplotlib.tri.UniformTriRefiner +``` + +## Download this example + +- [Download Python source code: tricontour_smooth_user.py](https://matplotlib.org/_downloads/tricontour_smooth_user.py) +- [Download Jupyter notebook: tricontour_smooth_user.ipynb](https://matplotlib.org/_downloads/tricontour_smooth_user.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/trigradient_demo.md b/Python/matplotlab/gallery/images_contours_and_fields/trigradient_demo.md new file mode 100644 index 00000000..829a120d --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/trigradient_demo.md @@ -0,0 +1,112 @@ +# Trigradient Demo + +Demonstrates computation of gradients with [matplotlib.tri.CubicTriInterpolator](https://matplotlib.org/api/tri_api.html#matplotlib.tri.CubicTriInterpolator). + +```python +from matplotlib.tri import ( + Triangulation, UniformTriRefiner, CubicTriInterpolator) +import matplotlib.pyplot as plt +import matplotlib.cm as cm +import numpy as np + + +#----------------------------------------------------------------------------- +# Electrical potential of a dipole +#----------------------------------------------------------------------------- +def dipole_potential(x, y): + """ The electric dipole potential V """ + r_sq = x**2 + y**2 + theta = np.arctan2(y, x) + z = np.cos(theta)/r_sq + return (np.max(z) - z) / (np.max(z) - np.min(z)) + + +#----------------------------------------------------------------------------- +# Creating a Triangulation +#----------------------------------------------------------------------------- +# First create the x and y coordinates of the points. 
+n_angles = 30 +n_radii = 10 +min_radius = 0.2 +radii = np.linspace(min_radius, 0.95, n_radii) + +angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False) +angles = np.repeat(angles[..., np.newaxis], n_radii, axis=1) +angles[:, 1::2] += np.pi / n_angles + +x = (radii*np.cos(angles)).flatten() +y = (radii*np.sin(angles)).flatten() +V = dipole_potential(x, y) + +# Create the Triangulation; no triangles specified so Delaunay triangulation +# created. +triang = Triangulation(x, y) + +# Mask off unwanted triangles. +triang.set_mask(np.hypot(x[triang.triangles].mean(axis=1), + y[triang.triangles].mean(axis=1)) + < min_radius) + +#----------------------------------------------------------------------------- +# Refine data - interpolates the electrical potential V +#----------------------------------------------------------------------------- +refiner = UniformTriRefiner(triang) +tri_refi, z_test_refi = refiner.refine_field(V, subdiv=3) + +#----------------------------------------------------------------------------- +# Computes the electrical field (Ex, Ey) as gradient of electrical potential +#----------------------------------------------------------------------------- +tci = CubicTriInterpolator(triang, -V) +# Gradient requested here at the mesh nodes but could be anywhere else: +(Ex, Ey) = tci.gradient(triang.x, triang.y) +E_norm = np.sqrt(Ex**2 + Ey**2) + +#----------------------------------------------------------------------------- +# Plot the triangulation, the potential iso-contours and the vector field +#----------------------------------------------------------------------------- +fig, ax = plt.subplots() +ax.set_aspect('equal') +# Enforce the margins, and enlarge them to give room for the vectors. 
+ax.use_sticky_edges = False +ax.margins(0.07) + +ax.triplot(triang, color='0.8') + +levels = np.arange(0., 1., 0.01) +cmap = cm.get_cmap(name='hot', lut=None) +ax.tricontour(tri_refi, z_test_refi, levels=levels, cmap=cmap, + linewidths=[2.0, 1.0, 1.0, 1.0]) +# Plots direction of the electrical vector field +ax.quiver(triang.x, triang.y, Ex/E_norm, Ey/E_norm, + units='xy', scale=10., zorder=3, color='blue', + width=0.007, headwidth=3., headlength=4.) + +ax.set_title('Gradient plot: an electrical dipole') +plt.show() +``` + +![Trigradient demo](https://matplotlib.org/_images/sphx_glr_trigradient_demo_001.png) + +## References + +The use of the following functions, methods, classes and modules is shown in this example: + +```python +import matplotlib +matplotlib.axes.Axes.tricontour +matplotlib.pyplot.tricontour +matplotlib.axes.Axes.triplot +matplotlib.pyplot.triplot +matplotlib.tri +matplotlib.tri.Triangulation +matplotlib.tri.CubicTriInterpolator +matplotlib.tri.CubicTriInterpolator.gradient +matplotlib.tri.UniformTriRefiner +matplotlib.axes.Axes.quiver +matplotlib.pyplot.quiver +``` + +## Download this example + +- [Download Python source code: trigradient_demo.py](https://matplotlib.org/_downloads/trigradient_demo.py) +- [Download Jupyter notebook: trigradient_demo.ipynb](https://matplotlib.org/_downloads/trigradient_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/triinterp_demo.md b/Python/matplotlab/gallery/images_contours_and_fields/triinterp_demo.md new file mode 100644 index 00000000..e85bf778 --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/triinterp_demo.md @@ -0,0 +1,86 @@ +# Triinterp Demo + +Interpolation from a triangular grid to a quad grid. + +```python +import matplotlib.pyplot as plt +import matplotlib.tri as mtri +import numpy as np + +# Create triangulation. 
+x = np.asarray([0, 1, 2, 3, 0.5, 1.5, 2.5, 1, 2, 1.5]) +y = np.asarray([0, 0, 0, 0, 1.0, 1.0, 1.0, 2, 2, 3.0]) +triangles = [[0, 1, 4], [1, 2, 5], [2, 3, 6], [1, 5, 4], [2, 6, 5], [4, 5, 7], + [5, 6, 8], [5, 8, 7], [7, 8, 9]] +triang = mtri.Triangulation(x, y, triangles) + +# Interpolate to regularly-spaced quad grid. +z = np.cos(1.5 * x) * np.cos(1.5 * y) +xi, yi = np.meshgrid(np.linspace(0, 3, 20), np.linspace(0, 3, 20)) + +interp_lin = mtri.LinearTriInterpolator(triang, z) +zi_lin = interp_lin(xi, yi) + +interp_cubic_geom = mtri.CubicTriInterpolator(triang, z, kind='geom') +zi_cubic_geom = interp_cubic_geom(xi, yi) + +interp_cubic_min_E = mtri.CubicTriInterpolator(triang, z, kind='min_E') +zi_cubic_min_E = interp_cubic_min_E(xi, yi) + +# Set up the figure +fig, axs = plt.subplots(nrows=2, ncols=2) +axs = axs.flatten() + +# Plot the triangulation. +axs[0].tricontourf(triang, z) +axs[0].triplot(triang, 'ko-') +axs[0].set_title('Triangular grid') + +# Plot linear interpolation to quad grid. 
+axs[1].contourf(xi, yi, zi_lin) +axs[1].plot(xi, yi, 'k-', lw=0.5, alpha=0.5) +axs[1].plot(xi.T, yi.T, 'k-', lw=0.5, alpha=0.5) +axs[1].set_title("Linear interpolation") + +# Plot cubic interpolation to quad grid, kind=geom +axs[2].contourf(xi, yi, zi_cubic_geom) +axs[2].plot(xi, yi, 'k-', lw=0.5, alpha=0.5) +axs[2].plot(xi.T, yi.T, 'k-', lw=0.5, alpha=0.5) +axs[2].set_title("Cubic interpolation,\nkind='geom'") + +# Plot cubic interpolation to quad grid, kind=min_E +axs[3].contourf(xi, yi, zi_cubic_min_E) +axs[3].plot(xi, yi, 'k-', lw=0.5, alpha=0.5) +axs[3].plot(xi.T, yi.T, 'k-', lw=0.5, alpha=0.5) +axs[3].set_title("Cubic interpolation,\nkind='min_E'") + +fig.tight_layout() +plt.show() +``` + +![Triinterp demo](https://matplotlib.org/_images/sphx_glr_triinterp_demo_001.png) + +## References + +The use of the following functions, methods, classes and modules is shown in this example: + +```python +import matplotlib +matplotlib.axes.Axes.tricontourf +matplotlib.pyplot.tricontourf +matplotlib.axes.Axes.triplot +matplotlib.pyplot.triplot +matplotlib.axes.Axes.contourf +matplotlib.pyplot.contourf +matplotlib.axes.Axes.plot +matplotlib.pyplot.plot +matplotlib.tri +matplotlib.tri.LinearTriInterpolator +matplotlib.tri.CubicTriInterpolator +matplotlib.tri.Triangulation +``` + +## Download this example + +- [Download Python source code: triinterp_demo.py](https://matplotlib.org/_downloads/triinterp_demo.py) +- [Download Jupyter notebook: triinterp_demo.ipynb](https://matplotlib.org/_downloads/triinterp_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/tripcolor_demo.md b/Python/matplotlab/gallery/images_contours_and_fields/tripcolor_demo.md new file mode 100644 index 00000000..c2c02c79 --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/tripcolor_demo.md @@ -0,0 +1,143 @@ +# Tripcolor Demo + +Pseudocolor plots of unstructured triangular grids. + +```python +import matplotlib.pyplot as plt +import matplotlib.tri as tri +import numpy as np +``` + +Creating a Triangulation without specifying the triangles results in the Delaunay triangulation of the points. + +```python +# First create the x and y coordinates of the 
points. +n_angles = 36 +n_radii = 8 +min_radius = 0.25 +radii = np.linspace(min_radius, 0.95, n_radii) + +angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False) +angles = np.repeat(angles[..., np.newaxis], n_radii, axis=1) +angles[:, 1::2] += np.pi / n_angles + +x = (radii * np.cos(angles)).flatten() +y = (radii * np.sin(angles)).flatten() +z = (np.cos(radii) * np.cos(3 * angles)).flatten() + +# Create the Triangulation; no triangles so Delaunay triangulation created. +triang = tri.Triangulation(x, y) + +# Mask off unwanted triangles. +triang.set_mask(np.hypot(x[triang.triangles].mean(axis=1), + y[triang.triangles].mean(axis=1)) + < min_radius) +``` + +tripcolor plot. + +```python +fig1, ax1 = plt.subplots() +ax1.set_aspect('equal') +tpc = ax1.tripcolor(triang, z, shading='flat') +fig1.colorbar(tpc) +ax1.set_title('tripcolor of Delaunay triangulation, flat shading') +``` + +![Tripcolor demo](https://matplotlib.org/_images/sphx_glr_tripcolor_demo_001.png) + +Illustrate Gouraud shading. + +```python +fig2, ax2 = plt.subplots() +ax2.set_aspect('equal') +tpc = ax2.tripcolor(triang, z, shading='gouraud') +fig2.colorbar(tpc) +ax2.set_title('tripcolor of Delaunay triangulation, gouraud shading') +``` + +![Tripcolor demo 2](https://matplotlib.org/_images/sphx_glr_tripcolor_demo_002.png) + +You can specify your own triangulation rather than perform a Delaunay triangulation of the points, where each triangle is given by the indices of the three points that make up the triangle, ordered in either a clockwise or anticlockwise manner. + +```python +xy = np.asarray([ + [-0.101, 0.872], [-0.080, 0.883], [-0.069, 0.888], [-0.054, 0.890], + [-0.045, 0.897], [-0.057, 0.895], [-0.073, 0.900], [-0.087, 0.898], + [-0.090, 0.904], [-0.069, 0.907], [-0.069, 0.921], [-0.080, 0.919], + [-0.073, 0.928], [-0.052, 0.930], [-0.048, 0.942], [-0.062, 0.949], + [-0.054, 0.958], [-0.069, 0.954], [-0.087, 0.952], [-0.087, 0.959], + [-0.080, 0.966], [-0.085, 0.973], [-0.087, 0.965], [-0.097, 0.965], + [-0.097, 0.975], [-0.092, 0.984], [-0.101, 0.980], [-0.108, 0.980], + [-0.104, 0.987], [-0.102, 0.993], [-0.115, 1.001], [-0.099, 0.996], + [-0.101, 1.007], [-0.090, 1.010], [-0.087, 
1.021], [-0.069, 1.021], + [-0.052, 1.022], [-0.052, 1.017], [-0.069, 1.010], [-0.064, 1.005], + [-0.048, 1.005], [-0.031, 1.005], [-0.031, 0.996], [-0.040, 0.987], + [-0.045, 0.980], [-0.052, 0.975], [-0.040, 0.973], [-0.026, 0.968], + [-0.020, 0.954], [-0.006, 0.947], [ 0.003, 0.935], [ 0.006, 0.926], + [ 0.005, 0.921], [ 0.022, 0.923], [ 0.033, 0.912], [ 0.029, 0.905], + [ 0.017, 0.900], [ 0.012, 0.895], [ 0.027, 0.893], [ 0.019, 0.886], + [ 0.001, 0.883], [-0.012, 0.884], [-0.029, 0.883], [-0.038, 0.879], + [-0.057, 0.881], [-0.062, 0.876], [-0.078, 0.876], [-0.087, 0.872], + [-0.030, 0.907], [-0.007, 0.905], [-0.057, 0.916], [-0.025, 0.933], + [-0.077, 0.990], [-0.059, 0.993]]) +x, y = np.rad2deg(xy).T + +triangles = np.asarray([ + [67, 66, 1], [65, 2, 66], [ 1, 66, 2], [64, 2, 65], [63, 3, 64], + [60, 59, 57], [ 2, 64, 3], [ 3, 63, 4], [ 0, 67, 1], [62, 4, 63], + [57, 59, 56], [59, 58, 56], [61, 60, 69], [57, 69, 60], [ 4, 62, 68], + [ 6, 5, 9], [61, 68, 62], [69, 68, 61], [ 9, 5, 70], [ 6, 8, 7], + [ 4, 70, 5], [ 8, 6, 9], [56, 69, 57], [69, 56, 52], [70, 10, 9], + [54, 53, 55], [56, 55, 53], [68, 70, 4], [52, 56, 53], [11, 10, 12], + [69, 71, 68], [68, 13, 70], [10, 70, 13], [51, 50, 52], [13, 68, 71], + [52, 71, 69], [12, 10, 13], [71, 52, 50], [71, 14, 13], [50, 49, 71], + [49, 48, 71], [14, 16, 15], [14, 71, 48], [17, 19, 18], [17, 20, 19], + [48, 16, 14], [48, 47, 16], [47, 46, 16], [16, 46, 45], [23, 22, 24], + [21, 24, 22], [17, 16, 45], [20, 17, 45], [21, 25, 24], [27, 26, 28], + [20, 72, 21], [25, 21, 72], [45, 72, 20], [25, 28, 26], [44, 73, 45], + [72, 45, 73], [28, 25, 29], [29, 25, 31], [43, 73, 44], [73, 43, 40], + [72, 73, 39], [72, 31, 25], [42, 40, 43], [31, 30, 29], [39, 73, 40], + [42, 41, 40], [72, 33, 31], [32, 31, 33], [39, 38, 72], [33, 72, 38], + [33, 38, 34], [37, 35, 38], [34, 38, 35], [35, 37, 36]]) + +xmid = x[triangles].mean(axis=1) +ymid = y[triangles].mean(axis=1) +x0 = -5 +y0 = 52 +zfaces = np.exp(-0.01 * ((xmid - x0) * (xmid 
- x0) + + (ymid - y0) * (ymid - y0))) +``` + +Rather than creating a Triangulation object, you can pass the x, y and triangles arrays to tripcolor directly. It is better to use a Triangulation object if the same triangulation is to be used more than once, to save duplicated calculations. One color value can be specified for each face rather than for each point by using the facecolors kwarg. + +```python +fig3, ax3 = plt.subplots() +ax3.set_aspect('equal') +tpc = ax3.tripcolor(x, y, triangles, facecolors=zfaces, edgecolors='k') +fig3.colorbar(tpc) +ax3.set_title('tripcolor of user-specified triangulation') +ax3.set_xlabel('Longitude (degrees)') +ax3.set_ylabel('Latitude (degrees)') + +plt.show() +``` + +![Tripcolor demo 3](https://matplotlib.org/_images/sphx_glr_tripcolor_demo_003.png) + +## References + +The use of the following functions, methods, classes and modules is shown in this example: + +```python +import matplotlib +matplotlib.axes.Axes.tripcolor +matplotlib.pyplot.tripcolor +matplotlib.tri +matplotlib.tri.Triangulation +``` + +## Download this example + +- [Download Python source code: tripcolor_demo.py](https://matplotlib.org/_downloads/tripcolor_demo.py) +- [Download Jupyter notebook: tripcolor_demo.ipynb](https://matplotlib.org/_downloads/tripcolor_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/triplot_demo.md b/Python/matplotlab/gallery/images_contours_and_fields/triplot_demo.md new file mode 100644 index 00000000..e9666994 --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/triplot_demo.md @@ -0,0 +1,122 @@ +# Triplot Demo + +Creating and plotting unstructured triangular grids. + +```python +import matplotlib.pyplot as plt +import matplotlib.tri as tri +import numpy as np +``` + +Creating a Triangulation without specifying the triangles results in the Delaunay triangulation of the points. + +```python +# First create the x and y coordinates of the points. +n_angles = 36 +n_radii = 8 +min_radius = 0.25 +radii = np.linspace(min_radius, 0.95, n_radii) + +angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False) +angles = np.repeat(angles[..., np.newaxis], n_radii, axis=1) +angles[:, 1::2] += np.pi / n_angles + +x = (radii * np.cos(angles)).flatten() +y = (radii * np.sin(angles)).flatten() + +# Create the Triangulation; no triangles so Delaunay triangulation created. 
+triang = tri.Triangulation(x, y) + +# Mask off unwanted triangles. +triang.set_mask(np.hypot(x[triang.triangles].mean(axis=1), + y[triang.triangles].mean(axis=1)) + < min_radius) +``` + +Plot the triangulation. + +```python +fig1, ax1 = plt.subplots() +ax1.set_aspect('equal') +ax1.triplot(triang, 'bo-', lw=1) +ax1.set_title('triplot of Delaunay triangulation') +``` + +![Triplot demo](https://matplotlib.org/_images/sphx_glr_triplot_demo_001.png) + +You can specify your own triangulation rather than perform a Delaunay triangulation of the points, where each triangle is given by the indices of the three points that make up the triangle, ordered in either a clockwise or anticlockwise manner. + +```python +xy = np.asarray([ + [-0.101, 0.872], [-0.080, 0.883], [-0.069, 0.888], [-0.054, 0.890], + [-0.045, 0.897], [-0.057, 0.895], [-0.073, 0.900], [-0.087, 0.898], + [-0.090, 0.904], [-0.069, 0.907], [-0.069, 0.921], [-0.080, 0.919], + [-0.073, 0.928], [-0.052, 0.930], [-0.048, 0.942], [-0.062, 0.949], + [-0.054, 0.958], [-0.069, 0.954], [-0.087, 0.952], [-0.087, 0.959], + [-0.080, 0.966], [-0.085, 0.973], [-0.087, 0.965], [-0.097, 0.965], + [-0.097, 0.975], [-0.092, 0.984], [-0.101, 0.980], [-0.108, 0.980], + [-0.104, 0.987], [-0.102, 0.993], [-0.115, 1.001], [-0.099, 0.996], + [-0.101, 1.007], [-0.090, 1.010], [-0.087, 1.021], [-0.069, 1.021], + [-0.052, 1.022], [-0.052, 1.017], [-0.069, 1.010], [-0.064, 1.005], + [-0.048, 1.005], [-0.031, 1.005], [-0.031, 0.996], [-0.040, 0.987], + [-0.045, 0.980], [-0.052, 0.975], [-0.040, 0.973], [-0.026, 0.968], + [-0.020, 0.954], [-0.006, 0.947], [ 0.003, 0.935], [ 0.006, 0.926], + [ 0.005, 0.921], [ 0.022, 0.923], [ 0.033, 0.912], [ 0.029, 0.905], + [ 0.017, 0.900], [ 0.012, 0.895], [ 0.027, 0.893], [ 0.019, 0.886], + [ 0.001, 0.883], [-0.012, 0.884], [-0.029, 0.883], [-0.038, 0.879], + [-0.057, 0.881], [-0.062, 0.876], [-0.078, 0.876], [-0.087, 0.872], + [-0.030, 0.907], [-0.007, 0.905], [-0.057, 0.916], [-0.025, 0.933], + [-0.077, 0.990], [-0.059, 0.993]]) +x = np.degrees(xy[:, 0]) +y = np.degrees(xy[:, 1]) + +triangles = np.asarray([ + [67, 66, 1], [65, 2, 66], [ 1, 66, 2], [64, 2, 65], [63, 3, 64], + [60, 59, 57], [ 
 2, 64, 3], [ 3, 63, 4], [ 0, 67, 1], [62, 4, 63], + [57, 59, 56], [59, 58, 56], [61, 60, 69], [57, 69, 60], [ 4, 62, 68], + [ 6, 5, 9], [61, 68, 62], [69, 68, 61], [ 9, 5, 70], [ 6, 8, 7], + [ 4, 70, 5], [ 8, 6, 9], [56, 69, 57], [69, 56, 52], [70, 10, 9], + [54, 53, 55], [56, 55, 53], [68, 70, 4], [52, 56, 53], [11, 10, 12], + [69, 71, 68], [68, 13, 70], [10, 70, 13], [51, 50, 52], [13, 68, 71], + [52, 71, 69], [12, 10, 13], [71, 52, 50], [71, 14, 13], [50, 49, 71], + [49, 48, 71], [14, 16, 15], [14, 71, 48], [17, 19, 18], [17, 20, 19], + [48, 16, 14], [48, 47, 16], [47, 46, 16], [16, 46, 45], [23, 22, 24], + [21, 24, 22], [17, 16, 45], [20, 17, 45], [21, 25, 24], [27, 26, 28], + [20, 72, 21], [25, 21, 72], [45, 72, 20], [25, 28, 26], [44, 73, 45], + [72, 45, 73], [28, 25, 29], [29, 25, 31], [43, 73, 44], [73, 43, 40], + [72, 73, 39], [72, 31, 25], [42, 40, 43], [31, 30, 29], [39, 73, 40], + [42, 41, 40], [72, 33, 31], [32, 31, 33], [39, 38, 72], [33, 72, 38], + [33, 38, 34], [37, 35, 38], [34, 38, 35], [35, 37, 36]]) +``` + +Rather than creating a Triangulation object, you can pass the x, y and triangles arrays to triplot directly. It is better to use a Triangulation object if the same triangulation is to be used more than once, to save duplicated calculations. + +```python +fig2, ax2 = plt.subplots() +ax2.set_aspect('equal') +ax2.triplot(x, y, triangles, 'go-', lw=1.0) +ax2.set_title('triplot of user-specified triangulation') +ax2.set_xlabel('Longitude (degrees)') +ax2.set_ylabel('Latitude (degrees)') + +plt.show() +``` + +![Triplot demo 2](https://matplotlib.org/_images/sphx_glr_triplot_demo_002.png) + +## References + +The use of the following functions, methods, classes and modules is shown in this example: + +```python +import matplotlib +matplotlib.axes.Axes.triplot +matplotlib.pyplot.triplot +matplotlib.tri +matplotlib.tri.Triangulation +``` + +## Download this example + +- [Download Python source code: triplot_demo.py](https://matplotlib.org/_downloads/triplot_demo.py) +- [Download Jupyter notebook: triplot_demo.ipynb](https://matplotlib.org/_downloads/triplot_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/images_contours_and_fields/watermark_image.md 
b/Python/matplotlab/gallery/images_contours_and_fields/watermark_image.md new file mode 100644 index 00000000..71c81c0e --- /dev/null +++ b/Python/matplotlab/gallery/images_contours_and_fields/watermark_image.md @@ -0,0 +1,52 @@ +# 图像水印 + +使用PNG文件作为水印。 + +```python +import numpy as np +import matplotlib.cbook as cbook +import matplotlib.image as image +import matplotlib.pyplot as plt + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +datafile = cbook.get_sample_data('logo2.png', asfileobj=False) +print('loading %s' % datafile) +im = image.imread(datafile) +im[:, :, -1] = 0.5 # set the alpha channel + +fig, ax = plt.subplots() + +ax.plot(np.random.rand(20), '-o', ms=20, lw=2, alpha=0.7, mfc='orange') +ax.grid() +fig.figimage(im, 10, 10, zorder=3) + +plt.show() +``` + +![图像水印示例](https://matplotlib.org/_images/sphx_glr_watermark_image_001.png) + +Out: + +```sh +loading /home/tcaswell/mc3/envs/dd37/lib/python3.7/site-packages/matplotlib/mpl-data/sample_data/logo2.png +``` + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.image +matplotlib.image.imread +matplotlib.pyplot.imread +matplotlib.figure.Figure.figimage +``` + +## 下载这个示例 + +- [下载python源码: watermark_image.py](https://matplotlib.org/_downloads/watermark_image.py) +- [下载Jupyter notebook: watermark_image.ipynb](https://matplotlib.org/_downloads/watermark_image.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/arctest.md b/Python/matplotlab/gallery/lines_bars_and_markers/arctest.md new file mode 100644 index 00000000..b11399de --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/arctest.md @@ -0,0 +1,31 @@ +# 拉弧测试 + +拉弧测试示例。 + +![拉弧测试图示](https://matplotlib.org/_images/sphx_glr_arctest_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + + +def f(t): + 'A damped exponential' + s1 = np.cos(2 * np.pi * t) + e1 = np.exp(-t) + return s1 * e1 + + +t1 = np.arange(0.0, 5.0, .2) + +l = 
plt.plot(t1, f(t1), 'ro') +plt.setp(l, markersize=30) +plt.setp(l, markerfacecolor='C0') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: arctest.py](https://matplotlib.org/_downloads/arctest.py) +- [下载Jupyter notebook: arctest.ipynb](https://matplotlib.org/_downloads/arctest.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/bar_stacked.md b/Python/matplotlab/gallery/lines_bars_and_markers/bar_stacked.md new file mode 100644 index 00000000..c672e1ee --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/bar_stacked.md @@ -0,0 +1,35 @@ +# 堆积条形图 + +这是使用 [``bar``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.bar.html#matplotlib.pyplot.bar) 创建带有误差线的堆积条形图的示例。注意其中 yerr 参数用于绘制误差条,bottom 参数用于把``女人``的条形堆叠到``男人``的条形之上。 + +![堆积条形图示](https://matplotlib.org/_images/sphx_glr_bar_stacked_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + +N = 5 +menMeans = (20, 35, 30, 35, 27) +womenMeans = (25, 32, 34, 20, 25) +menStd = (2, 3, 4, 1, 2) +womenStd = (3, 5, 2, 3, 3) +ind = np.arange(N)    # the x locations for the groups +width = 0.35       # the width of the bars: can also be len(x) sequence + +p1 = plt.bar(ind, menMeans, width, yerr=menStd) +p2 = plt.bar(ind, womenMeans, width, + bottom=menMeans, yerr=womenStd) + +plt.ylabel('Scores') +plt.title('Scores by group and gender') +plt.xticks(ind, ('G1', 'G2', 'G3', 'G4', 'G5')) +plt.yticks(np.arange(0, 81, 10)) +plt.legend((p1[0], p2[0]), ('Men', 'Women')) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: bar_stacked.py](https://matplotlib.org/_downloads/bar_stacked.py) +- [下载Jupyter notebook: bar_stacked.ipynb](https://matplotlib.org/_downloads/bar_stacked.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/barchart.md b/Python/matplotlab/gallery/lines_bars_and_markers/barchart.md new file mode 100644 index 00000000..8d794309 --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/barchart.md @@ -0,0
+1,58 @@ +# 条形图 + +这是简单的条形图,在单个条形图上带有误差条形图和高度标签。 + +![简单的条形图示](https://matplotlib.org/_images/sphx_glr_barchart_001.png); + +```python +import numpy as np +import matplotlib.pyplot as plt + +men_means, men_std = (20, 35, 30, 35, 27), (2, 3, 4, 1, 2) +women_means, women_std = (25, 32, 34, 20, 25), (3, 5, 2, 3, 3) + +ind = np.arange(len(men_means)) # the x locations for the groups +width = 0.35 # the width of the bars + +fig, ax = plt.subplots() +rects1 = ax.bar(ind - width/2, men_means, width, yerr=men_std, + color='SkyBlue', label='Men') +rects2 = ax.bar(ind + width/2, women_means, width, yerr=women_std, + color='IndianRed', label='Women') + +# Add some text for labels, title and custom x-axis tick labels, etc. +ax.set_ylabel('Scores') +ax.set_title('Scores by group and gender') +ax.set_xticks(ind) +ax.set_xticklabels(('G1', 'G2', 'G3', 'G4', 'G5')) +ax.legend() + + +def autolabel(rects, xpos='center'): + """ + Attach a text label above each bar in *rects*, displaying its height. + + *xpos* indicates which side to place the text w.r.t. the center of + the bar. It can be one of the following {'center', 'right', 'left'}. 
+ """ + + xpos = xpos.lower() # normalize the case of the parameter + ha = {'center': 'center', 'right': 'left', 'left': 'right'} + offset = {'center': 0.5, 'right': 0.57, 'left': 0.43} # x_txt = x + w*off + + for rect in rects: + height = rect.get_height() + ax.text(rect.get_x() + rect.get_width()*offset[xpos], 1.01*height, + '{}'.format(height), ha=ha[xpos], va='bottom') + + +autolabel(rects1, "left") +autolabel(rects2, "right") + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: barchart.py](https://matplotlib.org/_downloads/barchart.py) +- [下载Jupyter notebook: barchart.ipynb](https://matplotlib.org/_downloads/barchart.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/broken_barh.md b/Python/matplotlab/gallery/lines_bars_and_markers/broken_barh.md new file mode 100644 index 00000000..8f9354f1 --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/broken_barh.md @@ -0,0 +1,32 @@ +# 破损条形图 + +制作一个“破损”的水平条形图,即一个有间隙的条形图 + +![破损条形图示](https://matplotlib.org/_images/sphx_glr_broken_barh_001.png); + +```python +import matplotlib.pyplot as plt + +fig, ax = plt.subplots() +ax.broken_barh([(110, 30), (150, 10)], (10, 9), facecolors='blue') +ax.broken_barh([(10, 50), (100, 20), (130, 10)], (20, 9), + facecolors=('red', 'yellow', 'green')) +ax.set_ylim(5, 35) +ax.set_xlim(0, 200) +ax.set_xlabel('seconds since start') +ax.set_yticks([15, 25]) +ax.set_yticklabels(['Bill', 'Jim']) +ax.grid(True) +ax.annotate('race interrupted', (61, 25), + xytext=(0.8, 0.9), textcoords='axes fraction', + arrowprops=dict(facecolor='black', shrink=0.05), + fontsize=16, + horizontalalignment='right', verticalalignment='top') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: broken_barh.py](https://matplotlib.org/_downloads/broken_barh.py) +- [下载Jupyter notebook: broken_barh.ipynb](https://matplotlib.org/_downloads/broken_barh.ipynb) \ No newline at end of file diff --git 
a/Python/matplotlab/gallery/lines_bars_and_markers/categorical_variables.md b/Python/matplotlab/gallery/lines_bars_and_markers/categorical_variables.md new file mode 100644 index 00000000..95703a37 --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/categorical_variables.md @@ -0,0 +1,43 @@ +# 绘制分类变量 + +如何在Matplotlib中使用分类变量。 + +很多时候你想创建一个在Matplotlib中使用分类变量的图。Matplotlib允许你将分类变量直接传递给许多绘图函数,我们将在下面演示。 + +```python +import matplotlib.pyplot as plt + +data = {'apples': 10, 'oranges': 15, 'lemons': 5, 'limes': 20} +names = list(data.keys()) +values = list(data.values()) + +fig, axs = plt.subplots(1, 3, figsize=(9, 3), sharey=True) +axs[0].bar(names, values) +axs[1].scatter(names, values) +axs[2].plot(names, values) +fig.suptitle('Categorical Plotting') +``` + +![分类变量图示1](https://matplotlib.org/_images/sphx_glr_categorical_variables_001.png) + +这在两个轴上都起作用: + +```python +cat = ["bored", "happy", "bored", "bored", "happy", "bored"] +dog = ["happy", "happy", "happy", "happy", "bored", "bored"] +activity = ["combing", "drinking", "feeding", "napping", "playing", "washing"] + +fig, ax = plt.subplots() +ax.plot(activity, dog, label="dog") +ax.plot(activity, cat, label="cat") +ax.legend() + +plt.show() +``` + +![分类变量图示2](https://matplotlib.org/_images/sphx_glr_categorical_variables_002.png) + +## 下载这个示例 + +- [下载python源码: categorical_variables.py](https://matplotlib.org/_downloads/categorical_variables.py) +- [下载Jupyter notebook: categorical_variables.ipynb](https://matplotlib.org/_downloads/categorical_variables.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/cohere.md b/Python/matplotlab/gallery/lines_bars_and_markers/cohere.md new file mode 100644 index 00000000..260e6bcc --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/cohere.md @@ -0,0 +1,40 @@ +# 绘制两个信号的相干性 + +举例说明如何绘制两个信号的相干性。 + +![绘制两个信号的相干性图示](https://matplotlib.org/_images/sphx_glr_cohere_001.png) + +```python +import
numpy as np +import matplotlib.pyplot as plt + +# Fixing random state for reproducibility +np.random.seed(19680801) + +dt = 0.01 +t = np.arange(0, 30, dt) +nse1 = np.random.randn(len(t)) # white noise 1 +nse2 = np.random.randn(len(t)) # white noise 2 + +# Two signals with a coherent part at 10Hz and a random part +s1 = np.sin(2 * np.pi * 10 * t) + nse1 +s2 = np.sin(2 * np.pi * 10 * t) + nse2 + +fig, axs = plt.subplots(2, 1) +axs[0].plot(t, s1, t, s2) +axs[0].set_xlim(0, 2) +axs[0].set_xlabel('time') +axs[0].set_ylabel('s1 and s2') +axs[0].grid(True) + +cxy, f = axs[1].cohere(s1, s2, 256, 1. / dt) +axs[1].set_ylabel('coherence') + +fig.tight_layout() +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: cohere.py](https://matplotlib.org/_downloads/cohere.py) +- [下载Jupyter notebook: cohere.ipynb](https://matplotlib.org/_downloads/cohere.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/csd_demo.md b/Python/matplotlab/gallery/lines_bars_and_markers/csd_demo.md new file mode 100644 index 00000000..999973cf --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/csd_demo.md @@ -0,0 +1,48 @@ +# 绘制两个信号交叉密度 + +计算两个信号的交叉谱密度 + +![计算两个信号的交叉谱密度图示](https://matplotlib.org/_images/sphx_glr_csd_demo_001.png); + +```python +import numpy as np +import matplotlib.pyplot as plt + + +fig, (ax1, ax2) = plt.subplots(2, 1) +# make a little extra space between the subplots +fig.subplots_adjust(hspace=0.5) + +dt = 0.01 +t = np.arange(0, 30, dt) + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +nse1 = np.random.randn(len(t)) # white noise 1 +nse2 = np.random.randn(len(t)) # white noise 2 +r = np.exp(-t / 0.05) + +cnse1 = np.convolve(nse1, r, mode='same') * dt # colored noise 1 +cnse2 = np.convolve(nse2, r, mode='same') * dt # colored noise 2 + +# two signals with a coherent part and a random part +s1 = 0.01 * np.sin(2 * np.pi * 10 * t) + cnse1 +s2 = 0.01 * np.sin(2 * np.pi * 10 * t) + cnse2 + +ax1.plot(t, s1, t, 
s2) +ax1.set_xlim(0, 5) +ax1.set_xlabel('time') +ax1.set_ylabel('s1 and s2') +ax1.grid(True) + +cxy, f = ax2.csd(s1, s2, 256, 1. / dt) +ax2.set_ylabel('CSD (db)') +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: csd_demo.py](https://matplotlib.org/_downloads/csd_demo.py) +- [下载Jupyter notebook: csd_demo.ipynb](https://matplotlib.org/_downloads/csd_demo.ipynb) diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/errorbar_limits_simple.md b/Python/matplotlab/gallery/lines_bars_and_markers/errorbar_limits_simple.md new file mode 100644 index 00000000..64bcf662 --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/errorbar_limits_simple.md @@ -0,0 +1,54 @@ +# 绘制限制型误差条形图 + +误差条上的上限符号和下限符号的说明 + +```python +import numpy as np +import matplotlib.pyplot as plt +``` + +```python +fig = plt.figure(0) +x = np.arange(10.0) +y = np.sin(np.arange(10.0) / 20.0 * np.pi) + +plt.errorbar(x, y, yerr=0.1) + +y = np.sin(np.arange(10.0) / 20.0 * np.pi) + 1 +plt.errorbar(x, y, yerr=0.1, uplims=True) + +y = np.sin(np.arange(10.0) / 20.0 * np.pi) + 2 +upperlimits = np.array([1, 0] * 5) +lowerlimits = np.array([0, 1] * 5) +plt.errorbar(x, y, yerr=0.1, uplims=upperlimits, lolims=lowerlimits) + +plt.xlim(-1, 10) +``` + +![限制型误差条形图示](https://matplotlib.org/_images/sphx_glr_errorbar_limits_simple_000.png); + +```python +fig = plt.figure(1) +x = np.arange(10.0) / 10.0 +y = (x + 0.1)**2 + +plt.errorbar(x, y, xerr=0.1, xlolims=True) +y = (x + 0.1)**3 + +plt.errorbar(x + 0.6, y, xerr=0.1, xuplims=upperlimits, xlolims=lowerlimits) + +y = (x + 0.1)**4 +plt.errorbar(x + 1.2, y, xerr=0.1, xuplims=True) + +plt.xlim(-0.2, 2.4) +plt.ylim(-0.1, 1.3) + +plt.show() +``` + +![限制型误差条形图示2](https://matplotlib.org/_images/sphx_glr_errorbar_limits_simple_002.png); + +## 下载这个示例 + +- [下载python源码: errorbar_limits_simple.py](https://matplotlib.org/_downloads/errorbar_limits_simple.py) +- [下载Jupyter notebook: 
errorbar_limits_simple.ipynb](https://matplotlib.org/_downloads/errorbar_limits_simple.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/errorbar_subsample.md b/Python/matplotlab/gallery/lines_bars_and_markers/errorbar_subsample.md new file mode 100644 index 00000000..3c4ccac0 --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/errorbar_subsample.md @@ -0,0 +1,38 @@ +# 绘制误差条形图子样本 + +演示 errorevery 关键字,以显示数据的完全精度数据图与很少的误差条。 + +![绘制误差条形图子样本](https://matplotlib.org/_images/sphx_glr_errorbar_subsample_001.png); + +```python +import numpy as np +import matplotlib.pyplot as plt + +# example data +x = np.arange(0.1, 4, 0.1) +y = np.exp(-x) + +# example variable error bar values +yerr = 0.1 + 0.1 * np.sqrt(x) + + +# Now switch to a more OO interface to exercise more features. +fig, axs = plt.subplots(nrows=1, ncols=2, sharex=True) +ax = axs[0] +ax.errorbar(x, y, yerr=yerr) +ax.set_title('all errorbars') + +ax = axs[1] +ax.errorbar(x, y, yerr=yerr, errorevery=5) +ax.set_title('only every 5th errorbar') + + +fig.suptitle('Errorbar subsampling for better appearance') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: errorbar_subsample.py](https://matplotlib.org/_downloads/errorbar_subsample.py) +- [下载Jupyter notebook: errorbar_subsample.ipynb](https://matplotlib.org/_downloads/errorbar_subsample.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/eventcollection_demo.md b/Python/matplotlab/gallery/lines_bars_and_markers/eventcollection_demo.md new file mode 100644 index 00000000..0564a663 --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/eventcollection_demo.md @@ -0,0 +1,64 @@ +# 绘制事件集的示例 + +绘制两条曲线,然后使用EventCollections标记每条曲线的相应轴上的x和y数据点的位置 + +![绘制事件集的示例](https://matplotlib.org/_images/sphx_glr_eventcollection_demo_001.png); + +```python +import matplotlib.pyplot as plt +from matplotlib.collections import EventCollection +import numpy as np + +# 
Fixing random state for reproducibility +np.random.seed(19680801) + +# create random data +xdata = np.random.random([2, 10]) + +# split the data into two parts +xdata1 = xdata[0, :] +xdata2 = xdata[1, :] + +# sort the data so it makes clean curves +xdata1.sort() +xdata2.sort() + +# create some y data points +ydata1 = xdata1 ** 2 +ydata2 = 1 - xdata2 ** 3 + +# plot the data +fig = plt.figure() +ax = fig.add_subplot(1, 1, 1) +ax.plot(xdata1, ydata1, 'r', xdata2, ydata2, 'b') + +# create the events marking the x data points +xevents1 = EventCollection(xdata1, color=[1, 0, 0], linelength=0.05) +xevents2 = EventCollection(xdata2, color=[0, 0, 1], linelength=0.05) + +# create the events marking the y data points +yevents1 = EventCollection(ydata1, color=[1, 0, 0], linelength=0.05, + orientation='vertical') +yevents2 = EventCollection(ydata2, color=[0, 0, 1], linelength=0.05, + orientation='vertical') + +# add the events to the axis +ax.add_collection(xevents1) +ax.add_collection(xevents2) +ax.add_collection(yevents1) +ax.add_collection(yevents2) + +# set the limits +ax.set_xlim([0, 1]) +ax.set_ylim([0, 1]) + +ax.set_title('line plot with data points') + +# display the plot +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: eventcollection_demo.py](https://matplotlib.org/_downloads/eventcollection_demo.py) +- [下载Jupyter notebook: eventcollection_demo.ipynb](https://matplotlib.org/_downloads/eventcollection_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/eventplot_demo.md b/Python/matplotlab/gallery/lines_bars_and_markers/eventplot_demo.md new file mode 100644 index 00000000..cf87b33d --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/eventplot_demo.md @@ -0,0 +1,69 @@ +# 绘制plot事件图的示例 + +一个事件图,显示具有各种线属性的事件序列。该图以水平和垂直方向显示。 + +![绘制事件集的示例](https://matplotlib.org/_images/sphx_glr_eventplot_demo_001.png); + +```python +import matplotlib.pyplot as plt +import numpy as np +import matplotlib 
+matplotlib.rcParams['font.size'] = 8.0 + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +# create random data +data1 = np.random.random([6, 50]) + +# set different colors for each set of positions +colors1 = np.array([[1, 0, 0], + [0, 1, 0], + [0, 0, 1], + [1, 1, 0], + [1, 0, 1], + [0, 1, 1]]) + +# set different line properties for each set of positions +# note that some overlap +lineoffsets1 = np.array([-15, -3, 1, 1.5, 6, 10]) +linelengths1 = [5, 2, 1, 1, 3, 1.5] + +fig, axs = plt.subplots(2, 2) + +# create a horizontal plot +axs[0, 0].eventplot(data1, colors=colors1, lineoffsets=lineoffsets1, + linelengths=linelengths1) + +# create a vertical plot +axs[1, 0].eventplot(data1, colors=colors1, lineoffsets=lineoffsets1, + linelengths=linelengths1, orientation='vertical') + +# create another set of random data. +# the gamma distribution is only used fo aesthetic purposes +data2 = np.random.gamma(4, size=[60, 50]) + +# use individual values for the parameters this time +# these values will be used for all data sets (except lineoffsets2, which +# sets the increment between each data set in this usage) +colors2 = [[0, 0, 0]] +lineoffsets2 = 1 +linelengths2 = 1 + +# create a horizontal plot +axs[0, 1].eventplot(data2, colors=colors2, lineoffsets=lineoffsets2, + linelengths=linelengths2) + + +# create a vertical plot +axs[1, 1].eventplot(data2, colors=colors2, lineoffsets=lineoffsets2, + linelengths=linelengths2, orientation='vertical') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: eventplot_demo.py](https://matplotlib.org/_downloads/eventplot_demo.py) +- [下载Jupyter notebook: eventplot_demo.ipynb](https://matplotlib.org/_downloads/eventplot_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/fill.md b/Python/matplotlab/gallery/lines_bars_and_markers/fill.md new file mode 100644 index 00000000..7a0e1a3a --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/fill.md @@ -0,0 +1,50 
@@ +# 绘制填充图的示例 + +演示填充图。 + +首先,用户可以使用matplotlib制作的最基本的填充图: + +```python +import numpy as np +import matplotlib.pyplot as plt + +x = [0, 1, 2, 1] +y = [1, 2, 1, 0] + +fig, ax = plt.subplots() +ax.fill(x, y) +plt.show() +``` + +![绘制填充图的示例1](https://matplotlib.org/_images/sphx_glr_fill_001.png) + +接下来,还有一些可选功能: + +- 使用单个命令的多条曲线。 +- 设置填充颜色。 +- 设置不透明度(alpha值)。 + +```python +x = np.linspace(0, 1.5 * np.pi, 500) +y1 = np.sin(x) +y2 = np.sin(3 * x) + +fig, ax = plt.subplots() + +ax.fill(x, y1, 'b', x, y2, 'r', alpha=0.3) + +# Outline of the region we've filled in +ax.plot(x, y1, c='b', alpha=0.8) +ax.plot(x, y2, c='r', alpha=0.8) +ax.plot([x[0], x[-1]], [y1[0], y1[-1]], c='b', alpha=0.8) +ax.plot([x[0], x[-1]], [y2[0], y2[-1]], c='r', alpha=0.8) + +plt.show() +``` + +![绘制填充图的示例2](https://matplotlib.org/_images/sphx_glr_fill_002.png) + +## 下载这个示例 + +- [下载python源码: fill.py](https://matplotlib.org/_downloads/fill.py) +- [下载Jupyter notebook: fill.ipynb](https://matplotlib.org/_downloads/fill.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/fill_between_demo.md b/Python/matplotlab/gallery/lines_bars_and_markers/fill_between_demo.md new file mode 100644 index 00000000..c21036be --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/fill_between_demo.md @@ -0,0 +1,80 @@ +# 填充线条之间的区域 + +此示例显示如何使用fill_between方法基于用户定义的逻辑在行之间着色。 + +```python +import matplotlib.pyplot as plt +import numpy as np + +x = np.arange(0.0, 2, 0.01) +y1 = np.sin(2 * np.pi * x) +y2 = 1.2 * np.sin(4 * np.pi * x) +``` + +```python +fig, (ax1, ax2, ax3) = plt.subplots(3, 1, sharex=True) + +ax1.fill_between(x, 0, y1) +ax1.set_ylabel('between y1 and 0') + +ax2.fill_between(x, y1, 1) +ax2.set_ylabel('between y1 and 1') + +ax3.fill_between(x, y1, y2) +ax3.set_ylabel('between y1 and y2') +ax3.set_xlabel('x') +``` + +![填充线条之间的区域1](https://matplotlib.org/_images/sphx_glr_fill_between_demo_001.png) + +现在在满足逻辑条件的y1和y2之间填充。 请注意,这与调用fill_between(x[where], 
y1[where], y2[where]...)不同,因为多个连续区域的边缘效应。 + +```python +fig, (ax, ax1) = plt.subplots(2, 1, sharex=True) +ax.plot(x, y1, x, y2, color='black') +ax.fill_between(x, y1, y2, where=y2 >= y1, facecolor='green', interpolate=True) +ax.fill_between(x, y1, y2, where=y2 <= y1, facecolor='red', interpolate=True) +ax.set_title('fill between where') + +# Test support for masked arrays. +y2 = np.ma.masked_greater(y2, 1.0) +ax1.plot(x, y1, x, y2, color='black') +ax1.fill_between(x, y1, y2, where=y2 >= y1, + facecolor='green', interpolate=True) +ax1.fill_between(x, y1, y2, where=y2 <= y1, + facecolor='red', interpolate=True) +ax1.set_title('Now regions with y2>1 are masked') +``` + +![填充线条之间的区域2](https://matplotlib.org/_images/sphx_glr_fill_between_demo_002.png) + +这个例子说明了一个问题; 由于数据网格化,在交叉点处存在不期望的未填充三角形。 蛮力解决方案是在绘图之前将所有阵列插值到非常精细的网格。 + +使用变换创建满足特定条件的轴跨度: + +```python +fig, ax = plt.subplots() +y = np.sin(4 * np.pi * x) +ax.plot(x, y, color='black') + +# use data coordinates for the x-axis and the axes coordinates for the y-axis +import matplotlib.transforms as mtransforms +trans = mtransforms.blended_transform_factory(ax.transData, ax.transAxes) +theta = 0.9 +ax.axhline(theta, color='green', lw=2, alpha=0.5) +ax.axhline(-theta, color='red', lw=2, alpha=0.5) +ax.fill_between(x, 0, 1, where=y > theta, + facecolor='green', alpha=0.5, transform=trans) +ax.fill_between(x, 0, 1, where=y < -theta, + facecolor='red', alpha=0.5, transform=trans) + + +plt.show() +``` + +![填充线条之间的区域3](https://matplotlib.org/_images/sphx_glr_fill_between_demo_003.png) + +## 下载这个示例 + +- [下载python源码: fill_between_demo.py](https://matplotlib.org/_downloads/fill_between_demo.py) +- [下载Jupyter notebook: fill_between_demo.ipynb](https://matplotlib.org/_downloads/fill_between_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/fill_betweenx_demo.md b/Python/matplotlab/gallery/lines_bars_and_markers/fill_betweenx_demo.md new file mode 100644 index 00000000..8a6b644d 
--- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/fill_betweenx_demo.md @@ -0,0 +1,58 @@ +# betweenx填充示例 + +使用fill_betweenx在两条水平曲线之间着色。 + +![betweenx填充示例](https://matplotlib.org/_images/sphx_glr_fill_betweenx_demo_001.png) + +![betweenx填充示例2](https://matplotlib.org/_images/sphx_glr_fill_betweenx_demo_002.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +y = np.arange(0.0, 2, 0.01) +x1 = np.sin(2 * np.pi * y) +x2 = 1.2 * np.sin(4 * np.pi * y) + +fig, [ax1, ax2, ax3] = plt.subplots(3, 1, sharex=True) + +ax1.fill_betweenx(y, 0, x1) +ax1.set_ylabel('(x1, 0)') + +ax2.fill_betweenx(y, x1, 1) +ax2.set_ylabel('(x1, 1)') + +ax3.fill_betweenx(y, x1, x2) +ax3.set_ylabel('(x1, x2)') +ax3.set_xlabel('x') + +# now fill between x1 and x2 where a logical condition is met. Note +# this is different than calling +# fill_between(y[where], x1[where], x2[where]) +# because of edge effects over multiple contiguous regions. + +fig, [ax, ax1] = plt.subplots(2, 1, sharex=True) +ax.plot(x1, y, x2, y, color='black') +ax.fill_betweenx(y, x1, x2, where=x2 >= x1, facecolor='green') +ax.fill_betweenx(y, x1, x2, where=x2 <= x1, facecolor='red') +ax.set_title('fill between where') + +# Test support for masked arrays. +x2 = np.ma.masked_greater(x2, 1.0) +ax1.plot(x1, y, x2, y, color='black') +ax1.fill_betweenx(y, x1, x2, where=x2 >= x1, facecolor='green') +ax1.fill_betweenx(y, x1, x2, where=x2 <= x1, facecolor='red') +ax1.set_title('Now regions with x2 > 1 are masked') + +# This example illustrates a problem; because of the data +# gridding, there are undesired unfilled triangles at the crossover +# points. A brute-force solution would be to interpolate all +# arrays to a very fine grid before plotting. 
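+# (补充示意,非原示例的一部分)上面注释所说的"蛮力"做法可以这样草拟:
+# 先用 np.interp 把各数组插值到更细的网格,再对细化后的数组调用
+# fill_betweenx,即可减少交叉点处未填充的三角形;其中 1000 只是示意取值。
+y_fine = np.linspace(y[0], y[-1], 1000)
+x1_fine = np.interp(y_fine, y, x1)
+x2_fine = np.interp(y_fine, y, np.ma.filled(x2, np.nan))  # 被掩蔽处以 NaN 填充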
+ +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: fill_betweenx_demo.py](https://matplotlib.org/_downloads/fill_betweenx_demo.py) +- [下载Jupyter notebook: fill_betweenx_demo.ipynb](https://matplotlib.org/_downloads/fill_betweenx_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/filled_step.md b/Python/matplotlab/gallery/lines_bars_and_markers/filled_step.md new file mode 100644 index 00000000..3f7fe1f5 --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/filled_step.md @@ -0,0 +1,300 @@ +# 填充直方图 + +用于绘制直方图的剖面线功能。 + +```python +import itertools +from collections import OrderedDict +from functools import partial + +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.ticker as mticker +from cycler import cycler + + +def filled_hist(ax, edges, values, bottoms=None, orientation='v', + **kwargs): + """ + Draw a histogram as a stepped patch. + + Extra kwargs are passed through to `fill_between` + + Parameters + ---------- + ax : Axes + The axes to plot to + + edges : array + A length n+1 array giving the left edges of each bin and the + right edge of the last bin. + + values : array + A length n array of bin counts or values + + bottoms : scalar or array, optional + A length n array of the bottom of the bars. If None, zero is used. + + orientation : {'v', 'h'} + Orientation of the histogram. 'v' (default) has + the bars increasing in the positive y-direction. 
+ + Returns + ------- + ret : PolyCollection + Artist added to the Axes + """ + print(orientation) + if orientation not in 'hv': + raise ValueError("orientation must be in {{'h', 'v'}} " + "not {o}".format(o=orientation)) + + kwargs.setdefault('step', 'post') + edges = np.asarray(edges) + values = np.asarray(values) + if len(edges) - 1 != len(values): + raise ValueError('Must provide one more bin edge than value not: ' + 'len(edges): {lb} len(values): {lv}'.format( + lb=len(edges), lv=len(values))) + + if bottoms is None: + bottoms = np.zeros_like(values) + if np.isscalar(bottoms): + bottoms = np.ones_like(values) * bottoms + + values = np.r_[values, values[-1]] + bottoms = np.r_[bottoms, bottoms[-1]] + if orientation == 'h': + return ax.fill_betweenx(edges, values, bottoms, + **kwargs) + elif orientation == 'v': + return ax.fill_between(edges, values, bottoms, + **kwargs) + else: + raise AssertionError("you should never be here") + + +def stack_hist(ax, stacked_data, sty_cycle, bottoms=None, + hist_func=None, labels=None, + plot_func=None, plot_kwargs=None): + """ + ax : axes.Axes + The axes to add artists too + + stacked_data : array or Mapping + A (N, M) shaped array. The first dimension will be iterated over to + compute histograms row-wise + + sty_cycle : Cycler or operable of dict + Style to apply to each set + + bottoms : array, optional + The initial positions of the bottoms, defaults to 0 + + hist_func : callable, optional + Must have signature `bin_vals, bin_edges = f(data)`. + `bin_edges` expected to be one longer than `bin_vals` + + labels : list of str, optional + The label for each set. + + If not given and stacked data is an array defaults to 'default set {n}' + + If stacked_data is a mapping, and labels is None, default to the keys + (which may come out in a random order). + + If stacked_data is a mapping and labels is given then only + the columns listed by be plotted. 
+ + plot_func : callable, optional + Function to call to draw the histogram must have signature: + + ret = plot_func(ax, edges, top, bottoms=bottoms, + label=label, **kwargs) + + plot_kwargs : dict, optional + Any extra kwargs to pass through to the plotting function. This + will be the same for all calls to the plotting function and will + over-ride the values in cycle. + + Returns + ------- + arts : dict + Dictionary of artists keyed on their labels + """ + # deal with default binning function + if hist_func is None: + hist_func = np.histogram + + # deal with default plotting function + if plot_func is None: + plot_func = filled_hist + + # deal with default + if plot_kwargs is None: + plot_kwargs = {} + print(plot_kwargs) + try: + l_keys = stacked_data.keys() + label_data = True + if labels is None: + labels = l_keys + + except AttributeError: + label_data = False + if labels is None: + labels = itertools.repeat(None) + + if label_data: + loop_iter = enumerate((stacked_data[lab], lab, s) + for lab, s in zip(labels, sty_cycle)) + else: + loop_iter = enumerate(zip(stacked_data, labels, sty_cycle)) + + arts = {} + for j, (data, label, sty) in loop_iter: + if label is None: + label = 'dflt set {n}'.format(n=j) + label = sty.pop('label', label) + vals, edges = hist_func(data) + if bottoms is None: + bottoms = np.zeros_like(vals) + top = bottoms + vals + print(sty) + sty.update(plot_kwargs) + print(sty) + ret = plot_func(ax, edges, top, bottoms=bottoms, + label=label, **sty) + bottoms = top + arts[label] = ret + ax.legend(fontsize=10) + return arts + + +# set up histogram function to fixed bins +edges = np.linspace(-3, 3, 20, endpoint=True) +hist_func = partial(np.histogram, bins=edges) + +# set up style cycles +color_cycle = cycler(facecolor=plt.rcParams['axes.prop_cycle'][:4]) +label_cycle = cycler(label=['set {n}'.format(n=n) for n in range(4)]) +hatch_cycle = cycler(hatch=['/', '*', '+', '|']) + +# Fixing random state for reproducibility +np.random.seed(19680801) + 
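+# (补充说明,示意)把若干等长的 cycler 相加会按位置把条目配对合并,
+# 得到的每个条目是一个组合后的字典,与下文打印出的样式一一对应:
+style_list = list(color_cycle + hatch_cycle)
+# 例如 style_list[0] 形如 {'facecolor': '#1f77b4', 'hatch': '/'}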
+stack_data = np.random.randn(4, 12250) +dict_data = OrderedDict(zip((c['label'] for c in label_cycle), stack_data)) +``` + +使用普通数组 + +```python +fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4.5), tight_layout=True) +arts = stack_hist(ax1, stack_data, color_cycle + label_cycle + hatch_cycle, + hist_func=hist_func) + +arts = stack_hist(ax2, stack_data, color_cycle, + hist_func=hist_func, + plot_kwargs=dict(edgecolor='w', orientation='h')) +ax1.set_ylabel('counts') +ax1.set_xlabel('x') +ax2.set_xlabel('counts') +ax2.set_ylabel('x') +``` + +![填充直方图](https://matplotlib.org/_images/sphx_glr_filled_step_001.png) + +输出: + +``` +{} +{'facecolor': '#1f77b4', 'hatch': '/'} +{'facecolor': '#1f77b4', 'hatch': '/'} +v +{'facecolor': '#ff7f0e', 'hatch': '*'} +{'facecolor': '#ff7f0e', 'hatch': '*'} +v +{'facecolor': '#2ca02c', 'hatch': '+'} +{'facecolor': '#2ca02c', 'hatch': '+'} +v +{'facecolor': '#d62728', 'hatch': '|'} +{'facecolor': '#d62728', 'hatch': '|'} +v +{'edgecolor': 'w', 'orientation': 'h'} +{'facecolor': '#1f77b4'} +{'facecolor': '#1f77b4', 'edgecolor': 'w', 'orientation': 'h'} +h +{'facecolor': '#ff7f0e'} +{'facecolor': '#ff7f0e', 'edgecolor': 'w', 'orientation': 'h'} +h +{'facecolor': '#2ca02c'} +{'facecolor': '#2ca02c', 'edgecolor': 'w', 'orientation': 'h'} +h +{'facecolor': '#d62728'} +{'facecolor': '#d62728', 'edgecolor': 'w', 'orientation': 'h'} +h + +``` + +使用标记数据 + +```python +fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4.5), + tight_layout=True, sharey=True) + +arts = stack_hist(ax1, dict_data, color_cycle + hatch_cycle, + hist_func=hist_func) + +arts = stack_hist(ax2, dict_data, color_cycle + hatch_cycle, + hist_func=hist_func, labels=['set 0', 'set 3']) +ax1.xaxis.set_major_locator(mticker.MaxNLocator(5)) +ax1.set_xlabel('counts') +ax1.set_ylabel('x') +ax2.set_ylabel('x') + +plt.show() +``` + +![填充直方图2](https://matplotlib.org/_images/sphx_glr_filled_step_002.png) + +输出: + +``` +{} +{'facecolor': '#1f77b4', 'hatch': '/'} +{'facecolor': '#1f77b4', 
'hatch': '/'} +v +{'facecolor': '#ff7f0e', 'hatch': '*'} +{'facecolor': '#ff7f0e', 'hatch': '*'} +v +{'facecolor': '#2ca02c', 'hatch': '+'} +{'facecolor': '#2ca02c', 'hatch': '+'} +v +{'facecolor': '#d62728', 'hatch': '|'} +{'facecolor': '#d62728', 'hatch': '|'} +v +{} +{'facecolor': '#1f77b4', 'hatch': '/'} +{'facecolor': '#1f77b4', 'hatch': '/'} +v +{'facecolor': '#ff7f0e', 'hatch': '*'} +{'facecolor': '#ff7f0e', 'hatch': '*'} +v +``` + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.axes.Axes.fill_betweenx +matplotlib.axes.Axes.fill_between +matplotlib.axis.Axis.set_major_locator +``` + +## 下载这个示例 + +- [下载python源码: filled_step.py](https://matplotlib.org/_downloads/filled_step.py) +- [下载Jupyter notebook: filled_step.ipynb](https://matplotlib.org/_downloads/filled_step.ipynb) diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/gradient_bar.md b/Python/matplotlab/gallery/lines_bars_and_markers/gradient_bar.md new file mode 100644 index 00000000..5e5fab05 --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/gradient_bar.md @@ -0,0 +1,40 @@ +# 渐变条形图 + +![渐变条形图](https://matplotlib.org/_images/sphx_glr_gradient_bar_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +np.random.seed(19680801) + +def gbar(ax, x, y, width=0.5, bottom=0): + X = [[.6, .6], [.7, .7]] + for left, top in zip(x, y): + right = left + width + ax.imshow(X, interpolation='bicubic', cmap=plt.cm.Blues, + extent=(left, right, bottom, top), alpha=1) + + +xmin, xmax = xlim = 0, 10 +ymin, ymax = ylim = 0, 1 + +fig, ax = plt.subplots() +ax.set(xlim=xlim, ylim=ylim, autoscale_on=False) + +X = [[.6, .6], [.7, .7]] +ax.imshow(X, interpolation='bicubic', cmap=plt.cm.copper, + extent=(xmin, xmax, ymin, ymax), alpha=1) + +N = 10 +x = np.arange(N) + 0.25 +y = np.random.rand(N) +gbar(ax, x, y, width=0.7) +ax.set_aspect('auto') +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: 
gradient_bar.py](https://matplotlib.org/_downloads/gradient_bar.py)
+- [下载Jupyter notebook: gradient_bar.ipynb](https://matplotlib.org/_downloads/gradient_bar.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/horizontal_bar_chart.md b/Python/matplotlab/gallery/lines_bars_and_markers/horizontal_bar_chart.md
new file mode 100644
index 00000000..f9ff3681
--- /dev/null
+++ b/Python/matplotlab/gallery/lines_bars_and_markers/horizontal_bar_chart.md
@@ -0,0 +1,38 @@
+# 水平条形图
+
+这个例子展示了一个简单的水平条形图。
+
+![水平条形图示](https://matplotlib.org/_images/sphx_glr_barh_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+
+# Fixing random state for reproducibility
+np.random.seed(19680801)
+
+
+plt.rcdefaults()
+fig, ax = plt.subplots()
+
+# Example data
+people = ('Tom', 'Dick', 'Harry', 'Slim', 'Jim')
+y_pos = np.arange(len(people))
+performance = 3 + 10 * np.random.rand(len(people))
+error = np.random.rand(len(people))
+
+ax.barh(y_pos, performance, xerr=error, align='center',
+        color='green', ecolor='black')
+ax.set_yticks(y_pos)
+ax.set_yticklabels(people)
+ax.invert_yaxis()  # labels read top-to-bottom
+ax.set_xlabel('Performance')
+ax.set_title('How fast do you want to go today?')
+
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: barh.py](https://matplotlib.org/_downloads/barh.py)
+- [下载Jupyter notebook: barh.ipynb](https://matplotlib.org/_downloads/barh.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/interp_demo.md b/Python/matplotlab/gallery/lines_bars_and_markers/interp_demo.md
new file mode 100644
index 00000000..0d685944
--- /dev/null
+++ b/Python/matplotlab/gallery/lines_bars_and_markers/interp_demo.md
@@ -0,0 +1,23 @@
+# 插补示例
+
+![插补示例图](https://matplotlib.org/_images/sphx_glr_interp_demo_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+
+x = np.linspace(0, 2 * np.pi, 20)
+y = np.sin(x)
+yp = None
+xi = np.linspace(x[0], x[-1], 100)
+yi = np.interp(xi, x, y, yp)
+
+fig, ax = plt.subplots()
+ax.plot(x, y, 'o', xi, yi, '.')
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: interp_demo.py](https://matplotlib.org/_downloads/interp_demo.py)
+- [下载Jupyter notebook: interp_demo.ipynb](https://matplotlib.org/_downloads/interp_demo.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/joinstyle.md b/Python/matplotlab/gallery/lines_bars_and_markers/joinstyle.md
new file mode 100644
index 00000000..3ca36453
--- /dev/null
+++ b/Python/matplotlab/gallery/lines_bars_and_markers/joinstyle.md
@@ -0,0 +1,47 @@
+# 连接图样式
+
+举例说明三种不同的连接样式。
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+def plot_angle(ax, x, y, angle, style):
+    phi = np.radians(angle)
+    xx = [x + .5, x, x + .5*np.cos(phi)]
+    yy = [y, y, y + .5*np.sin(phi)]
+    ax.plot(xx, yy, lw=8, color='blue', solid_joinstyle=style)
+    ax.plot(xx[1:], yy[1:], lw=1, color='black')
+    ax.plot(xx[1::-1], yy[1::-1], lw=1, color='black')
+    ax.plot(xx[1:2], yy[1:2], 'o', color='red', markersize=3)
+    ax.text(x, y + .2, '%.0f degrees' % angle)
+
+fig, ax = plt.subplots()
+ax.set_title('Join style')
+
+for x, style in enumerate(('miter', 'round', 'bevel')):
+    ax.text(x, 5, style)
+    for i in range(5):
+        plot_angle(ax, x, i, pow(2.0, 3 + i), style)
+
+ax.set_xlim(-.5, 2.75)
+ax.set_ylim(-.5, 5.5)
+plt.show()
+```
+
+![连接图样式](https://matplotlib.org/_images/sphx_glr_joinstyle_001.png)
+
+## 参考
+
+此示例中显示了以下函数,方法,类和模块的使用:
+
+```python
+import matplotlib
+matplotlib.axes.Axes.plot
+matplotlib.pyplot.plot
+```
+
+## 下载这个示例
+
+- [下载python源码: joinstyle.py](https://matplotlib.org/_downloads/joinstyle.py)
+- [下载Jupyter notebook: joinstyle.ipynb](https://matplotlib.org/_downloads/joinstyle.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/line_demo_dash_control.md b/Python/matplotlab/gallery/lines_bars_and_markers/line_demo_dash_control.md
new file mode 100644
index 00000000..7b397a0d
--- /dev/null
+++ b/Python/matplotlab/gallery/lines_bars_and_markers/line_demo_dash_control.md
@@ -0,0 +1,36 @@
+# 自定义虚线样式
+
+通过破折号序列控制线条的虚线样式。可以使用[Line2D.set_dashes](https://matplotlib.org/api/_as_gen/matplotlib.lines.Line2D.html#matplotlib.lines.Line2D.set_dashes)进行修改。
+
+破折号序列是一系列以点(pt)为单位的开/关长度,例如 [3, 1] 表示 3pt 的线段与 1pt 的间隔交替出现。
+
+像[Axes.plot](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.plot.html#matplotlib.axes.Axes.plot)这样的函数支持将Line属性作为关键字参数传递,这样就可以在创建线条时直接设置虚线样式。
+
+注意:也可以通过[property_cycle](https://matplotlib.org/tutorials/intermediate/color_cycle.html)配置虚线样式,方法是使用关键字 dashes 将破折号序列列表传递给 cycler。此示例中没有演示这种用法。
+
+![自定义虚线样式](https://matplotlib.org/_images/sphx_glr_line_demo_dash_control_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+x = np.linspace(0, 10, 500)
+y = np.sin(x)
+
+fig, ax = plt.subplots()
+
+# Using set_dashes() to modify dashing of an existing line
+line1, = ax.plot(x, y, label='Using set_dashes()')
+line1.set_dashes([2, 2, 10, 2])  # 2pt line, 2pt break, 10pt line, 2pt break
+
+# Using plot(..., dashes=...) to set the dashing when creating a line
+line2, = ax.plot(x, y - 0.2, dashes=[6, 2], label='Using the dashes parameter')
+
+ax.legend()
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: line_demo_dash_control.py](https://matplotlib.org/_downloads/line_demo_dash_control.py)
+- [下载Jupyter notebook: line_demo_dash_control.ipynb](https://matplotlib.org/_downloads/line_demo_dash_control.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/line_styles_reference.md b/Python/matplotlab/gallery/lines_bars_and_markers/line_styles_reference.md
new file mode 100644
index 00000000..767fc146
--- /dev/null
+++ b/Python/matplotlab/gallery/lines_bars_and_markers/line_styles_reference.md
@@ -0,0 +1,39 @@
+# 线型样式参考
+
+Matplotlib附带的线型参考。
+
+![线型样式参考图示](https://matplotlib.org/_images/sphx_glr_line_styles_reference_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+
+color = 'cornflowerblue'
+points = np.ones(5)  # Draw 5 points for each line
+text_style = dict(horizontalalignment='right', verticalalignment='center',
+                  fontsize=12, fontdict={'family': 'monospace'})
+
+
+def format_axes(ax):
+    ax.margins(0.2)
+    ax.set_axis_off()
+
+
+# Plot all line styles.
+fig, ax = plt.subplots()
+
+linestyles = ['-', '--', '-.', ':']
+for y, linestyle in enumerate(linestyles):
+    ax.text(-0.1, y, repr(linestyle), **text_style)
+    ax.plot(y * points, linestyle=linestyle, color=color, linewidth=3)
+format_axes(ax)
+ax.set_title('line styles')
+
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: line_styles_reference.py](https://matplotlib.org/_downloads/line_styles_reference.py)
+- [下载Jupyter notebook: line_styles_reference.ipynb](https://matplotlib.org/_downloads/line_styles_reference.ipynb)
diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/linestyles.md b/Python/matplotlab/gallery/lines_bars_and_markers/linestyles.md
new file mode 100644
index 00000000..8badb9b7
--- /dev/null
+++ b/Python/matplotlab/gallery/lines_bars_and_markers/linestyles.md
@@ -0,0 +1,58 @@
+# 线的样式
+
+这个例子展示了仿照Tikz/PGF的不同线条样式。
+
+![线的样式图示](https://matplotlib.org/_images/sphx_glr_linestyles_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+from collections import OrderedDict
+from matplotlib.transforms import blended_transform_factory
+
+linestyles = OrderedDict(
+    [('solid', (0, ())),
+     ('loosely dotted', (0, (1, 10))),
+     ('dotted', (0, (1, 5))),
+     ('densely dotted', (0, (1, 1))),
+
+     ('loosely dashed', (0, (5, 10))),
+     ('dashed', (0, (5, 5))),
+     ('densely dashed', (0, (5, 1))),
+
+     ('loosely dashdotted', (0, (3, 10, 1, 10))),
+     ('dashdotted', (0, (3, 5, 1, 5))),
+     ('densely dashdotted', (0, (3, 1, 1, 1))),
+
+     ('loosely dashdotdotted', (0, (3, 10, 1, 10, 1, 10))),
+     ('dashdotdotted', (0, (3, 5, 1, 5, 1, 5))),
+     ('densely dashdotdotted', (0, (3, 1, 1, 1, 1, 1)))])
+
+
+plt.figure(figsize=(10, 6))
+ax = plt.subplot(1, 1, 1)
+
+X, Y = np.linspace(0, 100, 10), np.zeros(10)
+for i, (name, linestyle) in enumerate(linestyles.items()):
+    ax.plot(X, Y+i, linestyle=linestyle, linewidth=1.5, color='black')
+
+ax.set_ylim(-0.5, len(linestyles)-0.5)
+plt.yticks(np.arange(len(linestyles)), linestyles.keys())
+plt.xticks([])
+
+# For each line style, add a text annotation with a small offset from
+# the reference point (0 in Axes coords, y tick value in Data coords).
+reference_transform = blended_transform_factory(ax.transAxes, ax.transData)
+for i, (name, linestyle) in enumerate(linestyles.items()):
+    ax.annotate(str(linestyle), xy=(0.0, i), xycoords=reference_transform,
+                xytext=(-6, -12), textcoords='offset points', color="blue",
+                fontsize=8, ha="right", family="monospace")
+
+plt.tight_layout()
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: linestyles.py](https://matplotlib.org/_downloads/linestyles.py)
+- [下载Jupyter notebook: linestyles.ipynb](https://matplotlib.org/_downloads/linestyles.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/marker_fillstyle_reference.md b/Python/matplotlab/gallery/lines_bars_and_markers/marker_fillstyle_reference.md
new file mode 100644
index 00000000..46b989b7
--- /dev/null
+++ b/Python/matplotlab/gallery/lines_bars_and_markers/marker_fillstyle_reference.md
@@ -0,0 +1,42 @@
+# 标记填充样式
+
+Matplotlib中包含的标记填充样式的参考。
+
+另请参阅 标记填充样式 和[标记路径示例](https://matplotlib.org/gallery/shapes_and_collections/marker_path.html)。
+
+![标记填充样式图示](https://matplotlib.org/_images/sphx_glr_marker_fillstyle_reference_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+from matplotlib.lines import Line2D
+
+
+points = np.ones(5)  # Draw 5 points for each line
+text_style = dict(horizontalalignment='right', verticalalignment='center',
+                  fontsize=12, fontdict={'family': 'monospace'})
+marker_style = dict(color='cornflowerblue', linestyle=':', marker='o',
+                    markersize=15, markerfacecoloralt='gray')
+
+
+def format_axes(ax):
+    ax.margins(0.2)
+    ax.set_axis_off()
+
+
+fig, ax = plt.subplots()
+
+# Plot all fill styles.
+for y, fill_style in enumerate(Line2D.fillStyles):
+    ax.text(-0.5, y, repr(fill_style), **text_style)
+    ax.plot(y * points, fillstyle=fill_style, **marker_style)
+format_axes(ax)
+ax.set_title('fill style')
+
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: marker_fillstyle_reference.py](https://matplotlib.org/_downloads/marker_fillstyle_reference.py)
+- [下载Jupyter notebook: marker_fillstyle_reference.ipynb](https://matplotlib.org/_downloads/marker_fillstyle_reference.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/marker_reference.md b/Python/matplotlab/gallery/lines_bars_and_markers/marker_reference.md
new file mode 100644
index 00000000..1a405477
--- /dev/null
+++ b/Python/matplotlab/gallery/lines_bars_and_markers/marker_reference.md
@@ -0,0 +1,105 @@
+# 标记参考
+
+Matplotlib中已填充、未填充和自定义标记类型的参考。
+
+有关所有标记的列表,请参阅 [matplotlib.markers](https://matplotlib.org/api/markers_api.html#module-matplotlib.markers) 文档。另请参阅[标记填充样式](/gallery/lines_bars_and_markers/marker_fillstyle_reference.html)和[标记路径示例](https://matplotlib.org/gallery/shapes_and_collections/marker_path.html)。
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+from matplotlib.lines import Line2D
+
+
+points = np.ones(3)  # Draw 3 points for each line
+text_style = dict(horizontalalignment='right', verticalalignment='center',
+                  fontsize=12, fontdict={'family': 'monospace'})
+marker_style = dict(linestyle=':', color='0.8', markersize=10,
+                    mfc="C0", mec="C0")
+
+
+def format_axes(ax):
+    ax.margins(0.2)
+    ax.set_axis_off()
+    ax.invert_yaxis()
+
+
+def nice_repr(text):
+    return repr(text).lstrip('u')
+
+
+def math_repr(text):
+    tx = repr(text).lstrip('u').strip("'").strip("$")
+    return r"'\${}\$'".format(tx)
+
+
+def split_list(a_list):
+    i_half = len(a_list) // 2
+    return (a_list[:i_half], a_list[i_half:])
+```
+
+## 填充和未填充标记类型
+
+绘制所有未填充的标记
+
+```python
+fig, axes = plt.subplots(ncols=2)
+fig.suptitle('un-filled markers', fontsize=14)
+
+# Filter out filled markers and marker settings that do nothing.
+unfilled_markers = [m for m, func in Line2D.markers.items()
+                    if func != 'nothing' and m not in Line2D.filled_markers]
+
+for ax, markers in zip(axes, split_list(unfilled_markers)):
+    for y, marker in enumerate(markers):
+        ax.text(-0.5, y, nice_repr(marker), **text_style)
+        ax.plot(y * points, marker=marker, **marker_style)
+    format_axes(ax)
+
+plt.show()
+```
+
+![未填充标记图示](https://matplotlib.org/_images/sphx_glr_marker_reference_001.png)
+
+绘制所有已填充的标记。
+
+```python
+fig, axes = plt.subplots(ncols=2)
+for ax, markers in zip(axes, split_list(Line2D.filled_markers)):
+    for y, marker in enumerate(markers):
+        ax.text(-0.5, y, nice_repr(marker), **text_style)
+        ax.plot(y * points, marker=marker, **marker_style)
+    format_axes(ax)
+fig.suptitle('filled markers', fontsize=14)
+
+plt.show()
+```
+
+![已填充标记图示](https://matplotlib.org/_images/sphx_glr_marker_reference_002.png)
+
+## 带有MathText的自定义标记
+
+使用[MathText](https://matplotlib.org/tutorials/text/mathtext.html),可以使用自定义标记符号,例如 "$\u266B$"。有关STIX字体符号的概述,请参阅[STIX字体表](http://www.stixfonts.org/allGlyphs.html)。另请参阅[STIX字体演示](https://matplotlib.org/gallery/text_labels_and_annotations/stix_fonts_demo.html)。
+
+```python
+fig, ax = plt.subplots()
+fig.subplots_adjust(left=0.4)
+
+marker_style.update(mec="None", markersize=15)
+markers = ["$1$", r"$\frac{1}{2}$", "$f$", "$\u266B$",
+           r"$\mathcircled{m}$"]
+
+
+for y, marker in enumerate(markers):
+    ax.text(-0.5, y, math_repr(marker), **text_style)
+    ax.plot(y * points, marker=marker, **marker_style)
+format_axes(ax)
+
+plt.show()
+```
+
+![带有MathText的自定义标记图示](https://matplotlib.org/_images/sphx_glr_marker_reference_003.png)
+
+## 下载这个示例
+
+- [下载python源码: marker_reference.py](https://matplotlib.org/_downloads/marker_reference.py)
- [下载Jupyter notebook: marker_reference.ipynb](https://matplotlib.org/_downloads/marker_reference.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/markevery_demo.md b/Python/matplotlab/gallery/lines_bars_and_markers/markevery_demo.md
new file mode 100644
index 00000000..df79fa8a
--- /dev/null
+++ b/Python/matplotlab/gallery/lines_bars_and_markers/markevery_demo.md
@@ -0,0 +1,112 @@
+# Markevery示例
+
+此示例演示了使用Line2D对象的``markevery``属性在数据点子集上显示标记的各种选项。
+
+整数参数非常直观。例如 markevery=5 会从第一个数据点开始,每隔5个数据点绘制一个标记。
+
+浮点参数允许标记沿着线以大致相等的距离间隔开。标记之间沿线的理论距离,由坐标区边界框对角线在显示坐标系中的长度乘以 markevery 值确定,并显示最接近该理论距离的数据点。
+
+切片或列表/数组也可以与 markevery 一起使用以指定要显示的标记。
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+import matplotlib.gridspec as gridspec
+
+# define a list of markevery cases to plot
+cases = [None,
+         8,
+         (30, 8),
+         [16, 24, 30], [0, -1],
+         slice(100, 200, 3),
+         0.1, 0.3, 1.5,
+         (0.0, 0.1), (0.45, 0.1)]
+
+# define the figure size and grid layout properties
+figsize = (10, 8)
+cols = 3
+gs = gridspec.GridSpec(len(cases) // cols + 1, cols)
+gs.update(hspace=0.4)
+# define the data for cartesian plots
+delta = 0.11
+x = np.linspace(0, 10 - 2 * delta, 200) + delta
+y = np.sin(x) + 1.0 + delta
+```
+
+以线性 x、y 标度绘制每个markevery案例
+
+```python
+fig1 = plt.figure(num=1, figsize=figsize)
+ax = []
+for i, case in enumerate(cases):
+    row = (i // cols)
+    col = i % cols
+    ax.append(fig1.add_subplot(gs[row, col]))
+    ax[-1].set_title('markevery=%s' % str(case))
+    ax[-1].plot(x, y, 'o', ls='-', ms=4, markevery=case)
+```
+
+![Markevery示例图示](https://matplotlib.org/_images/sphx_glr_markevery_demo_001.png)
+
+以对数 x、y 标度绘制每个markevery案例
+
+```python
+fig2 = plt.figure(num=2, figsize=figsize)
+axlog = []
+for i, case in enumerate(cases):
+    row = (i // cols)
+    col = i % cols
+    axlog.append(fig2.add_subplot(gs[row, col]))
+    axlog[-1].set_title('markevery=%s' % str(case))
+    axlog[-1].set_xscale('log')
+    axlog[-1].set_yscale('log')
+    axlog[-1].plot(x, y, 'o', ls='-', ms=4, markevery=case)
+fig2.tight_layout()
+```
+
+![Markevery示例图示2](https://matplotlib.org/_images/sphx_glr_markevery_demo_003.png)
+
+以线性 x、y 标度绘制每个markevery案例,并放大到局部区域。请注意放大后的行为:指定起始标记偏移时,它始终相对于第一个数据点解释,而第一个数据点可能与第一个可见数据点不同。
+
+```python
+fig3 = plt.figure(num=3, figsize=figsize)
+axzoom = []
+for i, case in enumerate(cases):
+    row = (i // cols)
+    col = i % cols
+    axzoom.append(fig3.add_subplot(gs[row, col]))
+    axzoom[-1].set_title('markevery=%s' % str(case))
+    axzoom[-1].plot(x, y, 'o', ls='-', ms=4, markevery=case)
+    axzoom[-1].set_xlim((6, 6.7))
+    axzoom[-1].set_ylim((1.1, 1.7))
+fig3.tight_layout()
+
+# define data for polar plots
+r = np.linspace(0, 3.0, 200)
+theta = 2 * np.pi * r
+```
+
+![Markevery示例图示3](https://matplotlib.org/_images/sphx_glr_markevery_demo_005.png)
+
+绘制每个markevery案例的极坐标图。
+
+```python
+fig4 = plt.figure(num=4, figsize=figsize)
+axpolar = []
+for i, case in enumerate(cases):
+    row = (i // cols)
+    col = i % cols
+    axpolar.append(fig4.add_subplot(gs[row, col], projection='polar'))
+    axpolar[-1].set_title('markevery=%s' % str(case))
+    axpolar[-1].plot(theta, r, 'o', ls='-', ms=4, markevery=case)
+fig4.tight_layout()
+
+plt.show()
+```
+
+![Markevery示例图示4](https://matplotlib.org/_images/sphx_glr_markevery_demo_007.png)
+
+## 下载这个示例
+
+- [下载python源码: markevery_demo.py](https://matplotlib.org/_downloads/markevery_demo.py)
+- [下载Jupyter notebook: markevery_demo.ipynb](https://matplotlib.org/_downloads/markevery_demo.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/markevery_prop_cycle.md b/Python/matplotlab/gallery/lines_bars_and_markers/markevery_prop_cycle.md
new file mode 100644
index 00000000..b87a59ba
--- /dev/null
+++ b/Python/matplotlab/gallery/lines_bars_and_markers/markevery_prop_cycle.md
@@ -0,0 +1,64 @@
+# 在rcParams中实现了对prop_cycle属性markevery的支持
+
+此示例演示了针对问题 #8576 的一个可行解决方案:通过rcParams为axes.prop_cycle提供对markevery属性的完全支持。这里使用了与markevery演示相同的markevery案例列表。
+
+绘图的每一列为一条移位的正弦曲线,每条曲线都使用一个唯一的markevery值。
+
+![markevery支持1](https://matplotlib.org/_images/sphx_glr_markevery_prop_cycle_001.png) + +```python +from cycler import cycler +import numpy as np +import matplotlib as mpl +import matplotlib.pyplot as plt + +# Define a list of markevery cases and color cases to plot +cases = [None, + 8, + (30, 8), + [16, 24, 30], + [0, -1], + slice(100, 200, 3), + 0.1, + 0.3, + 1.5, + (0.0, 0.1), + (0.45, 0.1)] + +colors = ['#1f77b4', + '#ff7f0e', + '#2ca02c', + '#d62728', + '#9467bd', + '#8c564b', + '#e377c2', + '#7f7f7f', + '#bcbd22', + '#17becf', + '#1a55FF'] + +# Configure rcParams axes.prop_cycle to simultaneously cycle cases and colors. +mpl.rcParams['axes.prop_cycle'] = cycler(markevery=cases, color=colors) + +# Create data points and offsets +x = np.linspace(0, 2 * np.pi) +offsets = np.linspace(0, 2 * np.pi, 11, endpoint=False) +yy = np.transpose([np.sin(x + phi) for phi in offsets]) + +# Set the plot curve with markers and a title +fig = plt.figure() +ax = fig.add_axes([0.1, 0.1, 0.6, 0.75]) + +for i in range(len(cases)): + ax.plot(yy[:, i], marker='o', label=str(cases[i])) + ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.) 
+
+plt.title('Support for axes.prop_cycle cycler with markevery')
+
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: markevery_prop_cycle.py](https://matplotlib.org/_downloads/markevery_prop_cycle.py)
+- [下载Jupyter notebook: markevery_prop_cycle.ipynb](https://matplotlib.org/_downloads/markevery_prop_cycle.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/masked_demo.md b/Python/matplotlab/gallery/lines_bars_and_markers/masked_demo.md
new file mode 100644
index 00000000..a353c232
--- /dev/null
+++ b/Python/matplotlab/gallery/lines_bars_and_markers/masked_demo.md
@@ -0,0 +1,34 @@
+# 遮盖示例
+
+绘制部分数据点被掩盖的线条。
+
+这通常用于有间断的数据,以便在数据缺口处断开线条。
+
+![遮盖示例1](https://matplotlib.org/_images/sphx_glr_masked_demo_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+
+x = np.arange(0, 2*np.pi, 0.02)
+y = np.sin(x)
+y1 = np.sin(2*x)
+y2 = np.sin(3*x)
+ym1 = np.ma.masked_where(y1 > 0.5, y1)
+ym2 = np.ma.masked_where(y2 < -0.5, y2)
+
+lines = plt.plot(x, y, x, ym1, x, ym2, 'o')
+plt.setp(lines[0], linewidth=4)
+plt.setp(lines[1], linewidth=2)
+plt.setp(lines[2], markersize=10)
+
+plt.legend(('No mask', 'Masked if > 0.5', 'Masked if < -0.5'),
+           loc='upper right')
+plt.title('Masked line demo')
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: masked_demo.py](https://matplotlib.org/_downloads/masked_demo.py)
+- [下载Jupyter notebook: masked_demo.ipynb](https://matplotlib.org/_downloads/masked_demo.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/multicolored_line.md b/Python/matplotlab/gallery/lines_bars_and_markers/multicolored_line.md
new file mode 100644
index 00000000..c70db0bf
--- /dev/null
+++ b/Python/matplotlab/gallery/lines_bars_and_markers/multicolored_line.md
@@ -0,0 +1,52 @@
+# 五彩线条
+
+此示例显示如何制作多色线。在此示例中,线条根据其导数着色。
+
+![五彩线条示例1](https://matplotlib.org/_images/sphx_glr_multicolored_line_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+from matplotlib.collections import LineCollection
+from matplotlib.colors import ListedColormap, BoundaryNorm
+
+x = np.linspace(0, 3 * np.pi, 500)
+y = np.sin(x)
+dydx = np.cos(0.5 * (x[:-1] + x[1:]))  # first derivative
+
+# Create a set of line segments so that we can color them individually
+# This creates the points as a N x 1 x 2 array so that we can stack points
+# together easily to get the segments. The segments array for line collection
+# needs to be (numlines) x (points per line) x 2 (for x and y)
+points = np.array([x, y]).T.reshape(-1, 1, 2)
+segments = np.concatenate([points[:-1], points[1:]], axis=1)
+
+fig, axs = plt.subplots(2, 1, sharex=True, sharey=True)
+
+# Create a continuous norm to map from data points to colors
+norm = plt.Normalize(dydx.min(), dydx.max())
+lc = LineCollection(segments, cmap='viridis', norm=norm)
+# Set the values used for colormapping
+lc.set_array(dydx)
+lc.set_linewidth(2)
+line = axs[0].add_collection(lc)
+fig.colorbar(line, ax=axs[0])
+
+# Use a boundary norm instead
+cmap = ListedColormap(['r', 'g', 'b'])
+norm = BoundaryNorm([-1, -0.5, 0.5, 1], cmap.N)
+lc = LineCollection(segments, cmap=cmap, norm=norm)
+lc.set_array(dydx)
+lc.set_linewidth(2)
+line = axs[1].add_collection(lc)
+fig.colorbar(line, ax=axs[1])
+
+axs[0].set_xlim(x.min(), x.max())
+axs[0].set_ylim(-1.1, 1.1)
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: multicolored_line.py](https://matplotlib.org/_downloads/multicolored_line.py)
+- [下载Jupyter notebook: multicolored_line.ipynb](https://matplotlib.org/_downloads/multicolored_line.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/nan_test.md b/Python/matplotlab/gallery/lines_bars_and_markers/nan_test.md
new file mode 100644
index 00000000..ec96abc0
--- /dev/null
+++ b/Python/matplotlab/gallery/lines_bars_and_markers/nan_test.md
@@ -0,0 +1,40 @@
+# Nan测试
+
+示例:插入Nan的简单线条图。
+
+![Nan的简单线条图](https://matplotlib.org/_images/sphx_glr_nan_test_001.png)
+
+```python
+import numpy as np +import matplotlib.pyplot as plt + +t = np.arange(0.0, 1.0 + 0.01, 0.01) +s = np.cos(2 * 2*np.pi * t) +t[41:60] = np.nan + +plt.subplot(2, 1, 1) +plt.plot(t, s, '-', lw=2) + +plt.xlabel('time (s)') +plt.ylabel('voltage (mV)') +plt.title('A sine wave with a gap of NaNs between 0.4 and 0.6') +plt.grid(True) + +plt.subplot(2, 1, 2) +t[0] = np.nan +t[-1] = np.nan +plt.plot(t, s, '-', lw=2) +plt.title('Also with NaN in first and last point') + +plt.xlabel('time (s)') +plt.ylabel('more nans') +plt.grid(True) + +plt.tight_layout() +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: nan_test.py](https://matplotlib.org/_downloads/nan_test.py) +- [下载Jupyter notebook: nan_test.ipynb](https://matplotlib.org/_downloads/nan_test.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/psd_demo.md b/Python/matplotlab/gallery/lines_bars_and_markers/psd_demo.md new file mode 100644 index 00000000..74db3458 --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/psd_demo.md @@ -0,0 +1,183 @@ +# 功率谱密度图示例 + +在Matplotlib中绘制功率谱密度(PSD)。 + +PSD是信号处理领域中常见的图形。NumPy有许多用于计算PSD的有用库。下面,我们演示一些如何使用Matplotlib实现和可视化这一点的示例。 + +```python +import matplotlib.pyplot as plt +import numpy as np +import matplotlib.mlab as mlab +import matplotlib.gridspec as gridspec + +# Fixing random state for reproducibility +np.random.seed(19680801) + +dt = 0.01 +t = np.arange(0, 10, dt) +nse = np.random.randn(len(t)) +r = np.exp(-t / 0.05) + +cnse = np.convolve(nse, r) * dt +cnse = cnse[:len(t)] +s = 0.1 * np.sin(2 * np.pi * t) + cnse + +plt.subplot(211) +plt.plot(t, s) +plt.subplot(212) +plt.psd(s, 512, 1 / dt) + +plt.show() +``` + +![功率谱密度图示例](https://matplotlib.org/_images/sphx_glr_psd_demo_001.png) + +将其与等效的Matlab代码进行比较,以完成相同的任务: + +```python +dt = 0.01; +t = [0:dt:10]; +nse = randn(size(t)); +r = exp(-t/0.05); +cnse = conv(nse, r)*dt; +cnse = cnse(1:length(t)); +s = 0.1*sin(2*pi*t) + cnse; + +subplot(211) +plot(t,s) +subplot(212) +psd(s, 512, 
1/dt) +``` + +下面,我们将展示一个稍微复杂一些的示例,演示填充如何影响产生的PSD。 + +```python +dt = np.pi / 100. +fs = 1. / dt +t = np.arange(0, 8, dt) +y = 10. * np.sin(2 * np.pi * 4 * t) + 5. * np.sin(2 * np.pi * 4.25 * t) +y = y + np.random.randn(*t.shape) + +# Plot the raw time series +fig = plt.figure(constrained_layout=True) +gs = gridspec.GridSpec(2, 3, figure=fig) +ax = fig.add_subplot(gs[0, :]) +ax.plot(t, y) +ax.set_xlabel('time [s]') +ax.set_ylabel('signal') + +# Plot the PSD with different amounts of zero padding. This uses the entire +# time series at once +ax2 = fig.add_subplot(gs[1, 0]) +ax2.psd(y, NFFT=len(t), pad_to=len(t), Fs=fs) +ax2.psd(y, NFFT=len(t), pad_to=len(t) * 2, Fs=fs) +ax2.psd(y, NFFT=len(t), pad_to=len(t) * 4, Fs=fs) +plt.title('zero padding') + +# Plot the PSD with different block sizes, Zero pad to the length of the +# original data sequence. +ax3 = fig.add_subplot(gs[1, 1], sharex=ax2, sharey=ax2) +ax3.psd(y, NFFT=len(t), pad_to=len(t), Fs=fs) +ax3.psd(y, NFFT=len(t) // 2, pad_to=len(t), Fs=fs) +ax3.psd(y, NFFT=len(t) // 4, pad_to=len(t), Fs=fs) +ax3.set_ylabel('') +plt.title('block size') + +# Plot the PSD with different amounts of overlap between blocks +ax4 = fig.add_subplot(gs[1, 2], sharex=ax2, sharey=ax2) +ax4.psd(y, NFFT=len(t) // 2, pad_to=len(t), noverlap=0, Fs=fs) +ax4.psd(y, NFFT=len(t) // 2, pad_to=len(t), + noverlap=int(0.05 * len(t) / 2.), Fs=fs) +ax4.psd(y, NFFT=len(t) // 2, pad_to=len(t), + noverlap=int(0.2 * len(t) / 2.), Fs=fs) +ax4.set_ylabel('') +plt.title('overlap') + +plt.show() +``` + +![功率谱密度图示例2](https://matplotlib.org/_images/sphx_glr_psd_demo_002.png) + +这是一个来自信号处理工具箱的MATLAB示例的移植版本,它显示了Matplotlib和MATLAB对PSD的缩放之间的一些差异。 + +```python +fs = 1000 +t = np.linspace(0, 0.3, 301) +A = np.array([2, 8]).reshape(-1, 1) +f = np.array([150, 140]).reshape(-1, 1) +xn = (A * np.sin(2 * np.pi * f * t)).sum(axis=0) +xn += 5 * np.random.randn(*t.shape) + +fig, (ax0, ax1) = plt.subplots(ncols=2, constrained_layout=True) + +yticks = np.arange(-50, 30, 10) 
+yrange = (yticks[0], yticks[-1]) +xticks = np.arange(0, 550, 100) + +ax0.psd(xn, NFFT=301, Fs=fs, window=mlab.window_none, pad_to=1024, + scale_by_freq=True) +ax0.set_title('Periodogram') +ax0.set_yticks(yticks) +ax0.set_xticks(xticks) +ax0.grid(True) +ax0.set_ylim(yrange) + +ax1.psd(xn, NFFT=150, Fs=fs, window=mlab.window_none, pad_to=512, noverlap=75, + scale_by_freq=True) +ax1.set_title('Welch') +ax1.set_xticks(xticks) +ax1.set_yticks(yticks) +ax1.set_ylabel('') # overwrite the y-label added by `psd` +ax1.grid(True) +ax1.set_ylim(yrange) + +plt.show() +``` + +![功率谱密度图示例3](https://matplotlib.org/_images/sphx_glr_psd_demo_003.png) + +这是一个来自信号处理工具箱的MATLAB示例的移植版本,它显示了Matplotlib和MATLAB对PSD的缩放之间的一些差异。 + +它使用了一个复杂的信号,所以我们可以看到,复杂的PSD的工作正常。 + +```python +prng = np.random.RandomState(19680801) # to ensure reproducibility + +fs = 1000 +t = np.linspace(0, 0.3, 301) +A = np.array([2, 8]).reshape(-1, 1) +f = np.array([150, 140]).reshape(-1, 1) +xn = (A * np.exp(2j * np.pi * f * t)).sum(axis=0) + 5 * prng.randn(*t.shape) + +fig, (ax0, ax1) = plt.subplots(ncols=2, constrained_layout=True) + +yticks = np.arange(-50, 30, 10) +yrange = (yticks[0], yticks[-1]) +xticks = np.arange(-500, 550, 200) + +ax0.psd(xn, NFFT=301, Fs=fs, window=mlab.window_none, pad_to=1024, + scale_by_freq=True) +ax0.set_title('Periodogram') +ax0.set_yticks(yticks) +ax0.set_xticks(xticks) +ax0.grid(True) +ax0.set_ylim(yrange) + +ax1.psd(xn, NFFT=150, Fs=fs, window=mlab.window_none, pad_to=512, noverlap=75, + scale_by_freq=True) +ax1.set_title('Welch') +ax1.set_xticks(xticks) +ax1.set_yticks(yticks) +ax1.set_ylabel('') # overwrite the y-label added by `psd` +ax1.grid(True) +ax1.set_ylim(yrange) + +plt.show() +``` + +![功率谱密度图示例4](https://matplotlib.org/_images/sphx_glr_psd_demo_004.png) + +## 下载这个示例 + +- [下载python源码: psd_demo.py](https://matplotlib.org/_downloads/psd_demo.py) +- [下载Jupyter notebook: psd_demo.ipynb](https://matplotlib.org/_downloads/psd_demo.ipynb) \ No newline at end of file diff --git 
a/Python/matplotlab/gallery/lines_bars_and_markers/scatter_custom_symbol.md b/Python/matplotlab/gallery/lines_bars_and_markers/scatter_custom_symbol.md new file mode 100644 index 00000000..91e5eb89 --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/scatter_custom_symbol.md @@ -0,0 +1,29 @@ +# 散点图自定义符号 + +在散点图中创建自定义椭圆符号。 + +![散点图自定义符号示例](https://matplotlib.org/_images/sphx_glr_scatter_custom_symbol_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +# unit area ellipse +rx, ry = 3., 1. +area = rx * ry * np.pi +theta = np.arange(0, 2 * np.pi + 0.01, 0.1) +verts = np.column_stack([rx / area * np.cos(theta), ry / area * np.sin(theta)]) + +x, y, s, c = np.random.rand(4, 30) +s *= 10**2. + +fig, ax = plt.subplots() +ax.scatter(x, y, s, c, marker=verts) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: scatter_custom_symbol.py](https://matplotlib.org/_downloads/scatter_custom_symbol.py) +- [下载Jupyter notebook: scatter_custom_symbol.ipynb](https://matplotlib.org/_downloads/scatter_custom_symbol.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/scatter_demo2.md b/Python/matplotlab/gallery/lines_bars_and_markers/scatter_demo2.md new file mode 100644 index 00000000..78a9e457 --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/scatter_demo2.md @@ -0,0 +1,40 @@ +# 散点图自定义样式 + +演示散点图与不同的标记颜色和大小。 + +![散点图自定义样式图示](https://matplotlib.org/_images/sphx_glr_scatter_demo2_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.cbook as cbook + +# Load a numpy record array from yahoo csv data with fields date, open, close, +# volume, adj_close from the mpl-data/example directory. The record array +# stores the date as an np.datetime64 with a day unit ('D') in the date column. 
+with cbook.get_sample_data('goog.npz') as datafile: + price_data = np.load(datafile)['price_data'].view(np.recarray) +price_data = price_data[-250:] # get the most recent 250 trading days + +delta1 = np.diff(price_data.adj_close) / price_data.adj_close[:-1] + +# Marker size in units of points^2 +volume = (15 * price_data.volume[:-2] / price_data.volume[0])**2 +close = 0.003 * price_data.close[:-2] / 0.003 * price_data.open[:-2] + +fig, ax = plt.subplots() +ax.scatter(delta1[:-1], delta1[1:], c=close, s=volume, alpha=0.5) + +ax.set_xlabel(r'$\Delta_i$', fontsize=15) +ax.set_ylabel(r'$\Delta_{i+1}$', fontsize=15) +ax.set_title('Volume and percent change') + +ax.grid(True) +fig.tight_layout() + +plt.show() +``` +## 下载这个示例 + +- [下载python源码: scatter_demo2.py](https://matplotlib.org/_downloads/scatter_demo2.py) +- [下载Jupyter notebook: scatter_demo2.ipynb](https://matplotlib.org/_downloads/scatter_demo2.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/scatter_hist.md b/Python/matplotlab/gallery/lines_bars_and_markers/scatter_hist.md new file mode 100644 index 00000000..a20f09c5 --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/scatter_hist.md @@ -0,0 +1,66 @@ +# 散点图视图拆解 + +从散点图创建直方图,并将其添加到散点图的两侧。 + +![散点图视图拆解示例](https://matplotlib.org/_images/sphx_glr_scatter_hist_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.ticker import NullFormatter + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +# the random data +x = np.random.randn(1000) +y = np.random.randn(1000) + +nullfmt = NullFormatter() # no labels + +# definitions for the axes +left, width = 0.1, 0.65 +bottom, height = 0.1, 0.65 +bottom_h = left_h = left + width + 0.02 + +rect_scatter = [left, bottom, width, height] +rect_histx = [left, bottom_h, width, 0.2] +rect_histy = [left_h, bottom, 0.2, height] + +# start with a rectangular Figure +plt.figure(1, figsize=(8, 8)) + +axScatter = 
plt.axes(rect_scatter) +axHistx = plt.axes(rect_histx) +axHisty = plt.axes(rect_histy) + +# no labels +axHistx.xaxis.set_major_formatter(nullfmt) +axHisty.yaxis.set_major_formatter(nullfmt) + +# the scatter plot: +axScatter.scatter(x, y) + +# now determine nice limits by hand: +binwidth = 0.25 +xymax = max(np.max(np.abs(x)), np.max(np.abs(y))) +lim = (int(xymax/binwidth) + 1) * binwidth + +axScatter.set_xlim((-lim, lim)) +axScatter.set_ylim((-lim, lim)) + +bins = np.arange(-lim, lim + binwidth, binwidth) +axHistx.hist(x, bins=bins) +axHisty.hist(y, bins=bins, orientation='horizontal') + +axHistx.set_xlim(axScatter.get_xlim()) +axHisty.set_ylim(axScatter.get_ylim()) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: scatter_hist.py](https://matplotlib.org/_downloads/scatter_hist.py) +- [下载Jupyter notebook: scatter_hist.ipynb](https://matplotlib.org/_downloads/scatter_hist.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/scatter_masked.md b/Python/matplotlab/gallery/lines_bars_and_markers/scatter_masked.md new file mode 100644 index 00000000..c6deb197 --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/scatter_masked.md @@ -0,0 +1,36 @@ +# 散点图遮盖 + +屏蔽一些数据点,并添加一条线去标记掩码区域。 + +![散点图遮盖示例](https://matplotlib.org/_images/sphx_glr_scatter_masked_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +N = 100 +r0 = 0.6 +x = 0.9 * np.random.rand(N) +y = 0.9 * np.random.rand(N) +area = (20 * np.random.rand(N))**2 # 0 to 10 point radii +c = np.sqrt(area) +r = np.sqrt(x * x + y * y) +area1 = np.ma.masked_where(r < r0, area) +area2 = np.ma.masked_where(r >= r0, area) +plt.scatter(x, y, s=area1, marker='^', c=c) +plt.scatter(x, y, s=area2, marker='o', c=c) +# Show the boundary between the regions: +theta = np.arange(0, np.pi / 2, 0.01) +plt.plot(r0 * np.cos(theta), r0 * np.sin(theta)) + +plt.show() +``` + +## 下载这个示例 + +- 
[下载python源码: scatter_masked.py](https://matplotlib.org/_downloads/scatter_masked.py) +- [下载Jupyter notebook: scatter_masked.ipynb](https://matplotlib.org/_downloads/scatter_masked.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/scatter_piecharts.md b/Python/matplotlab/gallery/lines_bars_and_markers/scatter_piecharts.md new file mode 100644 index 00000000..8fe1652a --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/scatter_piecharts.md @@ -0,0 +1,63 @@ +# 带有饼图标记的散点图 + +此示例将自定义 '饼图' 作为散点图的标记。 + +感谢 Manuel Metz 的例子 + +```python +import numpy as np +import matplotlib.pyplot as plt + +# first define the ratios +r1 = 0.2 # 20% +r2 = r1 + 0.4 # 40% + +# define some sizes of the scatter marker +sizes = np.array([60, 80, 120]) + +# calculate the points of the first pie marker +# +# these are just the origin (0,0) + +# some points on a circle cos,sin +x = [0] + np.cos(np.linspace(0, 2 * np.pi * r1, 10)).tolist() +y = [0] + np.sin(np.linspace(0, 2 * np.pi * r1, 10)).tolist() +xy1 = np.column_stack([x, y]) +s1 = np.abs(xy1).max() + +x = [0] + np.cos(np.linspace(2 * np.pi * r1, 2 * np.pi * r2, 10)).tolist() +y = [0] + np.sin(np.linspace(2 * np.pi * r1, 2 * np.pi * r2, 10)).tolist() +xy2 = np.column_stack([x, y]) +s2 = np.abs(xy2).max() + +x = [0] + np.cos(np.linspace(2 * np.pi * r2, 2 * np.pi, 10)).tolist() +y = [0] + np.sin(np.linspace(2 * np.pi * r2, 2 * np.pi, 10)).tolist() +xy3 = np.column_stack([x, y]) +s3 = np.abs(xy3).max() + +fig, ax = plt.subplots() +ax.scatter(range(3), range(3), marker=xy1, + s=s1 ** 2 * sizes, facecolor='blue') +ax.scatter(range(3), range(3), marker=xy2, + s=s2 ** 2 * sizes, facecolor='green') +ax.scatter(range(3), range(3), marker=xy3, + s=s3 ** 2 * sizes, facecolor='red') + +plt.show() +``` + +![带有饼图标记的散点图示例](https://matplotlib.org/_images/sphx_glr_scatter_piecharts_001.png) + +## 参考 + +本例中显示了下列函数、方法、类和模块的使用: + +```python +import matplotlib +matplotlib.axes.Axes.scatter 
+matplotlib.pyplot.scatter +``` + +## 下载这个示例 + +- [下载python源码: scatter_piecharts.py](https://matplotlib.org/_downloads/scatter_piecharts.py) +- [下载Jupyter notebook: scatter_piecharts.ipynb](https://matplotlib.org/_downloads/scatter_piecharts.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/scatter_star_poly.md b/Python/matplotlab/gallery/lines_bars_and_markers/scatter_star_poly.md new file mode 100644 index 00000000..c3059633 --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/scatter_star_poly.md @@ -0,0 +1,44 @@ +# 星标记散点图 + +创建多个具有不同星号符号的散点图。 + +![星标记散点图示例](https://matplotlib.org/_images/sphx_glr_scatter_star_poly_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +x = np.random.rand(10) +y = np.random.rand(10) +z = np.sqrt(x**2 + y**2) + +plt.subplot(321) +plt.scatter(x, y, s=80, c=z, marker=">") + +plt.subplot(322) +plt.scatter(x, y, s=80, c=z, marker=(5, 0)) + +verts = np.array([[-1, -1], [1, -1], [1, 1], [-1, -1]]) +plt.subplot(323) +plt.scatter(x, y, s=80, c=z, marker=verts) + +plt.subplot(324) +plt.scatter(x, y, s=80, c=z, marker=(5, 1)) + +plt.subplot(325) +plt.scatter(x, y, s=80, c=z, marker='+') + +plt.subplot(326) +plt.scatter(x, y, s=80, c=z, marker=(5, 2)) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: scatter_star_poly.py](https://matplotlib.org/_downloads/scatter_star_poly.py) +- [下载Jupyter notebook: scatter_star_poly.ipynb](https://matplotlib.org/_downloads/scatter_star_poly.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/scatter_symbol.md b/Python/matplotlab/gallery/lines_bars_and_markers/scatter_symbol.md new file mode 100644 index 00000000..6d6ec60a --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/scatter_symbol.md @@ -0,0 +1,30 @@ +# 三叶草样式的散点图 + +用三叶草符号表示散点图。 + 
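在看完整示例之前,先给出一个最小示意:任何 mathtext 表达式(不只是本例用到的 `$\clubsuit$`)都可以作为 `marker` 传给 `scatter`。下面的符号与随机数据只是演示用的假设值:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
x, y = np.random.rand(2, 20)

fig, ax = plt.subplots()
# 任意 mathtext 表达式都可以作为 marker,例如红心与梅花
ax.scatter(x, y, s=400, c='r', alpha=0.5, marker=r'$\heartsuit$', label='hearts')
ax.scatter(y, x, s=400, c='g', alpha=0.5, marker=r'$\clubsuit$', label='clubs')
ax.legend(loc='upper left')
plt.show()
```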
+![三叶草样式的散点图示例](https://matplotlib.org/_images/sphx_glr_scatter_symbol_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +x = np.arange(0.0, 50.0, 2.0) +y = x ** 1.3 + np.random.rand(*x.shape) * 30.0 +s = np.random.rand(*x.shape) * 800 + 500 + +plt.scatter(x, y, s, c="g", alpha=0.5, marker=r'$\clubsuit$', + label="Luck") +plt.xlabel("Leprechauns") +plt.ylabel("Gold") +plt.legend(loc='upper left') +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: scatter_symbol.py](https://matplotlib.org/_downloads/scatter_symbol.py) +- [下载Jupyter notebook: scatter_symbol.ipynb](https://matplotlib.org/_downloads/scatter_symbol.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/scatter_with_legend.md b/Python/matplotlab/gallery/lines_bars_and_markers/scatter_with_legend.md new file mode 100644 index 00000000..a645ab6c --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/scatter_with_legend.md @@ -0,0 +1,29 @@ +# 带有图例的散点图 + +还演示了如何通过给alpha值介于0和1之间来调整标记的透明度。 + +![带有图例的散点图示例](https://matplotlib.org/_images/sphx_glr_scatter_with_legend_001.png) + +```python +import matplotlib.pyplot as plt +from numpy.random import rand + + +fig, ax = plt.subplots() +for color in ['red', 'green', 'blue']: + n = 750 + x, y = rand(2, n) + scale = 200.0 * rand(n) + ax.scatter(x, y, c=color, s=scale, label=color, + alpha=0.3, edgecolors='none') + +ax.legend() +ax.grid(True) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: scatter_with_legend.py](https://matplotlib.org/_downloads/scatter_with_legend.py) +- [下载Jupyter notebook: scatter_with_legend.ipynb](https://matplotlib.org/_downloads/scatter_with_legend.ipynb) diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/simple_plot.md b/Python/matplotlab/gallery/lines_bars_and_markers/simple_plot.md new file mode 100644 index 00000000..3ec2b2a8 --- /dev/null +++ 
b/Python/matplotlab/gallery/lines_bars_and_markers/simple_plot.md @@ -0,0 +1,41 @@ +# 简单图例 + +创建一个简单的图。 + +```python +import matplotlib +import matplotlib.pyplot as plt +import numpy as np + +# Data for plotting +t = np.arange(0.0, 2.0, 0.01) +s = 1 + np.sin(2 * np.pi * t) + +fig, ax = plt.subplots() +ax.plot(t, s) + +ax.set(xlabel='time (s)', ylabel='voltage (mV)', + title='About as simple as it gets, folks') +ax.grid() + +fig.savefig("test.png") +plt.show() +``` + +![简单图例](https://matplotlib.org/_images/sphx_glr_simple_plot_001.png) + +## 参考 + +下面的示例演示了以下函数和方法的使用: + +```python +matplotlib.axes.Axes.plot +matplotlib.pyplot.plot +matplotlib.pyplot.subplots +matplotlib.figure.Figure.savefig +``` + +## 下载这个示例 + +- [下载python源码: simple_plot.py](https://matplotlib.org/_downloads/simple_plot.py) +- [下载Jupyter notebook: simple_plot.ipynb](https://matplotlib.org/_downloads/simple_plot.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/span_regions.md b/Python/matplotlab/gallery/lines_bars_and_markers/span_regions.md new file mode 100644 index 00000000..c475d7d4 --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/span_regions.md @@ -0,0 +1,53 @@ +# 使用span_where + +说明一些用于逻辑掩码为True的阴影区域的辅助函数。 + +请参考 [matplotlib.collections.BrokenBarHCollection.span_where()](https://matplotlib.org/api/collections_api.html#matplotlib.collections.BrokenBarHCollection.span_where) + +```python +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.collections as collections + + +t = np.arange(0.0, 2, 0.01) +s1 = np.sin(2*np.pi*t) +s2 = 1.2*np.sin(4*np.pi*t) + + +fig, ax = plt.subplots() +ax.set_title('using span_where') +ax.plot(t, s1, color='black') +ax.axhline(0, color='black', lw=2) + +collection = collections.BrokenBarHCollection.span_where( + t, ymin=0, ymax=1, where=s1 > 0, facecolor='green', alpha=0.5) +ax.add_collection(collection) + +collection = collections.BrokenBarHCollection.span_where( + t, ymin=-1, 
ymax=0, where=s1 < 0, facecolor='red', alpha=0.5) +ax.add_collection(collection) + + +plt.show() +``` + +![使用span_where示例](https://matplotlib.org/_images/sphx_glr_span_regions_001.png) + + +## 参考 + +本例中显示了下列函数、方法、类和模块的使用: + +```python +import matplotlib +matplotlib.collections.BrokenBarHCollection +matplotlib.collections.BrokenBarHCollection.span_where +matplotlib.axes.Axes.add_collection +matplotlib.axes.Axes.axhline +``` + +## 下载这个示例 + +- [下载python源码: span_regions.py](https://matplotlib.org/_downloads/span_regions.py) +- [下载Jupyter notebook: span_regions.ipynb](https://matplotlib.org/_downloads/span_regions.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/spectrum_demo.md b/Python/matplotlab/gallery/lines_bars_and_markers/spectrum_demo.md new file mode 100644 index 00000000..ef1005d9 --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/spectrum_demo.md @@ -0,0 +1,56 @@ +# 频谱表示图 + +该图显示了具有加性噪声的正弦信号的不同频谱表示。 通过利用快速傅立叶变换(FFT)计算离散时间信号的(频率)频谱。 + +![频谱表示图例](https://matplotlib.org/_images/sphx_glr_spectrum_demo_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + + +np.random.seed(0) + +dt = 0.01 # sampling interval +Fs = 1 / dt # sampling frequency +t = np.arange(0, 10, dt) + +# generate noise: +nse = np.random.randn(len(t)) +r = np.exp(-t / 0.05) +cnse = np.convolve(nse, r) * dt +cnse = cnse[:len(t)] + +s = 0.1 * np.sin(4 * np.pi * t) + cnse # the signal + +fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(7, 7)) + +# plot time signal: +axes[0, 0].set_title("Signal") +axes[0, 0].plot(t, s, color='C0') +axes[0, 0].set_xlabel("Time") +axes[0, 0].set_ylabel("Amplitude") + +# plot different spectrum types: +axes[1, 0].set_title("Magnitude Spectrum") +axes[1, 0].magnitude_spectrum(s, Fs=Fs, color='C1') + +axes[1, 1].set_title("Log. 
Magnitude Spectrum") +axes[1, 1].magnitude_spectrum(s, Fs=Fs, scale='dB', color='C1') + +axes[2, 0].set_title("Phase Spectrum ") +axes[2, 0].phase_spectrum(s, Fs=Fs, color='C2') + +axes[2, 1].set_title("Angle Spectrum") +axes[2, 1].angle_spectrum(s, Fs=Fs, color='C2') + +axes[0, 1].remove() # don't display empty ax + +fig.tight_layout() +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: spectrum_demo.py](https://matplotlib.org/_downloads/spectrum_demo.py) +- [下载Jupyter notebook: spectrum_demo.ipynb](https://matplotlib.org/_downloads/spectrum_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/stackplot_demo.md b/Python/matplotlab/gallery/lines_bars_and_markers/stackplot_demo.md new file mode 100644 index 00000000..3df0f7ea --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/stackplot_demo.md @@ -0,0 +1,67 @@ +# 堆栈图示例 + +如何使用Matplotlib创建堆栈图。 + +通过将不同的数据集垂直地绘制在彼此之上而不是彼此重叠来生成堆积图。下面我们展示一些使用Matplotlib实现此目的的示例。 + +```python +import numpy as np +import matplotlib.pyplot as plt + +x = [1, 2, 3, 4, 5] +y1 = [1, 1, 2, 3, 5] +y2 = [0, 4, 2, 6, 8] +y3 = [1, 3, 5, 7, 9] + +y = np.vstack([y1, y2, y3]) + +labels = ["Fibonacci ", "Evens", "Odds"] + +fig, ax = plt.subplots() +ax.stackplot(x, y1, y2, y3, labels=labels) +ax.legend(loc='upper left') +plt.show() + +fig, ax = plt.subplots() +ax.stackplot(x, y) +plt.show() +``` + +![堆栈图示例图例](https://matplotlib.org/_images/sphx_glr_stackplot_demo_001.png) + +![堆栈图示例图例2](https://matplotlib.org/_images/sphx_glr_stackplot_demo_002.png) + +这里我们展示了使用stackplot制作流图的示例。 + +```python +def layers(n, m): + """ + Return *n* random Gaussian mixtures, each of length *m*. 
+ """ + def bump(a): + x = 1 / (.1 + np.random.random()) + y = 2 * np.random.random() - .5 + z = 10 / (.1 + np.random.random()) + for i in range(m): + w = (i / m - y) * z + a[i] += x * np.exp(-w * w) + a = np.zeros((m, n)) + for i in range(n): + for j in range(5): + bump(a[:, i]) + return a + + +d = layers(3, 100) + +fig, ax = plt.subplots() +ax.stackplot(range(100), d.T, baseline='wiggle') +plt.show() +``` + +![堆栈图示例图例3](https://matplotlib.org/_images/sphx_glr_stackplot_demo_003.png) + +## 下载这个示例 + +- [下载python源码: stackplot_demo.py](https://matplotlib.org/_downloads/stackplot_demo.py) +- [下载Jupyter notebook: stackplot_demo.ipynb](https://matplotlib.org/_downloads/stackplot_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/stem_plot.md b/Python/matplotlab/gallery/lines_bars_and_markers/stem_plot.md new file mode 100644 index 00000000..24e17910 --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/stem_plot.md @@ -0,0 +1,27 @@ +# 茎状图示 + +茎图的绘制是从基线到y坐标的垂直线绘制cosine(x) w.r.t x,使用 '-.' 
作为绘制垂直线的图案。 + +```python +import matplotlib.pyplot as plt +import numpy as np + +# returns 10 evenly spaced samples from 0.1 to 2*PI +x = np.linspace(0.1, 2 * np.pi, 10) + +markerline, stemlines, baseline = plt.stem(x, np.cos(x), '-.') + +# setting property of baseline with color red and linewidth 2 +plt.setp(baseline, color='r', linewidth=2) + +plt.show() +``` + +![茎状图示图例3](https://matplotlib.org/_images/sphx_glr_stem_plot_001.png) + +此示例使用了: * [matplotlib.axes.Axes.stem()](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.stem.html#matplotlib.axes.Axes.stem) + +## 下载这个示例 + +- [下载python源码: stem_plot.py](https://matplotlib.org/_downloads/stem_plot.py) +- [下载Jupyter notebook: stem_plot.ipynb](https://matplotlib.org/_downloads/stem_plot.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/step_demo.md b/Python/matplotlab/gallery/lines_bars_and_markers/step_demo.md new file mode 100644 index 00000000..6716807e --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/step_demo.md @@ -0,0 +1,38 @@ +# 阶梯图示例 + +阶梯图例子: + +![阶梯图示例图例](https://matplotlib.org/_images/sphx_glr_step_demo_001.png) + +```python +import numpy as np +from numpy import ma +import matplotlib.pyplot as plt + +x = np.arange(1, 7, 0.4) +y0 = np.sin(x) +y = y0.copy() + 2.5 + +plt.step(x, y, label='pre (default)') + +y -= 0.5 +plt.step(x, y, where='mid', label='mid') + +y -= 0.5 +plt.step(x, y, where='post', label='post') + +y = ma.masked_where((y0 > -0.15) & (y0 < 0.15), y - 0.5) +plt.step(x, y, label='masked (pre)') + +plt.legend() + +plt.xlim(0, 7) +plt.ylim(-0.5, 4) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: step_demo.py](https://matplotlib.org/_downloads/step_demo.py) +- [下载Jupyter notebook: step_demo.ipynb](https://matplotlib.org/_downloads/step_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/timeline.md b/Python/matplotlab/gallery/lines_bars_and_markers/timeline.md new file mode 
100644 index 00000000..424df27a --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/timeline.md @@ -0,0 +1,74 @@ +# 使用线、日期、文本创建时间轴图 + +如何使用Matplotlib发布日期创建简单的时间轴。 + +可以使用日期和文本的集合创建时间轴。在本例中,我们将展示如何使用Matplotlib最新版本的日期创建一个简单的时间轴。首先,我们将从GitHub中提取数据。 + +```python +import matplotlib.pyplot as plt +import numpy as np +import matplotlib.dates as mdates +from datetime import datetime + +# A list of Matplotlib releases and their dates +# Taken from https://api.github.com/repos/matplotlib/matplotlib/releases +names = ['v2.2.2', 'v2.2.1', 'v2.2.0', 'v2.1.2', 'v2.1.1', 'v2.1.0', 'v2.0.2', + 'v2.0.1', 'v2.0.0', 'v1.5.3', 'v1.5.2', 'v1.5.1', 'v1.5.0', 'v1.4.3', + 'v1.4.2', 'v1.4.1', 'v1.4.0'] + +dates = ['2018-03-17T03:00:07Z', '2018-03-16T22:06:39Z', + '2018-03-06T12:53:32Z', '2018-01-18T04:56:47Z', + '2017-12-10T04:47:38Z', '2017-10-07T22:35:12Z', + '2017-05-10T02:11:15Z', '2017-05-02T01:59:49Z', + '2017-01-17T02:59:36Z', '2016-09-09T03:00:52Z', + '2016-07-03T15:52:01Z', '2016-01-10T22:38:50Z', + '2015-10-29T21:40:23Z', '2015-02-16T04:22:54Z', + '2014-10-26T03:24:13Z', '2014-10-18T18:56:23Z', + '2014-08-26T21:06:04Z'] +dates = [datetime.strptime(ii, "%Y-%m-%dT%H:%M:%SZ") for ii in dates] +``` + +接下来,我们将遍历每个日期并将其绘制在水平线上。我们将为文本添加一些样式,以便重叠不那么严重。 + +请注意,Matplotlib将自动绘制日期时间输入。 + +```python +levels = np.array([-5, 5, -3, 3, -1, 1]) +fig, ax = plt.subplots(figsize=(8, 5)) + +# Create the base line +start = min(dates) +stop = max(dates) +ax.plot((start, stop), (0, 0), 'k', alpha=.5) + +# Iterate through releases annotating each one +for ii, (iname, idate) in enumerate(zip(names, dates)): + level = levels[ii % 6] + vert = 'top' if level < 0 else 'bottom' + + ax.scatter(idate, 0, s=100, facecolor='w', edgecolor='k', zorder=9999) + # Plot a line up to the text + ax.plot((idate, idate), (0, level), c='r', alpha=.7) + # Give the text a faint background and align it properly + ax.text(idate, level, iname, + horizontalalignment='right', verticalalignment=vert, fontsize=14, + 
backgroundcolor=(1., 1., 1., .3)) +ax.set(title="Matplotlib release dates") +# Set the xticks formatting +# format xaxis with 3 month intervals +ax.get_xaxis().set_major_locator(mdates.MonthLocator(interval=3)) +ax.get_xaxis().set_major_formatter(mdates.DateFormatter("%b %Y")) +fig.autofmt_xdate() + +# Remove components for a cleaner look +plt.setp((ax.get_yticklabels() + ax.get_yticklines() + + list(ax.spines.values())), visible=False) +plt.show() +``` + +![时间轴图图例](https://matplotlib.org/_images/sphx_glr_timeline_001.png) + +## 下载这个示例 + +- [下载python源码: timeline.py](https://matplotlib.org/_downloads/timeline.py) +- [下载Jupyter notebook: timeline.ipynb](https://matplotlib.org/_downloads/timeline.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/vline_hline_demo.md b/Python/matplotlab/gallery/lines_bars_and_markers/vline_hline_demo.md new file mode 100644 index 00000000..05550bfa --- /dev/null +++ b/Python/matplotlab/gallery/lines_bars_and_markers/vline_hline_demo.md @@ -0,0 +1,37 @@ +# hlines和vlines + +此示例展示了hlines和vlines的功能。 + +![hlines和vlines图例](https://matplotlib.org/_images/sphx_glr_vline_hline_demo_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + + +t = np.arange(0.0, 5.0, 0.1) +s = np.exp(-t) + np.sin(2 * np.pi * t) + 1 +nse = np.random.normal(0.0, 0.3, t.shape) * s + +fig, (vax, hax) = plt.subplots(1, 2, figsize=(12, 6)) + +vax.plot(t, s + nse, '^') +vax.vlines(t, [0], s) +# By using ``transform=vax.get_xaxis_transform()`` the y coordinates are scaled +# such that 0 maps to the bottom of the axes and 1 to the top. 
+vax.vlines([1, 2], 0, 1, transform=vax.get_xaxis_transform(), colors='r')
+vax.set_xlabel('time (s)')
+vax.set_title('Vertical lines demo')
+
+hax.plot(s + nse, t, '^')
+hax.hlines(t, [0], s, lw=2)
+hax.set_xlabel('time (s)')
+hax.set_title('Horizontal lines demo')
+
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: vline_hline_demo.py](https://matplotlib.org/_downloads/vline_hline_demo.py)
+- [下载Jupyter notebook: vline_hline_demo.ipynb](https://matplotlib.org/_downloads/vline_hline_demo.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/lines_bars_and_markers/xcorr_acorr_demo.md b/Python/matplotlab/gallery/lines_bars_and_markers/xcorr_acorr_demo.md
new file mode 100644
index 00000000..c4f59a65
--- /dev/null
+++ b/Python/matplotlab/gallery/lines_bars_and_markers/xcorr_acorr_demo.md
@@ -0,0 +1,32 @@
+# 互相关与自相关演示
+
+使用互相关(xcorr)和自相关(acorr)图的示例。
+
+![互相关与自相关演示图例](https://matplotlib.org/_images/sphx_glr_xcorr_acorr_demo_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+
+
+# Fixing random state for reproducibility
+np.random.seed(19680801)
+
+
+x, y = np.random.randn(2, 100)
+fig, [ax1, ax2] = plt.subplots(2, 1, sharex=True)
+ax1.xcorr(x, y, usevlines=True, maxlags=50, normed=True, lw=2)
+ax1.grid(True)
+ax1.axhline(0, color='black', lw=2)
+
+ax2.acorr(x, usevlines=True, normed=True, maxlags=50, lw=2)
+ax2.grid(True)
+ax2.axhline(0, color='black', lw=2)
+
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: xcorr_acorr_demo.py](https://matplotlib.org/_downloads/xcorr_acorr_demo.py)
+- [下载Jupyter notebook: xcorr_acorr_demo.ipynb](https://matplotlib.org/_downloads/xcorr_acorr_demo.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/misc/agg_buffer.md b/Python/matplotlab/gallery/misc/agg_buffer.md
new file mode 100644
index 00000000..50cc5cad
--- /dev/null
+++ b/Python/matplotlab/gallery/misc/agg_buffer.md
@@ -0,0 +1,35 @@
+# Agg缓冲区
+
+使用AGG后端以RGBA字节串的形式访问图形画布,然后将其转换为数组并传递给Pillow进行渲染。
+
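在看下面的完整示例之前,可以先了解一个更直接的做法:较新版本的 Matplotlib(大约 3.1 起,此处版本界限是假设)为 Agg 画布提供了 `buffer_rgba()`,可以不经字节串拷贝直接取得 RGBA 像素缓冲区。下面是基于该假设的一个简单示意:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # 离屏渲染
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3])
fig.canvas.draw()  # 先渲染,再取缓冲区

# buffer_rgba() 返回一个 (高, 宽, 4) 的 RGBA 像素缓冲区
X = np.asarray(fig.canvas.buffer_rgba())
print(X.shape)
```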
+![Agg缓冲区](https://matplotlib.org/_images/sphx_glr_agg_buffer_001.png)
+
+```python
+import numpy as np
+
+from matplotlib.backends.backend_agg import FigureCanvasAgg
+import matplotlib.pyplot as plt
+
+plt.plot([1, 2, 3])
+
+canvas = plt.get_current_fig_manager().canvas
+
+agg = canvas.switch_backends(FigureCanvasAgg)
+agg.draw()
+s, (width, height) = agg.print_to_buffer()
+
+# Convert to a NumPy array (np.fromstring is deprecated for binary data).
+X = np.frombuffer(s, np.uint8).reshape((height, width, 4))
+
+# Pass off to PIL.
+from PIL import Image
+im = Image.frombytes("RGBA", (width, height), s)
+
+# Uncomment this line to display the image using ImageMagick's `display` tool.
+# im.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: agg_buffer.py](https://matplotlib.org/_downloads/agg_buffer.py)
+- [下载Jupyter notebook: agg_buffer.ipynb](https://matplotlib.org/_downloads/agg_buffer.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/misc/agg_buffer_to_array.md b/Python/matplotlab/gallery/misc/agg_buffer_to_array.md
new file mode 100644
index 00000000..75a645bc
--- /dev/null
+++ b/Python/matplotlab/gallery/misc/agg_buffer_to_array.md
@@ -0,0 +1,32 @@
+# Agg缓冲区转换数组
+
+将渲染后的图形转换为其图像(NumPy数组)表示形式。
+
+![Agg缓冲区转换数组示例](https://matplotlib.org/_images/sphx_glr_agg_buffer_to_array_001.png)
+
+![Agg缓冲区转换数组示例2](https://matplotlib.org/_images/sphx_glr_agg_buffer_to_array_002.png)
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+
+# make an agg figure
+fig, ax = plt.subplots()
+ax.plot([1, 2, 3])
+ax.set_title('a simple figure')
+fig.canvas.draw()
+
+# grab the pixel buffer and dump it into a numpy array
+X = np.array(fig.canvas.renderer._renderer)
+
+# now display the array X as an Axes in a new figure
+fig2 = plt.figure()
+ax2 = fig2.add_subplot(111, frameon=False)
+ax2.imshow(X)
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: agg_buffer_to_array.py](https://matplotlib.org/_downloads/agg_buffer_to_array.py)
+- [下载Jupyter notebook: 
agg_buffer_to_array.ipynb](https://matplotlib.org/_downloads/agg_buffer_to_array.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/misc/anchored_artists.md b/Python/matplotlab/gallery/misc/anchored_artists.md
new file mode 100644
index 00000000..f64eecfd
--- /dev/null
+++ b/Python/matplotlab/gallery/misc/anchored_artists.md
@@ -0,0 +1,128 @@
+# 锚定艺术家对象
+
+本示例不借助 [Matplotlib axes_grid1 Toolkit](https://matplotlib.org/api/toolkits/axes_grid1.html#toolkit-axesgrid1-index) 中的辅助类来创建锚定对象。此版本的图与 [Simple Anchored Artists](https://matplotlib.org/gallery/axes_grid1/simple_anchored_artists.html) 中的版本类似,但仅使用 matplotlib 命名空间实现,没有借助其他工具包。
+
+![锚定艺术家对象示例](https://matplotlib.org/_images/sphx_glr_anchored_artists_001.png)
+
+```python
+from matplotlib import pyplot as plt
+from matplotlib.patches import Rectangle, Ellipse
+from matplotlib.offsetbox import (
+    AnchoredOffsetbox, AuxTransformBox, DrawingArea, TextArea, VPacker)
+
+
+class AnchoredText(AnchoredOffsetbox):
+    def __init__(self, s, loc, pad=0.4, borderpad=0.5,
+                 prop=None, frameon=True):
+        self.txt = TextArea(s, minimumdescent=False)
+        super().__init__(loc, pad=pad, borderpad=borderpad,
+                         child=self.txt, prop=prop, frameon=frameon)
+
+
+def draw_text(ax):
+    """
+    Draw a text-box anchored to the upper-left corner of the figure. 
+ """ + at = AnchoredText("Figure 1a", loc='upper left', frameon=True) + at.patch.set_boxstyle("round,pad=0.,rounding_size=0.2") + ax.add_artist(at) + + +class AnchoredDrawingArea(AnchoredOffsetbox): + def __init__(self, width, height, xdescent, ydescent, + loc, pad=0.4, borderpad=0.5, prop=None, frameon=True): + self.da = DrawingArea(width, height, xdescent, ydescent) + super().__init__(loc, pad=pad, borderpad=borderpad, + child=self.da, prop=None, frameon=frameon) + + +def draw_circle(ax): + """ + Draw a circle in axis coordinates + """ + from matplotlib.patches import Circle + ada = AnchoredDrawingArea(20, 20, 0, 0, + loc='upper right', pad=0., frameon=False) + p = Circle((10, 10), 10) + ada.da.add_artist(p) + ax.add_artist(ada) + + +class AnchoredEllipse(AnchoredOffsetbox): + def __init__(self, transform, width, height, angle, loc, + pad=0.1, borderpad=0.1, prop=None, frameon=True): + """ + Draw an ellipse the size in data coordinate of the give axes. + + pad, borderpad in fraction of the legend font size (or prop) + """ + self._box = AuxTransformBox(transform) + self.ellipse = Ellipse((0, 0), width, height, angle) + self._box.add_artist(self.ellipse) + super().__init__(loc, pad=pad, borderpad=borderpad, + child=self._box, prop=prop, frameon=frameon) + + +def draw_ellipse(ax): + """ + Draw an ellipse of width=0.1, height=0.15 in data coordinates + """ + ae = AnchoredEllipse(ax.transData, width=0.1, height=0.15, angle=0., + loc='lower left', pad=0.5, borderpad=0.4, + frameon=True) + + ax.add_artist(ae) + + +class AnchoredSizeBar(AnchoredOffsetbox): + def __init__(self, transform, size, label, loc, + pad=0.1, borderpad=0.1, sep=2, prop=None, frameon=True): + """ + Draw a horizontal bar with the size in data coordinate of the given + axes. A label will be drawn underneath (center-aligned). + + pad, borderpad in fraction of the legend font size (or prop) + sep in points. 
+ """ + self.size_bar = AuxTransformBox(transform) + self.size_bar.add_artist(Rectangle((0, 0), size, 0, ec="black", lw=1.0)) + + self.txt_label = TextArea(label, minimumdescent=False) + + self._box = VPacker(children=[self.size_bar, self.txt_label], + align="center", + pad=0, sep=sep) + + super().__init__(loc, pad=pad, borderpad=borderpad, + child=self._box, prop=prop, frameon=frameon) + + +def draw_sizebar(ax): + """ + Draw a horizontal bar with length of 0.1 in data coordinates, + with a fixed label underneath. + """ + asb = AnchoredSizeBar(ax.transData, + 0.1, + r"1$^{\prime}$", + loc='lower center', + pad=0.1, borderpad=0.5, sep=5, + frameon=False) + ax.add_artist(asb) + + +ax = plt.gca() +ax.set_aspect(1.) + +draw_text(ax) +draw_circle(ax) +draw_ellipse(ax) +draw_sizebar(ax) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: anchored_artists.py](https://matplotlib.org/_downloads/anchored_artists.py) +- [下载Jupyter notebook: anchored_artists.ipynb](https://matplotlib.org/_downloads/anchored_artists.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/bbox_intersect.md b/Python/matplotlab/gallery/misc/bbox_intersect.md new file mode 100644 index 00000000..2e416cfb --- /dev/null +++ b/Python/matplotlab/gallery/misc/bbox_intersect.md @@ -0,0 +1,40 @@ +# 改变与盒子相交的线条的颜色 + +与矩形相交的线条用红色着色,而其他线条用蓝色线条留下。此示例展示了intersect_bbox函数。 + +![改变与盒子相交的线条的颜色示例](https://matplotlib.org/_images/sphx_glr_bbox_intersect_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.transforms import Bbox +from matplotlib.path import Path + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +left, bottom, width, height = (-1, -1, 2, 2) +rect = plt.Rectangle((left, bottom), width, height, facecolor="#aaaaaa") + +fig, ax = plt.subplots() +ax.add_patch(rect) + +bbox = Bbox.from_bounds(left, bottom, width, height) + +for i in range(12): + vertices = (np.random.random((2, 2)) - 0.5) * 6.0 + path = Path(vertices) + if 
path.intersects_bbox(bbox): + color = 'r' + else: + color = 'b' + ax.plot(vertices[:, 0], vertices[:, 1], color=color) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: bbox_intersect.py](https://matplotlib.org/_downloads/bbox_intersect.py) +- [下载Jupyter notebook: bbox_intersect.ipynb](https://matplotlib.org/_downloads/bbox_intersect.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/contour_manual.md b/Python/matplotlab/gallery/misc/contour_manual.md new file mode 100644 index 00000000..8ab8ab46 --- /dev/null +++ b/Python/matplotlab/gallery/misc/contour_manual.md @@ -0,0 +1,65 @@ +# Contour手册 + +使用ContourSet显示自己的轮廓线和多边形的示例。 + +```python +import matplotlib.pyplot as plt +from matplotlib.contour import ContourSet +import matplotlib.cm as cm +``` + +每个级别的轮廓线是多边形的列表/元组。 + +```python +lines0 = [[[0, 0], [0, 4]]] +lines1 = [[[2, 0], [1, 2], [1, 3]]] +lines2 = [[[3, 0], [3, 2]], [[3, 3], [3, 4]]] # Note two lines. +``` + +两个级别之间的填充等高线也是多边形的列表/元组。点可以顺时针或逆时针排列。 + +```python +filled01 = [[[0, 0], [0, 4], [1, 3], [1, 2], [2, 0]]] +filled12 = [[[2, 0], [3, 0], [3, 2], [1, 3], [1, 2]], # Note two polygons. + [[1, 4], [3, 4], [3, 3]]] +``` + +```python +plt.figure() + +# Filled contours using filled=True. +cs = ContourSet(plt.gca(), [0, 1, 2], [filled01, filled12], filled=True, cmap=cm.bone) +cbar = plt.colorbar(cs) + +# Contour lines (non-filled). 
+lines = ContourSet(plt.gca(), [0, 1, 2], [lines0, lines1, lines2], cmap=cm.cool,
+                   linewidths=3)
+cbar.add_lines(lines)
+
+plt.axis([-0.5, 3.5, -0.5, 4.5])
+plt.title('User-specified contours')
+```
+
+![Contour手册示例](https://matplotlib.org/_images/sphx_glr_contour_manual_001.png)
+
+可以在单个多边形顶点列表中指定多个填充轮廓线,以及Path类中描述的顶点种类(代码类型)列表。这对于带孔的多边形特别有用。代码类型1是MOVETO,2是LINETO。
+
+```python
+plt.figure()
+filled01 = [[[0, 0], [3, 0], [3, 3], [0, 3], [1, 1], [1, 2], [2, 2], [2, 1]]]
+kinds01 = [[1, 2, 2, 2, 1, 2, 2, 2]]
+cs = ContourSet(plt.gca(), [0, 1], [filled01], [kinds01], filled=True)
+cbar = plt.colorbar(cs)
+
+plt.axis([-0.5, 3.5, -0.5, 3.5])
+plt.title('User specified filled contours with holes')
+
+plt.show()
+```
+
+![Contour手册示例2](https://matplotlib.org/_images/sphx_glr_contour_manual_002.png)
+
+## 下载这个示例
+
+- [下载python源码: contour_manual.py](https://matplotlib.org/_downloads/contour_manual.py)
+- [下载Jupyter notebook: contour_manual.ipynb](https://matplotlib.org/_downloads/contour_manual.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/misc/cursor_demo_sgskip.md b/Python/matplotlab/gallery/misc/cursor_demo_sgskip.md
new file mode 100644
index 00000000..b4c354b6
--- /dev/null
+++ b/Python/matplotlab/gallery/misc/cursor_demo_sgskip.md
@@ -0,0 +1,87 @@
+# 光标演示
+
+此示例展示如何使用matplotlib提供数据光标。它使用matplotlib来绘制光标,可能会比较慢,因为这需要在每次鼠标移动时重新绘制图形。
+
+使用本机GUI绘图可以实现更快的光标显示,就像在wxcursor_demo.py中一样。
+
+第三方包mpldatacursor和mplcursors可用于实现类似的效果。参见:
+
+https://github.com/joferkington/mpldatacursor https://github.com/anntzer/mplcursors
+
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+
+
+class Cursor(object):
+    def __init__(self, ax):
+        self.ax = ax
+        self.lx = ax.axhline(color='k')  # the horiz line
+        self.ly = ax.axvline(color='k')  # the vert line
+
+        # text location in axes coords
+        self.txt = ax.text(0.7, 0.9, '', transform=ax.transAxes)
+
+    def mouse_move(self, event):
+        if not event.inaxes:
+            return
+
+        x, y = event.xdata, event.ydata
+        # update the line positions
+        self.lx.set_ydata(y)
+        self.ly.set_xdata(x)
+
+        self.txt.set_text('x=%1.2f, y=%1.2f' % (x, y))
+        plt.draw()
+
+
+class SnaptoCursor(object):
+    """
+    Like Cursor but the crosshair snaps to the nearest x,y point
+    For simplicity, I'm assuming x is sorted
+    """
+
+    def __init__(self, ax, x, y):
+        self.ax = ax
+        self.lx = ax.axhline(color='k')  # the horiz line
+        self.ly = ax.axvline(color='k')  # the vert line
+        self.x = x
+        self.y = y
+        # text location in axes coords
+        self.txt = ax.text(0.7, 0.9, '', transform=ax.transAxes)
+
+    def mouse_move(self, event):
+
+        if not event.inaxes:
+            return
+
+        x, y = event.xdata, event.ydata
+
+        indx = min(np.searchsorted(self.x, [x])[0], len(self.x) - 1)
+        x = self.x[indx]
+        y = self.y[indx]
+        # update the line positions
+        self.lx.set_ydata(y)
+        self.ly.set_xdata(x)
+
+        self.txt.set_text('x=%1.2f, y=%1.2f' % (x, y))
+        print('x=%1.2f, y=%1.2f' % (x, y))
+        plt.draw()
+
+t = np.arange(0.0, 1.0, 0.01)
+s = np.sin(2 * 2 * np.pi * t)
+fig, ax = plt.subplots()
+
+# cursor = Cursor(ax)
+cursor = SnaptoCursor(ax, t, s)
+plt.connect('motion_notify_event', cursor.mouse_move)
+
+ax.plot(t, s, 'o')
+plt.axis([0, 1, -1, 1])
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: cursor_demo_sgskip.py](https://matplotlib.org/_downloads/cursor_demo_sgskip.py)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/misc/custom_projection.md b/Python/matplotlab/gallery/misc/custom_projection.md
new file mode 100644
index 00000000..5d8683c0
--- /dev/null
+++ b/Python/matplotlab/gallery/misc/custom_projection.md
@@ -0,0 +1,470 @@
+# 自定义投影
+
+本示例通过自行实现(并简化)Matplotlib的许多功能来展示Hammer投影。
+
+![自定义投影示例](https://matplotlib.org/_images/sphx_glr_custom_projection_001.png)
+
+```python
+import matplotlib
+from matplotlib.axes import Axes
+from matplotlib.patches import Circle
+from matplotlib.path import Path
+from matplotlib.ticker import NullLocator, Formatter, FixedLocator
+from matplotlib.transforms import Affine2D, BboxTransformTo, Transform
+from matplotlib.projections import register_projection
+import matplotlib.spines as mspines
+import matplotlib.axis as maxis
+import numpy as np
+
+rcParams = matplotlib.rcParams
+
+# This example projection class is rather long, but it is designed to
+# illustrate many features, not all of which will be used every time.
+# It is also common to factor out a lot of these methods into common
+# code used by a number of projections with similar characteristics
+# (see geo.py).
+
+
+class GeoAxes(Axes):
+    """
+    An abstract base class for geographic projections
+    """
+    class ThetaFormatter(Formatter):
+        """
+        Used to format the theta tick labels.  Converts the native
+        unit of radians into degrees and adds a degree symbol.
+ """ + def __init__(self, round_to=1.0): + self._round_to = round_to + + def __call__(self, x, pos=None): + degrees = np.round(np.rad2deg(x) / self._round_to) * self._round_to + if rcParams['text.usetex'] and not rcParams['text.latex.unicode']: + return r"$%0.0f^\circ$" % degrees + else: + return "%0.0f\N{DEGREE SIGN}" % degrees + + RESOLUTION = 75 + + def _init_axis(self): + self.xaxis = maxis.XAxis(self) + self.yaxis = maxis.YAxis(self) + # Do not register xaxis or yaxis with spines -- as done in + # Axes._init_axis() -- until GeoAxes.xaxis.cla() works. + # self.spines['geo'].register_axis(self.yaxis) + self._update_transScale() + + def cla(self): + Axes.cla(self) + + self.set_longitude_grid(30) + self.set_latitude_grid(15) + self.set_longitude_grid_ends(75) + self.xaxis.set_minor_locator(NullLocator()) + self.yaxis.set_minor_locator(NullLocator()) + self.xaxis.set_ticks_position('none') + self.yaxis.set_ticks_position('none') + self.yaxis.set_tick_params(label1On=True) + # Why do we need to turn on yaxis tick labels, but + # xaxis tick labels are already on? + + self.grid(rcParams['axes.grid']) + + Axes.set_xlim(self, -np.pi, np.pi) + Axes.set_ylim(self, -np.pi / 2.0, np.pi / 2.0) + + def _set_lim_and_transforms(self): + # A (possibly non-linear) projection on the (already scaled) data + + # There are three important coordinate spaces going on here: + # + # 1. Data space: The space of the data itself + # + # 2. Axes space: The unit rectangle (0, 0) to (1, 1) + # covering the entire plot area. + # + # 3. Display space: The coordinates of the resulting image, + # often in pixels or dpi/inch. + + # This function makes heavy use of the Transform classes in + # ``lib/matplotlib/transforms.py.`` For more information, see + # the inline documentation there. + + # The goal of the first two transformations is to get from the + # data space (in this case longitude and latitude) to axes + # space. 
It is separated into a non-affine and affine part so + # that the non-affine part does not have to be recomputed when + # a simple affine change to the figure has been made (such as + # resizing the window or changing the dpi). + + # 1) The core transformation from data space into + # rectilinear space defined in the HammerTransform class. + self.transProjection = self._get_core_transform(self.RESOLUTION) + + # 2) The above has an output range that is not in the unit + # rectangle, so scale and translate it so it fits correctly + # within the axes. The peculiar calculations of xscale and + # yscale are specific to a Aitoff-Hammer projection, so don't + # worry about them too much. + self.transAffine = self._get_affine_transform() + + # 3) This is the transformation from axes space to display + # space. + self.transAxes = BboxTransformTo(self.bbox) + + # Now put these 3 transforms together -- from data all the way + # to display coordinates. Using the '+' operator, these + # transforms will be applied "in order". The transforms are + # automatically simplified, if possible, by the underlying + # transformation framework. + self.transData = \ + self.transProjection + \ + self.transAffine + \ + self.transAxes + + # The main data transformation is set up. Now deal with + # gridlines and tick labels. + + # Longitude gridlines and ticklabels. The input to these + # transforms are in display space in x and axes space in y. + # Therefore, the input values will be in range (-xmin, 0), + # (xmax, 1). The goal of these transforms is to go from that + # space to display space. The tick labels will be offset 4 + # pixels from the equator. 
+        self._xaxis_pretransform = \
+            Affine2D() \
+            .scale(1.0, self._longitude_cap * 2.0) \
+            .translate(0.0, -self._longitude_cap)
+        self._xaxis_transform = \
+            self._xaxis_pretransform + \
+            self.transData
+        self._xaxis_text1_transform = \
+            Affine2D().scale(1.0, 0.0) + \
+            self.transData + \
+            Affine2D().translate(0.0, 4.0)
+        self._xaxis_text2_transform = \
+            Affine2D().scale(1.0, 0.0) + \
+            self.transData + \
+            Affine2D().translate(0.0, -4.0)
+
+        # Now set up the transforms for the latitude ticks.  The input to
+        # these transforms are in axes space in x and display space in
+        # y.  Therefore, the input values will be in range (0, -ymin),
+        # (1, ymax).  The goal of these transforms is to go from that
+        # space to display space.  The tick labels will be offset 4
+        # pixels from the edge of the axes ellipse.
+        yaxis_stretch = Affine2D().scale(np.pi*2, 1).translate(-np.pi, 0)
+        yaxis_space = Affine2D().scale(1.0, 1.1)
+        self._yaxis_transform = \
+            yaxis_stretch + \
+            self.transData
+        yaxis_text_base = \
+            yaxis_stretch + \
+            self.transProjection + \
+            (yaxis_space +
+             self.transAffine +
+             self.transAxes)
+        self._yaxis_text1_transform = \
+            yaxis_text_base + \
+            Affine2D().translate(-8.0, 0.0)
+        self._yaxis_text2_transform = \
+            yaxis_text_base + \
+            Affine2D().translate(8.0, 0.0)
+
+    def _get_affine_transform(self):
+        transform = self._get_core_transform(1)
+        xscale, _ = transform.transform_point((np.pi, 0))
+        _, yscale = transform.transform_point((0, np.pi / 2.0))
+        return Affine2D() \
+            .scale(0.5 / xscale, 0.5 / yscale) \
+            .translate(0.5, 0.5)
+
+    def get_xaxis_transform(self, which='grid'):
+        """
+        Override this method to provide a transformation for the
+        x-axis tick labels.
+
+        Returns a tuple of the form (transform, valign, halign)
+        """
+        if which not in ['tick1', 'tick2', 'grid']:
+            raise ValueError(
+                "'which' must be one of 'tick1', 'tick2', or 'grid'")
+        return self._xaxis_transform
+
+    def get_xaxis_text1_transform(self, pad):
+        return self._xaxis_text1_transform, 'bottom', 'center'
+
+    def get_xaxis_text2_transform(self, pad):
+        """
+        Override this method to provide a transformation for the
+        secondary x-axis tick labels.
+
+        Returns a tuple of the form (transform, valign, halign)
+        """
+        return self._xaxis_text2_transform, 'top', 'center'
+
+    def get_yaxis_transform(self, which='grid'):
+        """
+        Override this method to provide a transformation for the
+        y-axis grid and ticks.
+        """
+        if which not in ['tick1', 'tick2', 'grid']:
+            raise ValueError(
+                "'which' must be one of 'tick1', 'tick2', or 'grid'")
+        return self._yaxis_transform
+
+    def get_yaxis_text1_transform(self, pad):
+        """
+        Override this method to provide a transformation for the
+        y-axis tick labels.
+
+        Returns a tuple of the form (transform, valign, halign)
+        """
+        return self._yaxis_text1_transform, 'center', 'right'
+
+    def get_yaxis_text2_transform(self, pad):
+        """
+        Override this method to provide a transformation for the
+        secondary y-axis tick labels.
+
+        Returns a tuple of the form (transform, valign, halign)
+        """
+        return self._yaxis_text2_transform, 'center', 'left'
+
+    def _gen_axes_patch(self):
+        """
+        Override this method to define the shape that is used for the
+        background of the plot.  It should be a subclass of Patch.
+
+        In this case, it is a Circle (that may be warped by the axes
+        transform into an ellipse).  Any data and gridlines will be
+        clipped to this shape.
+ """ + return Circle((0.5, 0.5), 0.5) + + def _gen_axes_spines(self): + return {'geo': mspines.Spine.circular_spine(self, (0.5, 0.5), 0.5)} + + def set_yscale(self, *args, **kwargs): + if args[0] != 'linear': + raise NotImplementedError + + # Prevent the user from applying scales to one or both of the + # axes. In this particular case, scaling the axes wouldn't make + # sense, so we don't allow it. + set_xscale = set_yscale + + # Prevent the user from changing the axes limits. In our case, we + # want to display the whole sphere all the time, so we override + # set_xlim and set_ylim to ignore any input. This also applies to + # interactive panning and zooming in the GUI interfaces. + def set_xlim(self, *args, **kwargs): + raise TypeError("It is not possible to change axes limits " + "for geographic projections. Please consider " + "using Basemap or Cartopy.") + + set_ylim = set_xlim + + def format_coord(self, lon, lat): + """ + Override this method to change how the values are displayed in + the status bar. + + In this case, we want them to be displayed in degrees N/S/E/W. + """ + lon, lat = np.rad2deg([lon, lat]) + if lat >= 0.0: + ns = 'N' + else: + ns = 'S' + if lon >= 0.0: + ew = 'E' + else: + ew = 'W' + return ('%f\N{DEGREE SIGN}%s, %f\N{DEGREE SIGN}%s' + % (abs(lat), ns, abs(lon), ew)) + + def set_longitude_grid(self, degrees): + """ + Set the number of degrees between each longitude grid. + + This is an example method that is specific to this projection + class -- it provides a more convenient interface to set the + ticking than set_xticks would. + """ + # Skip -180 and 180, which are the fixed limits. + grid = np.arange(-180 + degrees, 180, degrees) + self.xaxis.set_major_locator(FixedLocator(np.deg2rad(grid))) + self.xaxis.set_major_formatter(self.ThetaFormatter(degrees)) + + def set_latitude_grid(self, degrees): + """ + Set the number of degrees between each longitude grid. 
+
+        This is an example method that is specific to this projection
+        class -- it provides a more convenient interface than
+        set_yticks would.
+        """
+        # Skip -90 and 90, which are the fixed limits.
+        grid = np.arange(-90 + degrees, 90, degrees)
+        self.yaxis.set_major_locator(FixedLocator(np.deg2rad(grid)))
+        self.yaxis.set_major_formatter(self.ThetaFormatter(degrees))
+
+    def set_longitude_grid_ends(self, degrees):
+        """
+        Set the latitude(s) at which to stop drawing the longitude grids.
+
+        Often, in geographic projections, you wouldn't want to draw
+        longitude gridlines near the poles.  This allows the user to
+        specify the degree at which to stop drawing longitude grids.
+
+        This is an example method that is specific to this projection
+        class -- it provides an interface to something that has no
+        analogy in the base Axes class.
+        """
+        self._longitude_cap = np.deg2rad(degrees)
+        self._xaxis_pretransform \
+            .clear() \
+            .scale(1.0, self._longitude_cap * 2.0) \
+            .translate(0.0, -self._longitude_cap)
+
+    def get_data_ratio(self):
+        """
+        Return the aspect ratio of the data itself.
+
+        This method should be overridden by any Axes that have a
+        fixed data ratio.
+        """
+        return 1.0
+
+    # Interactive panning and zooming is not supported with this projection,
+    # so we override all of the following methods to disable it.
+    def can_zoom(self):
+        """
+        Return *True* if this axes supports the zoom box button functionality.
+        This axes object does not support interactive zoom box.
+        """
+        return False
+
+    def can_pan(self):
+        """
+        Return *True* if this axes supports the pan/zoom button functionality.
+        This axes object does not support interactive pan/zoom.
+        """
+        return False
+
+    def start_pan(self, x, y, button):
+        pass
+
+    def end_pan(self):
+        pass
+
+    def drag_pan(self, button, key, x, y):
+        pass
+
+
+class HammerAxes(GeoAxes):
+    """
+    A custom class for the Aitoff-Hammer projection, an equal-area map
+    projection.
+
+    https://en.wikipedia.org/wiki/Hammer_projection
+    """
+
+    # The projection must specify a name.  This will be used by the
+    # user to select the projection,
+    # i.e. ``subplot(111, projection='custom_hammer')``.
+    name = 'custom_hammer'
+
+    class HammerTransform(Transform):
+        """
+        The base Hammer transform.
+        """
+        input_dims = 2
+        output_dims = 2
+        is_separable = False
+
+        def __init__(self, resolution):
+            """
+            Create a new Hammer transform.  Resolution is the number of steps
+            to interpolate between each input line segment to approximate its
+            path in curved Hammer space.
+            """
+            Transform.__init__(self)
+            self._resolution = resolution
+
+        def transform_non_affine(self, ll):
+            longitude, latitude = ll.T
+
+            # Pre-compute some values
+            half_long = longitude / 2
+            cos_latitude = np.cos(latitude)
+            sqrt2 = np.sqrt(2)
+
+            alpha = np.sqrt(1 + cos_latitude * np.cos(half_long))
+            x = (2 * sqrt2) * (cos_latitude * np.sin(half_long)) / alpha
+            y = (sqrt2 * np.sin(latitude)) / alpha
+            return np.column_stack([x, y])
+        transform_non_affine.__doc__ = Transform.transform_non_affine.__doc__
+
+        def transform_path_non_affine(self, path):
+            # vertices = path.vertices
+            ipath = path.interpolated(self._resolution)
+            return Path(self.transform(ipath.vertices), ipath.codes)
+        transform_path_non_affine.__doc__ = \
+            Transform.transform_path_non_affine.__doc__
+
+        def inverted(self):
+            return HammerAxes.InvertedHammerTransform(self._resolution)
+        inverted.__doc__ = Transform.inverted.__doc__
+
+    class InvertedHammerTransform(Transform):
+        input_dims = 2
+        output_dims = 2
+        is_separable = False
+
+        def __init__(self, resolution):
+            Transform.__init__(self)
+            self._resolution = resolution
+
+        def transform_non_affine(self, xy):
+            x, y = xy.T
+            z = np.sqrt(1 - (x / 4) ** 2 - (y / 2) ** 2)
+            longitude = 2 * np.arctan((z * x) / (2 * (2 * z ** 2 - 1)))
+            latitude = np.arcsin(y*z)
+            return np.column_stack([longitude, latitude])
+        transform_non_affine.__doc__ = Transform.transform_non_affine.__doc__
+
+        def inverted(self):
+            return HammerAxes.HammerTransform(self._resolution)
+        inverted.__doc__ = Transform.inverted.__doc__
+
+    def __init__(self, *args, **kwargs):
+        self._longitude_cap = np.pi / 2.0
+        GeoAxes.__init__(self, *args, **kwargs)
+        self.set_aspect(0.5, adjustable='box', anchor='C')
+        self.cla()
+
+    def _get_core_transform(self, resolution):
+        return self.HammerTransform(resolution)
+
+
+# Now register the projection with matplotlib so the user can select
+# it.
+register_projection(HammerAxes)
+
+
+if __name__ == '__main__':
+    import matplotlib.pyplot as plt
+    # Now make a simple example using the custom projection.
+    plt.subplot(111, projection="custom_hammer")
+    p = plt.plot([-1, 1, 1], [-1, -1, 1], "o-")
+    plt.grid(True)
+
+    plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: custom_projection.py](https://matplotlib.org/_downloads/custom_projection.py)
+- [下载Jupyter notebook: custom_projection.ipynb](https://matplotlib.org/_downloads/custom_projection.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/misc/customize_rc.md b/Python/matplotlab/gallery/misc/customize_rc.md
new file mode 100644
index 00000000..4aa2d9f0
--- /dev/null
+++ b/Python/matplotlab/gallery/misc/customize_rc.md
@@ -0,0 +1,57 @@
+# 自定义Rc
+
+这里并不是要画一幅好看的图形,而只是为了演示一些动态自定义rc参数的方法。
+
+如果您希望以交互方式工作,并且需要为图形创建不同的默认设置(例如,一组用于发布的默认设置,一组用于交互式探索),您可能希望在自定义模块中定义一些设置默认值的函数,例如:
+
+```python
+def set_pub():
+    rc('font', weight='bold')  # bold fonts are easier to see
+    rc('tick', labelsize=15)  # tick labels bigger
+    rc('lines', lw=1, color='k')  # thicker black lines
+    rc('grid', c='0.5', ls='-', lw=0.5)  # solid gray grid lines
+    rc('savefig', dpi=300)  # higher res outputs
+```
+
+然后,当您以交互方式工作时,您只需要:
+
+```python
+>>> set_pub()
+>>> subplot(111)
+>>> plot([1,2,3])
+>>> savefig('myfig')
+>>> rcdefaults()  # restore the defaults
+```
+
+![自定义Rc示例](https://matplotlib.org/_images/sphx_glr_customize_rc_001.png)
+
+```python
+import matplotlib.pyplot as plt
+
+plt.subplot(311)
+plt.plot([1, 2, 3])
+
+# the axes attributes need to be set before the call to subplot
+plt.rc('font', weight='bold')
+plt.rc('xtick.major', size=5, pad=7)
+plt.rc('xtick', labelsize=15)
+
+# using aliases for color, linestyle and linewidth; gray, solid, thick
+plt.rc('grid', c='0.5', ls='-', lw=5)
+plt.rc('lines', lw=2, color='g')
+plt.subplot(312)
+
+plt.plot([1, 2, 3])
+plt.grid(True)
+
+plt.rcdefaults()
+plt.subplot(313)
+plt.plot([1, 2, 3])
+plt.grid(True)
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: customize_rc.py](https://matplotlib.org/_downloads/customize_rc.py)
- [下载Jupyter notebook: customize_rc.ipynb](https://matplotlib.org/_downloads/customize_rc.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/misc/demo_agg_filter.md b/Python/matplotlab/gallery/misc/demo_agg_filter.md
new file mode 100644
index 00000000..36bb2857
--- /dev/null
+++ b/Python/matplotlab/gallery/misc/demo_agg_filter.md
@@ -0,0 +1,334 @@
+# 演示Agg过滤器
+
+![演示Agg过滤器示例](https://matplotlib.org/_images/sphx_glr_demo_agg_filter_001.png)
+
+```python
+import matplotlib.pyplot as plt
+
+import numpy as np
+import matplotlib.cm as cm
+import matplotlib.transforms as mtransforms
+from matplotlib.colors import LightSource
+from matplotlib.artist import Artist
+
+
+def smooth1d(x, window_len):
+    # copied from http://www.scipy.org/Cookbook/SignalSmooth
+
+    s = np.r_[2*x[0] - x[window_len:1:-1], x, 2*x[-1] - x[-1:-window_len:-1]]
+    w = np.hanning(window_len)
+    y = np.convolve(w/w.sum(), s, mode='same')
+    return y[window_len-1:-window_len+1]
+
+
+def smooth2d(A, sigma=3):
+
+    window_len = max(int(sigma), 3)*2 + 1
+    A1 = np.array([smooth1d(x, window_len) for x in np.asarray(A)])
+    A2 = np.transpose(A1)
+    A3 = np.array([smooth1d(x, window_len) for x in A2])
+    A4 = np.transpose(A3)
+
+    return A4
+
+
+class BaseFilter(object):
+    def prepare_image(self, src_image, dpi, pad):
+        ny, nx, depth = src_image.shape
+        # tgt_image = np.zeros([pad*2+ny, pad*2+nx, depth], dtype="d")
+        padded_src = np.zeros([pad*2 + ny, pad*2 + nx, depth], dtype="d")
+        padded_src[pad:-pad, pad:-pad, :] = src_image[:, :, :]
+
+        return padded_src  # , tgt_image
+
+    def get_pad(self, dpi):
+        return 0
+
+    def __call__(self, im, dpi):
+        pad = self.get_pad(dpi)
+        padded_src = self.prepare_image(im, dpi, pad)
+        tgt_image = self.process_image(padded_src, dpi)
+        return tgt_image, -pad, -pad
+
+
+class OffsetFilter(BaseFilter):
+    def __init__(self, offsets=None):
+        if offsets is None:
+            self.offsets = (0, 0)
+        else:
+            self.offsets = offsets
+
+    def get_pad(self, dpi):
+        return int(max(*self.offsets)/72.*dpi)
+
+    def process_image(self, padded_src, dpi):
+        ox, oy = self.offsets
+        a1 = np.roll(padded_src, int(ox/72.*dpi), axis=1)
+        a2 = np.roll(a1, -int(oy/72.*dpi), axis=0)
+        return a2
+
+
+class GaussianFilter(BaseFilter):
+    "simple gauss filter"
+
+    def __init__(self, sigma, alpha=0.5, color=None):
+        self.sigma = sigma
+        self.alpha = alpha
+        if color is None:
+            self.color = (0, 0, 0)
+        else:
+            self.color = color
+
+    def get_pad(self, dpi):
+        return int(self.sigma*3/72.*dpi)
+
+    def process_image(self, padded_src, dpi):
+        # offsetx, offsety = int(self.offsets[0]), int(self.offsets[1])
+        tgt_image = np.zeros_like(padded_src)
+        aa = smooth2d(padded_src[:, :, -1]*self.alpha,
+                      self.sigma/72.*dpi)
+        tgt_image[:, :, -1] = aa
+        tgt_image[:, :, :-1] = self.color
+        return tgt_image
+
+
+class DropShadowFilter(BaseFilter):
+    def __init__(self, sigma, alpha=0.3, color=None, offsets=None):
+        self.gauss_filter = GaussianFilter(sigma, alpha, color)
+        self.offset_filter = OffsetFilter(offsets)
+
+    def get_pad(self, dpi):
+        return max(self.gauss_filter.get_pad(dpi),
+                   self.offset_filter.get_pad(dpi))
+
+    def process_image(self, padded_src, dpi):
+        t1 = self.gauss_filter.process_image(padded_src, dpi)
+        t2 = self.offset_filter.process_image(t1, dpi)
+        return t2
+
+
+class LightFilter(BaseFilter):
+    "simple gauss filter"
+
+    def __init__(self, sigma, fraction=0.5):
+        self.gauss_filter = GaussianFilter(sigma, alpha=1)
+        self.light_source = LightSource()
+        self.fraction = fraction
+
+    def get_pad(self, dpi):
+        return self.gauss_filter.get_pad(dpi)
+
+    def process_image(self, padded_src, dpi):
+        t1 = self.gauss_filter.process_image(padded_src, dpi)
+        elevation = t1[:, :, 3]
+        rgb = padded_src[:, :, :3]
+
+        rgb2 = self.light_source.shade_rgb(rgb, elevation,
+                                           fraction=self.fraction)
+
+        tgt = np.empty_like(padded_src)
+        tgt[:, :, :3] = rgb2
+        tgt[:, :, 3] = padded_src[:, :, 3]
+
+        return tgt
+
+
+class GrowFilter(BaseFilter):
+    "enlarge the area"
+
+    def __init__(self, pixels, color=None):
+        self.pixels = pixels
+        if color is None:
+            self.color = (1, 1, 1)
+        else:
+            self.color = color
+
+    def __call__(self, im, dpi):
+        pad = self.pixels
+        ny, nx, depth = im.shape
+        new_im = np.empty([pad*2 + ny, pad*2 + nx, depth], dtype="d")
+        alpha = new_im[:, :, 3]
+        alpha.fill(0)
+        alpha[pad:-pad, pad:-pad] = im[:, :, -1]
+        alpha2 = np.clip(smooth2d(alpha, self.pixels/72.*dpi) * 5, 0, 1)
+        new_im[:, :, -1] = alpha2
+        new_im[:, :, :-1] = self.color
+        offsetx, offsety = -pad, -pad
+
+        return new_im, offsetx, offsety
+
+
+class FilteredArtistList(Artist):
+    """
+    A simple container to draw filtered artist.
+ """ + + def __init__(self, artist_list, filter): + self._artist_list = artist_list + self._filter = filter + Artist.__init__(self) + + def draw(self, renderer): + renderer.start_rasterizing() + renderer.start_filter() + for a in self._artist_list: + a.draw(renderer) + renderer.stop_filter(self._filter) + renderer.stop_rasterizing() + + +def filtered_text(ax): + # mostly copied from contour_demo.py + + # prepare image + delta = 0.025 + x = np.arange(-3.0, 3.0, delta) + y = np.arange(-2.0, 2.0, delta) + X, Y = np.meshgrid(x, y) + Z1 = np.exp(-X**2 - Y**2) + Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2) + Z = (Z1 - Z2) * 2 + + # draw + im = ax.imshow(Z, interpolation='bilinear', origin='lower', + cmap=cm.gray, extent=(-3, 3, -2, 2)) + levels = np.arange(-1.2, 1.6, 0.2) + CS = ax.contour(Z, levels, + origin='lower', + linewidths=2, + extent=(-3, 3, -2, 2)) + + ax.set_aspect("auto") + + # contour label + cl = ax.clabel(CS, levels[1::2], # label every second level + inline=1, + fmt='%1.1f', + fontsize=11) + + # change clable color to black + from matplotlib.patheffects import Normal + for t in cl: + t.set_color("k") + # to force TextPath (i.e., same font in all backends) + t.set_path_effects([Normal()]) + + # Add white glows to improve visibility of labels. + white_glows = FilteredArtistList(cl, GrowFilter(3)) + ax.add_artist(white_glows) + white_glows.set_zorder(cl[0].get_zorder() - 0.1) + + ax.xaxis.set_visible(False) + ax.yaxis.set_visible(False) + + +def drop_shadow_line(ax): + # copied from examples/misc/svg_filter_line.py + + # draw lines + l1, = ax.plot([0.1, 0.5, 0.9], [0.1, 0.9, 0.5], "bo-", + mec="b", mfc="w", lw=5, mew=3, ms=10, label="Line 1") + l2, = ax.plot([0.1, 0.5, 0.9], [0.5, 0.2, 0.7], "ro-", + mec="r", mfc="w", lw=5, mew=3, ms=10, label="Line 1") + + gauss = DropShadowFilter(4) + + for l in [l1, l2]: + + # draw shadows with same lines with slight offset. 
+
+        xx = l.get_xdata()
+        yy = l.get_ydata()
+        shadow, = ax.plot(xx, yy)
+        shadow.update_from(l)
+
+        # offset transform
+        ot = mtransforms.offset_copy(l.get_transform(), ax.figure,
+                                     x=4.0, y=-6.0, units='points')
+
+        shadow.set_transform(ot)
+
+        # adjust zorder of the shadow lines so that it is drawn below the
+        # original lines
+        shadow.set_zorder(l.get_zorder() - 0.5)
+        shadow.set_agg_filter(gauss)
+        shadow.set_rasterized(True)  # to support mixed-mode renderers
+
+    ax.set_xlim(0., 1.)
+    ax.set_ylim(0., 1.)
+
+    ax.xaxis.set_visible(False)
+    ax.yaxis.set_visible(False)
+
+
+def drop_shadow_patches(ax):
+    # Copied from barchart_demo.py
+    N = 5
+    menMeans = (20, 35, 30, 35, 27)
+
+    ind = np.arange(N)  # the x locations for the groups
+    width = 0.35  # the width of the bars
+
+    rects1 = ax.bar(ind, menMeans, width, color='r', ec="w", lw=2)
+
+    womenMeans = (25, 32, 34, 20, 25)
+    rects2 = ax.bar(ind + width + 0.1, womenMeans, width,
+                    color='y', ec="w", lw=2)
+
+    # gauss = GaussianFilter(1.5, offsets=(1,1), )
+    gauss = DropShadowFilter(5, offsets=(1, 1), )
+    shadow = FilteredArtistList(rects1 + rects2, gauss)
+    ax.add_artist(shadow)
+    shadow.set_zorder(rects1[0].get_zorder() - 0.1)
+
+    ax.set_ylim(0, 40)
+
+    ax.xaxis.set_visible(False)
+    ax.yaxis.set_visible(False)
+
+
+def light_filter_pie(ax):
+    fracs = [15, 30, 45, 10]
+    explode = (0, 0.05, 0, 0)
+    pies = ax.pie(fracs, explode=explode)
+    ax.patch.set_visible(True)
+
+    light_filter = LightFilter(9)
+    for p in pies[0]:
+        p.set_agg_filter(light_filter)
+        p.set_rasterized(True)  # to support mixed-mode renderers
+        p.set(ec="none",
+              lw=2)
+
+    gauss = DropShadowFilter(9, offsets=(3, 4), alpha=0.7)
+    shadow = FilteredArtistList(pies[0], gauss)
+    ax.add_artist(shadow)
+    shadow.set_zorder(pies[0][0].get_zorder() - 0.1)
+
+
+if 1:
+
+    plt.figure(1, figsize=(6, 6))
+    plt.subplots_adjust(left=0.05, right=0.95)
+
+    ax = plt.subplot(221)
+    filtered_text(ax)
+
+    ax = plt.subplot(222)
+    drop_shadow_line(ax)
+
+    ax = plt.subplot(223)
+
+    drop_shadow_patches(ax)
+
+    ax = plt.subplot(224)
+    ax.set_aspect(1)
+    light_filter_pie(ax)
+    ax.set_frame_on(True)
+
+    plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: demo_agg_filter.py](https://matplotlib.org/_downloads/demo_agg_filter.py)
+- [下载Jupyter notebook: demo_agg_filter.ipynb](https://matplotlib.org/_downloads/demo_agg_filter.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/misc/demo_ribbon_box.md b/Python/matplotlab/gallery/misc/demo_ribbon_box.md
new file mode 100644
index 00000000..5ba038a6
--- /dev/null
+++ b/Python/matplotlab/gallery/misc/demo_ribbon_box.md
@@ -0,0 +1,105 @@
+# 演示丝带盒
+
+![演示丝带盒示例](https://matplotlib.org/_images/sphx_glr_demo_ribbon_box_001.png)
+
+```python
+import numpy as np
+
+from matplotlib import cbook, colors as mcolors
+from matplotlib.image import BboxImage
+import matplotlib.pyplot as plt
+
+
+class RibbonBox:
+
+    original_image = plt.imread(
+        cbook.get_sample_data("Minduka_Present_Blue_Pack.png"))
+    cut_location = 70
+    b_and_h = original_image[:, :, 2:3]
+    color = original_image[:, :, 2:3] - original_image[:, :, 0:1]
+    alpha = original_image[:, :, 3:4]
+    nx = original_image.shape[1]
+
+    def __init__(self, color):
+        rgb = mcolors.to_rgba(color)[:3]
+        self.im = np.dstack(
+            [self.b_and_h - self.color * (1 - np.array(rgb)), self.alpha])
+
+    def get_stretched_image(self, stretch_factor):
+        stretch_factor = max(stretch_factor, 1)
+        ny, nx, nch = self.im.shape
+        ny2 = int(ny*stretch_factor)
+        return np.vstack(
+            [self.im[:self.cut_location],
+             np.broadcast_to(
+                 self.im[self.cut_location], (ny2 - ny, nx, nch)),
+             self.im[self.cut_location:]])
+
+
+class RibbonBoxImage(BboxImage):
+    zorder = 1
+
+    def __init__(self, bbox, color, **kwargs):
+        super().__init__(bbox, **kwargs)
+        self._ribbonbox = RibbonBox(color)
+
+    def draw(self, renderer, *args, **kwargs):
+        bbox = self.get_window_extent(renderer)
+        stretch_factor = bbox.height / bbox.width
+
+        ny = int(stretch_factor*self._ribbonbox.nx)
+        if self.get_array() is None or self.get_array().shape[0] != ny:
+            arr = self._ribbonbox.get_stretched_image(stretch_factor)
+            self.set_array(arr)
+
+        super().draw(renderer, *args, **kwargs)
+
+
+if True:
+    from matplotlib.transforms import Bbox, TransformedBbox
+    from matplotlib.ticker import ScalarFormatter
+
+    # Fixing random state for reproducibility
+    np.random.seed(19680801)
+
+    fig, ax = plt.subplots()
+
+    years = np.arange(2004, 2009)
+    box_colors = [(0.8, 0.2, 0.2),
+                  (0.2, 0.8, 0.2),
+                  (0.2, 0.2, 0.8),
+                  (0.7, 0.5, 0.8),
+                  (0.3, 0.8, 0.7),
+                  ]
+    heights = np.random.random(years.shape) * 7000 + 3000
+
+    fmt = ScalarFormatter(useOffset=False)
+    ax.xaxis.set_major_formatter(fmt)
+
+    for year, h, bc in zip(years, heights, box_colors):
+        bbox0 = Bbox.from_extents(year - 0.4, 0., year + 0.4, h)
+        bbox = TransformedBbox(bbox0, ax.transData)
+        rb_patch = RibbonBoxImage(bbox, bc, interpolation="bicubic")
+
+        ax.add_artist(rb_patch)
+
+        ax.annotate(r"%d" % (int(h/100.)*100),
+                    (year, h), va="bottom", ha="center")
+
+    patch_gradient = BboxImage(ax.bbox, interpolation="bicubic", zorder=0.1)
+    gradient = np.zeros((2, 2, 4))
+    gradient[:, :, :3] = [1, 1, 0.]
+ gradient[:, :, 3] = [[0.1, 0.3], [0.3, 0.5]] # alpha channel + patch_gradient.set_array(gradient) + ax.add_artist(patch_gradient) + + ax.set_xlim(years[0] - 0.5, years[-1] + 0.5) + ax.set_ylim(0, 10000) + + plt.show() +``` + +## 下载这个示例 + +- [下载python源码: demo_ribbon_box.py](https://matplotlib.org/_downloads/demo_ribbon_box.py) +- [下载Jupyter notebook: demo_ribbon_box.ipynb](https://matplotlib.org/_downloads/demo_ribbon_box.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/fill_spiral.md b/Python/matplotlab/gallery/misc/fill_spiral.md new file mode 100644 index 00000000..4706b1a5 --- /dev/null +++ b/Python/matplotlab/gallery/misc/fill_spiral.md @@ -0,0 +1,34 @@ +# 填充螺旋 + +![填充螺旋示例](https://matplotlib.org/_images/sphx_glr_fill_spiral_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +theta = np.arange(0, 8*np.pi, 0.1) +a = 1 +b = .2 + +for dt in np.arange(0, 2*np.pi, np.pi/2.0): + + x = a*np.cos(theta + dt)*np.exp(b*theta) + y = a*np.sin(theta + dt)*np.exp(b*theta) + + dt = dt + np.pi/4.0 + + x2 = a*np.cos(theta + dt)*np.exp(b*theta) + y2 = a*np.sin(theta + dt)*np.exp(b*theta) + + xf = np.concatenate((x, x2[::-1])) + yf = np.concatenate((y, y2[::-1])) + + p1 = plt.fill(xf, yf) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: fill_spiral.py](https://matplotlib.org/_downloads/fill_spiral.py) +- [下载Jupyter notebook: fill_spiral.ipynb](https://matplotlib.org/_downloads/fill_spiral.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/findobj_demo.md b/Python/matplotlab/gallery/misc/findobj_demo.md new file mode 100644 index 00000000..ca60bb60 --- /dev/null +++ b/Python/matplotlab/gallery/misc/findobj_demo.md @@ -0,0 +1,47 @@ +# Findobj演示 + +递归查找符合某些条件的所有对象 + +![Findobj演示](https://matplotlib.org/_images/sphx_glr_findobj_demo_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.text as text + +a = np.arange(0, 3, .02) +b = np.arange(0, 3, .02) +c = 
np.exp(a) +d = c[::-1] + +fig, ax = plt.subplots() +plt.plot(a, c, 'k--', a, d, 'k:', a, c + d, 'k') +plt.legend(('Model length', 'Data length', 'Total message length'), + loc='upper center', shadow=True) +plt.ylim([-1, 20]) +plt.grid(False) +plt.xlabel('Model complexity --->') +plt.ylabel('Message length --->') +plt.title('Minimum Message Length') + + +# match on arbitrary function +def myfunc(x): + return hasattr(x, 'set_color') and not hasattr(x, 'set_facecolor') + + +for o in fig.findobj(myfunc): + o.set_color('blue') + +# match on class instances +for o in fig.findobj(text.Text): + o.set_fontstyle('italic') + + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: findobj_demo.py](https://matplotlib.org/_downloads/findobj_demo.py) +- [下载Jupyter notebook: findobj_demo.ipynb](https://matplotlib.org/_downloads/findobj_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/font_indexing.md b/Python/matplotlab/gallery/misc/font_indexing.md new file mode 100644 index 00000000..9d302eb3 --- /dev/null +++ b/Python/matplotlab/gallery/misc/font_indexing.md @@ -0,0 +1,55 @@ +# 字体索引 + +一个小示例,展示了字体表的各种索引是如何相互关联的。主要面向MPL开发者。 + +输出: + +```python +(6, 0, 519, 576) +36 57 65 86 +AV 0 +AV 0 +AV 0 +AV 0 +``` + +```python +import matplotlib +from matplotlib.ft2font import FT2Font, KERNING_DEFAULT, KERNING_UNFITTED, KERNING_UNSCALED + + +fname = matplotlib.get_data_path() + '/fonts/ttf/DejaVuSans.ttf' +font = FT2Font(fname) +font.set_charmap(0) + +codes = font.get_charmap().items() +#dsu = [(ccode, glyphind) for ccode, glyphind in codes] +#dsu.sort() +#for ccode, glyphind in dsu: +# try: name = font.get_glyph_name(glyphind) +# except RuntimeError: pass +# else: print('% 4d % 4d %s %s' % (glyphind, ccode, hex(int(ccode)), name)) + + +# make a charname to charcode and glyphind dictionary +coded = {} +glyphd = {} +for ccode, glyphind in codes: + name = font.get_glyph_name(glyphind) + coded[name] = ccode + glyphd[name] = glyphind + +code = coded['A'] +glyph = 
font.load_char(code) +print(glyph.bbox) +print(glyphd['A'], glyphd['V'], coded['A'], coded['V']) +print('AV', font.get_kerning(glyphd['A'], glyphd['V'], KERNING_DEFAULT)) +print('AV', font.get_kerning(glyphd['A'], glyphd['V'], KERNING_UNFITTED)) +print('AV', font.get_kerning(glyphd['A'], glyphd['V'], KERNING_UNSCALED)) +print('AV', font.get_kerning(glyphd['A'], glyphd['T'], KERNING_UNSCALED)) +``` + +## 下载这个示例 + +- [下载python源码: font_indexing.py](https://matplotlib.org/_downloads/font_indexing.py) +- [下载Jupyter notebook: font_indexing.ipynb](https://matplotlib.org/_downloads/font_indexing.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/ftface_props.md b/Python/matplotlab/gallery/misc/ftface_props.md new file mode 100644 index 00000000..43c742bd --- /dev/null +++ b/Python/matplotlab/gallery/misc/ftface_props.md @@ -0,0 +1,103 @@ +# Ftface属性 + +这是一个演示脚本,向您展示如何使用FT2Font对象的所有属性。这些描述了全局字体属性。对于单个字符度量标准,请使用load_char返回的Glyph对象 + +输出: + +```python +Num faces : 1 +Num glyphs : 5343 +Family name : DejaVu Sans +Style name : Oblique +PS name : DejaVuSans-Oblique +Num fixed : 0 +Bbox : (-2080, -717, 3398, 2187) +EM : 2048 +Ascender : 1901 +Descender : -483 +Height : 2384 +Max adv width : 3461 +Max adv height : 2384 +Underline pos : -175 +Underline thickness : 90 +Italic : True +Bold : False +Scalable : True +Fixed sizes : False +Fixed width : False +SFNT : False +Horizontal : False +Vertical : False +Kerning : False +Fast glyphs : False +Multiple masters : False +Glyph names : False +External stream : False +['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'ascender', 'bbox', 'clear', 'descender', 'draw_glyph_to_bitmap', 'draw_glyphs_to_bitmap', 'face_flags', 'family_name', 'fname', 
'get_bitmap_offset', 'get_char_index', 'get_charmap', 'get_descent', 'get_glyph_name', 'get_image', 'get_kerning', 'get_name_index', 'get_num_glyphs', 'get_path', 'get_ps_font_info', 'get_sfnt', 'get_sfnt_table', 'get_width_height', 'get_xys', 'height', 'load_char', 'load_glyph', 'max_advance_height', 'max_advance_width', 'num_charmaps', 'num_faces', 'num_fixed_sizes', 'num_glyphs', 'postscript_name', 'scalable', 'select_charmap', 'set_charmap', 'set_size', 'set_text', 'style_flags', 'style_name', 'underline_position', 'underline_thickness', 'units_per_EM'] + +``` + +```python +import matplotlib +import matplotlib.ft2font as ft + + +#fname = '/usr/local/share/matplotlib/VeraIt.ttf' +fname = matplotlib.get_data_path() + '/fonts/ttf/DejaVuSans-Oblique.ttf' +#fname = '/usr/local/share/matplotlib/cmr10.ttf' + +font = ft.FT2Font(fname) + +print('Num faces :', font.num_faces) # number of faces in file +print('Num glyphs :', font.num_glyphs) # number of glyphs in the face +print('Family name :', font.family_name) # face family name +print('Style name :', font.style_name) # face style name +print('PS name :', font.postscript_name) # the postscript name +print('Num fixed :', font.num_fixed_sizes) # number of embedded bitmap in face + +# the following are only available if face.scalable +if font.scalable: + # the face global bounding box (xmin, ymin, xmax, ymax) + print('Bbox :', font.bbox) + # number of font units covered by the EM + print('EM :', font.units_per_EM) + # the ascender in 26.6 units + print('Ascender :', font.ascender) + # the descender in 26.6 units + print('Descender :', font.descender) + # the height in 26.6 units + print('Height :', font.height) + # maximum horizontal cursor advance + print('Max adv width :', font.max_advance_width) + # same for vertical layout + print('Max adv height :', font.max_advance_height) + # vertical position of the underline bar + print('Underline pos :', font.underline_position) + # vertical thickness of the underline + 
print('Underline thickness :', font.underline_thickness) + +for style in ('Italic', + 'Bold', + 'Scalable', + 'Fixed sizes', + 'Fixed width', + 'SFNT', + 'Horizontal', + 'Vertical', + 'Kerning', + 'Fast glyphs', + 'Multiple masters', + 'Glyph names', + 'External stream'): + bitpos = getattr(ft, style.replace(' ', '_').upper()) - 1 + print('%-17s:' % style, bool(font.style_flags & (1 << bitpos))) + +print(dir(font)) + +print(font.get_kerning) +``` + +## 下载这个示例 + +- [下载python源码: ftface_props.py](https://matplotlib.org/_downloads/ftface_props.py) +- [下载Jupyter notebook: ftface_props.ipynb](https://matplotlib.org/_downloads/ftface_props.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/histogram_path.md b/Python/matplotlab/gallery/misc/histogram_path.md new file mode 100644 index 00000000..848e1cd4 --- /dev/null +++ b/Python/matplotlab/gallery/misc/histogram_path.md @@ -0,0 +1,92 @@ +# 使用“矩形”和“多边形”构建直方图 + +使用路径补丁(PathPatch)绘制矩形。在 mpl 拥有 moveto/lineto、closepoly 等真正的路径之前,只能使用大量 Rectangle 实例,或者使用更快的 PolyCollection。现在有了路径,我们可以用 PathCollection 更高效地绘制具有同质属性的规则形状对象集合。这个例子创建了一个直方图:虽然开始时构建顶点数组的工作量更大,但对于大量对象来说应该快得多。 + +```python +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.patches as patches +import matplotlib.path as path + +fig, ax = plt.subplots() + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +# histogram our data with numpy + +data = np.random.randn(1000) +n, bins = np.histogram(data, 50) + +# get the corners of the rectangles for the histogram +left = np.array(bins[:-1]) +right = np.array(bins[1:]) +bottom = np.zeros(len(left)) +top = bottom + n + + +# we need a (numrects x numsides x 2) numpy array for the path helper +# function to build a compound path +XY = np.array([[left, left, right, right], [bottom, top, top, bottom]]).T + +# get the Path object +barpath = path.Path.make_compound_path_from_polys(XY) + +# make a patch out of it +patch = patches.PathPatch(barpath) +ax.add_patch(patch) 
+ +# update the view limits +ax.set_xlim(left[0], right[-1]) +ax.set_ylim(bottom.min(), top.max()) + +plt.show() +``` + +![使用“矩形”和“多边形”构建直方图示例](https://matplotlib.org/_images/sphx_glr_histogram_path_001.png) + +应该注意的是,我们可以使用顶点和代码直接创建复合路径,而不是创建三维数组并使用[make_compound_path_from_polys](https://matplotlib.org/api/path_api.html#matplotlib.path.Path.make_compound_path_from_polys),如下所示 + +```python +nrects = len(left) +nverts = nrects*(1+3+1) +verts = np.zeros((nverts, 2)) +codes = np.ones(nverts, int) * path.Path.LINETO +codes[0::5] = path.Path.MOVETO +codes[4::5] = path.Path.CLOSEPOLY +verts[0::5, 0] = left +verts[0::5, 1] = bottom +verts[1::5, 0] = left +verts[1::5, 1] = top +verts[2::5, 0] = right +verts[2::5, 1] = top +verts[3::5, 0] = right +verts[3::5, 1] = bottom + +barpath = path.Path(verts, codes) +``` + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.patches +matplotlib.patches.PathPatch +matplotlib.path +matplotlib.path.Path +matplotlib.path.Path.make_compound_path_from_polys +matplotlib.axes.Axes.add_patch +matplotlib.collections.PathCollection + +# This example shows an alternative to +matplotlib.collections.PolyCollection +matplotlib.axes.Axes.hist +``` + +## 下载这个示例 + +- [下载python源码: histogram_path.py](https://matplotlib.org/_downloads/histogram_path.py) +- [下载Jupyter notebook: histogram_path.ipynb](https://matplotlib.org/_downloads/histogram_path.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/hyperlinks_sgskip.md b/Python/matplotlab/gallery/misc/hyperlinks_sgskip.md new file mode 100644 index 00000000..705a26b1 --- /dev/null +++ b/Python/matplotlab/gallery/misc/hyperlinks_sgskip.md @@ -0,0 +1,39 @@ +# 超链接 + +此示例演示如何在各种元素上设置超链接。 + +这目前只适用于SVG后端。 + +```python +import numpy as np +import matplotlib.cm as cm +import matplotlib.pyplot as plt +``` + +```python +f = plt.figure() +s = plt.scatter([1, 2, 3], [4, 5, 6]) +s.set_urls(['http://www.bbc.co.uk/news', 'http://www.google.com', None]) 
+f.savefig('scatter.svg') +``` + +```python +f = plt.figure() +delta = 0.025 +x = y = np.arange(-3.0, 3.0, delta) +X, Y = np.meshgrid(x, y) +Z1 = np.exp(-X**2 - Y**2) +Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2) +Z = (Z1 - Z2) * 2 + +im = plt.imshow(Z, interpolation='bilinear', cmap=cm.gray, + origin='lower', extent=[-3, 3, -3, 3]) + +im.set_url('http://www.google.com') +f.savefig('image.svg') +``` + +## 下载这个示例 + +- [下载python源码: hyperlinks_sgskip.py](https://matplotlib.org/_downloads/hyperlinks_sgskip.py) +- [下载Jupyter notebook: hyperlinks_sgskip.ipynb](https://matplotlib.org/_downloads/hyperlinks_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/image_thumbnail_sgskip.md b/Python/matplotlab/gallery/misc/image_thumbnail_sgskip.md new file mode 100644 index 00000000..2124a46d --- /dev/null +++ b/Python/matplotlab/gallery/misc/image_thumbnail_sgskip.md @@ -0,0 +1,35 @@ +# 图像缩略图 + +您可以使用matplotlib从现有图像生成缩略图。matplotlib本身支持输入端的PNG文件,如果安装了PIL,则透明地支持其他图像类型。 + +```python +# build thumbnails of all images in a directory +import sys +import os +import glob +import matplotlib.image as image + + +if len(sys.argv) != 2: + print('Usage: python %s IMAGEDIR' % __file__) + raise SystemExit +indir = sys.argv[1] +if not os.path.isdir(indir): + print('Could not find input directory "%s"' % indir) + raise SystemExit + +outdir = 'thumbs' +if not os.path.exists(outdir): + os.makedirs(outdir) + +for fname in glob.glob(os.path.join(indir, '*.png')): + basedir, basename = os.path.split(fname) + outfile = os.path.join(outdir, basename) + fig = image.thumbnail(fname, outfile, scale=0.15) + print('saved thumbnail of %s to %s' % (fname, outfile)) +``` + +## 下载这个示例 + +- [下载python源码: image_thumbnail_sgskip.py](https://matplotlib.org/_downloads/image_thumbnail_sgskip.py) +- [下载Jupyter notebook: image_thumbnail_sgskip.ipynb](https://matplotlib.org/_downloads/image_thumbnail_sgskip.ipynb) \ No newline at end of file diff --git 
a/Python/matplotlab/gallery/misc/keyword_plotting.md b/Python/matplotlab/gallery/misc/keyword_plotting.md new file mode 100644 index 00000000..51b34e33 --- /dev/null +++ b/Python/matplotlab/gallery/misc/keyword_plotting.md @@ -0,0 +1,29 @@ +# 用关键字绘图 + +在某些情况下,您的数据格式允许您通过字符串访问特定变量,例如 [numpy.recarray](https://docs.scipy.org/doc/numpy/reference/generated/numpy.recarray.html#numpy.recarray) 或 [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html#pandas.DataFrame)。 + +Matplotlib允许您通过 data 关键字参数提供此类对象。如果提供了该参数,您就可以用与这些变量对应的字符串来生成图形。 + +![用关键字绘图示例](https://matplotlib.org/_images/sphx_glr_keyword_plotting_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +np.random.seed(19680801) + +data = {'a': np.arange(50), + 'c': np.random.randint(0, 50, 50), + 'd': np.random.randn(50)} +data['b'] = data['a'] + 10 * np.random.randn(50) +data['d'] = np.abs(data['d']) * 100 + +fig, ax = plt.subplots() +ax.scatter('a', 'b', c='c', s='d', data=data) +ax.set(xlabel='entry a', ylabel='entry b') +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: keyword_plotting.py](https://matplotlib.org/_downloads/keyword_plotting.py) +- [下载Jupyter notebook: keyword_plotting.ipynb](https://matplotlib.org/_downloads/keyword_plotting.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/load_converter.md b/Python/matplotlab/gallery/misc/load_converter.md new file mode 100644 index 00000000..875afc75 --- /dev/null +++ b/Python/matplotlab/gallery/misc/load_converter.md @@ -0,0 +1,33 @@ +# 加载转换器 + +![加载转换器示例](https://matplotlib.org/_images/sphx_glr_load_converter_001.png) + +输出: + +```python +loading /home/tcaswell/mc3/envs/dd37/lib/python3.7/site-packages/matplotlib/mpl-data/sample_data/msft.csv +``` + +```python +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.cbook as cbook +from matplotlib.dates import bytespdate2num + +datafile = cbook.get_sample_data('msft.csv', asfileobj=False) 
+print('loading', datafile) + +dates, closes = np.loadtxt(datafile, delimiter=',', + converters={0: bytespdate2num('%d-%b-%y')}, + skiprows=1, usecols=(0, 2), unpack=True) + +fig, ax = plt.subplots() +ax.plot_date(dates, closes, '-') +fig.autofmt_xdate() +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: load_converter.py](https://matplotlib.org/_downloads/load_converter.py) +- [下载Jupyter notebook: load_converter.ipynb](https://matplotlib.org/_downloads/load_converter.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/logos2.md b/Python/matplotlab/gallery/misc/logos2.md new file mode 100644 index 00000000..26c7a4b7 --- /dev/null +++ b/Python/matplotlab/gallery/misc/logos2.md @@ -0,0 +1,92 @@ +# Matplotlib标志 + +显示一些matplotlib徽标。 + +感谢Tony Yu 的标志设计 + +![Matplotlib标志示例](https://matplotlib.org/_images/sphx_glr_logos2_001.png) + +```python +import numpy as np +import matplotlib as mpl +import matplotlib.pyplot as plt +import matplotlib.cm as cm + +mpl.rcParams['xtick.labelsize'] = 10 +mpl.rcParams['ytick.labelsize'] = 12 +mpl.rcParams['axes.edgecolor'] = 'gray' + + +axalpha = 0.05 +figcolor = 'white' +dpi = 80 +fig = plt.figure(figsize=(6, 1.1), dpi=dpi) +fig.patch.set_edgecolor(figcolor) +fig.patch.set_facecolor(figcolor) + + +def add_math_background(): + ax = fig.add_axes([0., 0., 1., 1.]) + + text = [] + text.append( + (r"$W^{3\beta}_{\delta_1 \rho_1 \sigma_2} = " + r"U^{3\beta}_{\delta_1 \rho_1} + \frac{1}{8 \pi 2}" + r"\int^{\alpha_2}_{\alpha_2} d \alpha^\prime_2 " + r"\left[\frac{ U^{2\beta}_{\delta_1 \rho_1} - " + r"\alpha^\prime_2U^{1\beta}_{\rho_1 \sigma_2} " + r"}{U^{0\beta}_{\rho_1 \sigma_2}}\right]$", (0.7, 0.2), 20)) + text.append((r"$\frac{d\rho}{d t} + \rho \vec{v}\cdot\nabla\vec{v} " + r"= -\nabla p + \mu\nabla^2 \vec{v} + \rho \vec{g}$", + (0.35, 0.9), 20)) + text.append((r"$\int_{-\infty}^\infty e^{-x^2}dx=\sqrt{\pi}$", + (0.15, 0.3), 25)) + text.append((r"$F_G = G\frac{m_1m_2}{r^2}$", + (0.85, 0.7), 30)) + for eq, (x, y), size 
in text: + ax.text(x, y, eq, ha='center', va='center', color="#11557c", + alpha=0.25, transform=ax.transAxes, fontsize=size) + ax.set_axis_off() + return ax + + +def add_matplotlib_text(ax): + ax.text(0.95, 0.5, 'matplotlib', color='#11557c', fontsize=65, + ha='right', va='center', alpha=1.0, transform=ax.transAxes) + + +def add_polar_bar(): + ax = fig.add_axes([0.025, 0.075, 0.2, 0.85], projection='polar') + + ax.patch.set_alpha(axalpha) + ax.set_axisbelow(True) + N = 7 + arc = 2. * np.pi + theta = np.arange(0.0, arc, arc/N) + radii = 10 * np.array([0.2, 0.6, 0.8, 0.7, 0.4, 0.5, 0.8]) + width = np.pi / 4 * np.array([0.4, 0.4, 0.6, 0.8, 0.2, 0.5, 0.3]) + bars = ax.bar(theta, radii, width=width, bottom=0.0) + for r, bar in zip(radii, bars): + bar.set_facecolor(cm.jet(r/10.)) + bar.set_alpha(0.6) + + ax.tick_params(labelbottom=False, labeltop=False, + labelleft=False, labelright=False) + + ax.grid(lw=0.8, alpha=0.9, ls='-', color='0.5') + + ax.set_yticks(np.arange(1, 9, 2)) + ax.set_rmax(9) + + +if __name__ == '__main__': + main_axes = add_math_background() + add_polar_bar() + add_matplotlib_text(main_axes) + plt.show() +``` + +## 下载这个示例 + +- [下载python源码: logos2.py](https://matplotlib.org/_downloads/logos2.py) +- [下载Jupyter notebook: logos2.ipynb](https://matplotlib.org/_downloads/logos2.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/multipage_pdf.md b/Python/matplotlab/gallery/misc/multipage_pdf.md new file mode 100644 index 00000000..bc7e04a1 --- /dev/null +++ b/Python/matplotlab/gallery/misc/multipage_pdf.md @@ -0,0 +1,54 @@ +# 多页PDF + +这是一个创建包含多个页面的pdf文件,以及向pdf文件添加元数据和注释的演示。 + +如果要在使用 LaTeX 时创建多页pdf文件,则需要从 ``matplotlib.backends.backend_pgf`` 导入 PdfPages。但是这个版本不支持 ``attach_note``。 + +```python +import datetime +import numpy as np +from matplotlib.backends.backend_pdf import PdfPages +import matplotlib.pyplot as plt + +# Create the PdfPages object to which we will save the pages: +# The with statement makes sure that the PdfPages object 
is closed properly at +# the end of the block, even if an Exception occurs. +with PdfPages('multipage_pdf.pdf') as pdf: + plt.figure(figsize=(3, 3)) + plt.plot(range(7), [3, 1, 4, 1, 5, 9, 2], 'r-o') + plt.title('Page One') + pdf.savefig() # saves the current figure into a pdf page + plt.close() + + # if LaTeX is not installed or error caught, change to `usetex=False` + plt.rc('text', usetex=True) + plt.figure(figsize=(8, 6)) + x = np.arange(0, 5, 0.1) + plt.plot(x, np.sin(x), 'b-') + plt.title('Page Two') + pdf.attach_note("plot of sin(x)") # you can add a pdf note to + # attach metadata to a page + pdf.savefig() + plt.close() + + plt.rc('text', usetex=False) + fig = plt.figure(figsize=(4, 5)) + plt.plot(x, x ** 2, 'ko') + plt.title('Page Three') + pdf.savefig(fig) # or you can pass a Figure object to pdf.savefig + plt.close() + + # We can also set the file's metadata via the PdfPages object: + d = pdf.infodict() + d['Title'] = 'Multipage PDF Example' + d['Author'] = 'Jouni K. Sepp\xe4nen' + d['Subject'] = 'How to create a multipage pdf file and set its metadata' + d['Keywords'] = 'PdfPages multipage keywords author title subject' + d['CreationDate'] = datetime.datetime(2009, 11, 13) + d['ModDate'] = datetime.datetime.today() +``` + +## 下载这个示例 + +- [下载python源码: multipage_pdf.py](https://matplotlib.org/_downloads/multipage_pdf.py) +- [下载Jupyter notebook: multipage_pdf.ipynb](https://matplotlib.org/_downloads/multipage_pdf.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/multiprocess_sgskip.md b/Python/matplotlab/gallery/misc/multiprocess_sgskip.md new file mode 100644 index 00000000..99f4f772 --- /dev/null +++ b/Python/matplotlab/gallery/misc/multiprocess_sgskip.md @@ -0,0 +1,98 @@ +# 多进程 + +演示使用 multiprocessing 在一个进程中生成数据并在另一个进程中绘图。 + +由Robert Cimrman撰写 + +```python +import multiprocessing as mp +import time + +import matplotlib.pyplot as plt +import numpy as np + +# Fixing random state for reproducibility +np.random.seed(19680801) +``` + +## 进程类 + 
+此类绘制从管道接收的数据。 + +```python +class ProcessPlotter(object): + def __init__(self): + self.x = [] + self.y = [] + + def terminate(self): + plt.close('all') + + def call_back(self): + while self.pipe.poll(): + command = self.pipe.recv() + if command is None: + self.terminate() + return False + else: + self.x.append(command[0]) + self.y.append(command[1]) + self.ax.plot(self.x, self.y, 'ro') + self.fig.canvas.draw() + return True + + def __call__(self, pipe): + print('starting plotter...') + + self.pipe = pipe + self.fig, self.ax = plt.subplots() + timer = self.fig.canvas.new_timer(interval=1000) + timer.add_callback(self.call_back) + timer.start() + + print('...done') + plt.show() +``` + +## 绘图类 + +此类使用多处理来生成一个进程,以运行上面的类中的代码。 初始化时,它会创建一个管道和一个ProcessPlotter实例,它将在一个单独的进程中运行。 + +从命令行运行时,父进程将数据发送到生成的进程,然后通过ProcessPlotter中指定的回调函数绘制:__call__。 + +```python +class NBPlot(object): + def __init__(self): + self.plot_pipe, plotter_pipe = mp.Pipe() + self.plotter = ProcessPlotter() + self.plot_process = mp.Process( + target=self.plotter, args=(plotter_pipe,), daemon=True) + self.plot_process.start() + + def plot(self, finished=False): + send = self.plot_pipe.send + if finished: + send(None) + else: + data = np.random.random(2) + send(data) + + +def main(): + pl = NBPlot() + for ii in range(10): + pl.plot() + time.sleep(0.5) + pl.plot(finished=True) + + +if __name__ == '__main__': + if plt.get_backend() == "MacOSX": + mp.set_start_method("forkserver") + main() +``` + +## 下载这个示例 + +- [下载python源码: multiprocess_sgskip.py](https://matplotlib.org/_downloads/multiprocess_sgskip.py) +- [下载Jupyter notebook: multiprocess_sgskip.ipynb](https://matplotlib.org/_downloads/multiprocess_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/patheffect_demo.md b/Python/matplotlab/gallery/misc/patheffect_demo.md new file mode 100644 index 00000000..9b888e89 --- /dev/null +++ b/Python/matplotlab/gallery/misc/patheffect_demo.md @@ -0,0 +1,52 @@ +# 修补效果演示 + 
+![修补效果演示](https://matplotlib.org/_images/sphx_glr_patheffect_demo_001.png) + +```python +import matplotlib.pyplot as plt +import matplotlib.patheffects as PathEffects +import numpy as np + +if 1: + plt.figure(1, figsize=(8, 3)) + ax1 = plt.subplot(131) + ax1.imshow([[1, 2], [2, 3]]) + txt = ax1.annotate("test", (1., 1.), (0., 0), + arrowprops=dict(arrowstyle="->", + connectionstyle="angle3", lw=2), + size=20, ha="center", + path_effects=[PathEffects.withStroke(linewidth=3, + foreground="w")]) + txt.arrow_patch.set_path_effects([ + PathEffects.Stroke(linewidth=5, foreground="w"), + PathEffects.Normal()]) + + pe = [PathEffects.withStroke(linewidth=3, + foreground="w")] + ax1.grid(True, linestyle="-", path_effects=pe) + + ax2 = plt.subplot(132) + arr = np.arange(25).reshape((5, 5)) + ax2.imshow(arr) + cntr = ax2.contour(arr, colors="k") + + plt.setp(cntr.collections, path_effects=[ + PathEffects.withStroke(linewidth=3, foreground="w")]) + + clbls = ax2.clabel(cntr, fmt="%2.0f", use_clabeltext=True) + plt.setp(clbls, path_effects=[ + PathEffects.withStroke(linewidth=3, foreground="w")]) + + # shadow as a path effect + ax3 = plt.subplot(133) + p1, = ax3.plot([0, 1], [0, 1]) + leg = ax3.legend([p1], ["Line 1"], fancybox=True, loc='upper left') + leg.legendPatch.set_path_effects([PathEffects.withSimplePatchShadow()]) + + plt.show() +``` + +## 下载这个示例 + +- [下载python源码: patheffect_demo.py](https://matplotlib.org/_downloads/patheffect_demo.py) +- [下载Jupyter notebook: patheffect_demo.ipynb](https://matplotlib.org/_downloads/patheffect_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/plotfile_demo.md b/Python/matplotlab/gallery/misc/plotfile_demo.md new file mode 100644 index 00000000..82c8b2ba --- /dev/null +++ b/Python/matplotlab/gallery/misc/plotfile_demo.md @@ -0,0 +1,48 @@ +# Plotfile演示 + +使用plotfile直接从文件绘制数据的示例。 + +```python +import matplotlib.pyplot as plt +import matplotlib.cbook as cbook + +fname = cbook.get_sample_data('msft.csv', 
asfileobj=False) +fname2 = cbook.get_sample_data('data_x_x2_x3.csv', asfileobj=False) + +# test 1; use ints +plt.plotfile(fname, (0, 5, 6)) + +# test 2; use names +plt.plotfile(fname, ('date', 'volume', 'adj_close')) + +# test 3; use semilogy for volume +plt.plotfile(fname, ('date', 'volume', 'adj_close'), + plotfuncs={'volume': 'semilogy'}) + +# test 4; use semilogy for volume +plt.plotfile(fname, (0, 5, 6), plotfuncs={5: 'semilogy'}) + +# test 5; single subplot +plt.plotfile(fname, ('date', 'open', 'high', 'low', 'close'), subplots=False) + +# test 6; labeling, if no names in csv-file +plt.plotfile(fname2, cols=(0, 1, 2), delimiter=' ', + names=['$x$', '$f(x)=x^2$', '$f(x)=x^3$']) + +# test 7; more than one file per figure--illustrated here with a single file +plt.plotfile(fname2, cols=(0, 1), delimiter=' ') +plt.plotfile(fname2, cols=(0, 2), newfig=False, + delimiter=' ') # use current figure +plt.xlabel(r'$x$') +plt.ylabel(r'$f(x) = x^2, x^3$') + +# test 8; use bar for volume +plt.plotfile(fname, (0, 5, 6), plotfuncs={5: 'bar'}) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: plotfile_demo.py](https://matplotlib.org/_downloads/plotfile_demo.py) +- [下载Jupyter notebook: plotfile_demo.ipynb](https://matplotlib.org/_downloads/plotfile_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/print_stdout_sgskip.md b/Python/matplotlab/gallery/misc/print_stdout_sgskip.md new file mode 100644 index 00000000..515ca02d --- /dev/null +++ b/Python/matplotlab/gallery/misc/print_stdout_sgskip.md @@ -0,0 +1,20 @@ +# 打印标准输出 + +将PNG打印到标准输出。 + +用法:python print_stdout.py > somefile.png + +```python +import sys +import matplotlib +matplotlib.use('Agg') +import matplotlib.pyplot as plt + +plt.plot([1, 2, 3]) +plt.savefig(sys.stdout.buffer) +``` + +## 下载这个示例 + +- [下载python源码: print_stdout_sgskip.py](https://matplotlib.org/_downloads/print_stdout_sgskip.py) +- [下载Jupyter notebook: 
print_stdout_sgskip.ipynb](https://matplotlib.org/_downloads/print_stdout_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/pythonic_matplotlib.md b/Python/matplotlab/gallery/misc/pythonic_matplotlib.md new file mode 100644 index 00000000..24d5ecf5 --- /dev/null +++ b/Python/matplotlab/gallery/misc/pythonic_matplotlib.md @@ -0,0 +1,72 @@ +# Pythonic Matplotlib + +有些人喜欢编写更多面向对象的Python代码来使用matplotlib,而不是使用pyplot接口。此示例向您展示了如何做到这一点。 + +除非您是应用程序开发人员,否则我建议使用部分pyplot接口,尤其是 figure、close、subplot、axes 和 show 命令。这些隐藏了您在正常图形创建中不需要看到的很多复杂性,例如实例化DPI实例,管理图形元素的边界框,创建和实现GUI窗口以及在其中嵌入图形。 + +如果您是应用程序开发人员并希望在应用程序中嵌入matplotlib,请参考 examples/embedding_in_wx.py、examples/embedding_in_gtk.py 或 examples/embedding_in_tk.py 的做法。在这种情况下,您需要控制所有图形的创建,将它们嵌入应用程序窗口等。 + +如果您是Web应用程序开发人员,您可能希望使用webapp_demo.py中的示例,该示例显示如何直接使用后端agg图形画布,而不涉及pyplot接口中存在的全局变量(当前图形、当前轴)。但请注意,没有理由说pyplot接口不适用于Web应用程序开发人员。 + +如果您在示例目录中看到一个用pyplot接口编写的示例,并且希望用真正的python方法调用来模拟它,则可以轻松进行映射。其中许多示例使用 setp 来控制图形属性。以下是将这些命令映射到实例方法的方法。 + +setp 的语法是: + +```python +plt.setp(object or sequence, somestring, attribute) +``` + +如果使用单个对象调用,setp 会调用: + +```python +object.set_somestring(attribute) +``` + +如果使用序列调用,setp 会执行: + +```python +for object in sequence: + object.set_somestring(attribute) +``` + +因此,对于您的示例,如果 a 是您的 axes 对象,则可以执行以下操作: + +```python +a.set_xticklabels([]) +a.set_yticklabels([]) +a.set_xticks([]) +a.set_yticks([]) +``` + +![Pythonic Matplotlib示例](https://matplotlib.org/_images/sphx_glr_pythonic_matplotlib_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +t = np.arange(0.0, 1.0, 0.01) + +fig, (ax1, ax2) = plt.subplots(2) + +ax1.plot(t, np.sin(2*np.pi * t)) +ax1.grid(True) +ax1.set_ylim((-2, 2)) +ax1.set_ylabel('1 Hz') +ax1.set_title('A sine wave or two') + +ax1.xaxis.set_tick_params(labelcolor='r') + +ax2.plot(t, np.sin(2 * 2*np.pi * t)) +ax2.grid(True) +ax2.set_ylim((-2, 2)) +l = ax2.set_xlabel('Hi mom') +l.set_color('g') +l.set_fontsize('large') + +plt.show() +``` + +## 下载这个示例 + +- 
[下载python源码: pythonic_matplotlib.py](https://matplotlib.org/_downloads/pythonic_matplotlib.py) +- [下载Jupyter notebook: pythonic_matplotlib.ipynb](https://matplotlib.org/_downloads/pythonic_matplotlib.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/rasterization_demo.md b/Python/matplotlab/gallery/misc/rasterization_demo.md new file mode 100644 index 00000000..7874e376 --- /dev/null +++ b/Python/matplotlab/gallery/misc/rasterization_demo.md @@ -0,0 +1,61 @@ +# 光栅化演示 + +![光栅化演示](https://matplotlib.org/_images/sphx_glr_rasterization_demo_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + +d = np.arange(100).reshape(10, 10) +x, y = np.meshgrid(np.arange(11), np.arange(11)) + +theta = 0.25*np.pi +xx = x*np.cos(theta) - y*np.sin(theta) +yy = x*np.sin(theta) + y*np.cos(theta) + +fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2) +ax1.set_aspect(1) +ax1.pcolormesh(xx, yy, d) +ax1.set_title("No Rasterization") + +ax2.set_aspect(1) +ax2.set_title("Rasterization") + +m = ax2.pcolormesh(xx, yy, d) +m.set_rasterized(True) + +ax3.set_aspect(1) +ax3.pcolormesh(xx, yy, d) +ax3.text(0.5, 0.5, "Text", alpha=0.2, + va="center", ha="center", size=50, transform=ax3.transAxes) + +ax3.set_title("No Rasterization") + + +ax4.set_aspect(1) +m = ax4.pcolormesh(xx, yy, d) +m.set_zorder(-20) + +ax4.text(0.5, 0.5, "Text", alpha=0.2, + zorder=-15, + va="center", ha="center", size=50, transform=ax4.transAxes) + +ax4.set_rasterization_zorder(-10) + +ax4.set_title("Rasterization z$<-10$") + + +# ax2.title.set_rasterized(True) # should display a warning + +plt.savefig("test_rasterization.pdf", dpi=150) +plt.savefig("test_rasterization.eps", dpi=150) + +if not plt.rcParams["text.usetex"]: + plt.savefig("test_rasterization.svg", dpi=150) + # svg backend currently ignores the dpi +``` + +## 下载这个示例 + +- [下载python源码: rasterization_demo.py](https://matplotlib.org/_downloads/rasterization_demo.py) +- [下载Jupyter notebook: 
rasterization_demo.ipynb](https://matplotlib.org/_downloads/rasterization_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/set_and_get.md b/Python/matplotlab/gallery/misc/set_and_get.md new file mode 100644 index 00000000..7cd61813 --- /dev/null +++ b/Python/matplotlab/gallery/misc/set_and_get.md @@ -0,0 +1,362 @@ +# 设置和获取 + +pyplot接口允许您使用setp和getp来设置和获取对象属性,以及进行对象内省。 + +## 设置 + +要将线条的线型设置为虚线,您可以执行以下操作: + +```python +>>> line, = plt.plot([1,2,3]) +>>> plt.setp(line, linestyle='--') +``` + +如果要了解有效的参数类型,可以提供要设置的属性的名称而不提供值: + +```python +>>> plt.setp(line, 'linestyle') + linestyle: [ '-' | '--' | '-.' | ':' | 'steps' | 'None' ] +``` + +如果要查看可以设置的所有属性及其可能的值,您可以执行以下操作: + +```python +>>> plt.setp(line) +``` + +setp 在单个实例或实例列表上运行。如果您处于查询模式内省可能的取值,则仅使用序列中的第一个实例。实际设置值时,全部 +实例都将被设置。例如,假设您有一个包含两条线的列表,以下操作将使两条线变粗并变红: + +```python +>>> x = np.arange(0,1.0,0.01) +>>> y1 = np.sin(2*np.pi*x) +>>> y2 = np.sin(4*np.pi*x) +>>> lines = plt.plot(x, y1, x, y2) +>>> plt.setp(lines, linewidth=2, color='r') +``` + +## 获取 + +getp 返回给定属性的值。您可以使用 getp 查询单个属性的值: + +```python +>>> plt.getp(line, 'linewidth') + 0.5 +``` + +或所有属性/值对: + +```python +>>> plt.getp(line) + aa = True + alpha = 1.0 + antialiased = True + c = b + clip_on = True + color = b + ... long listing skipped ... 
+```
+
+## Aliases
+
+To reduce keystrokes in interactive mode, many properties have short aliases, e.g. 'lw' for 'linewidth' and 'mec' for 'markeredgecolor'. When calling `setp` or `getp` in introspection mode, these properties are listed as 'fullname or aliasname'.
+
+![Aliases](https://matplotlib.org/_images/sphx_glr_set_and_get_001.png)
+
+Output:
+
+```python
+Line setters
+  agg_filter: a filter function, which takes a (m, n, 3) float array and a dpi value, and returns a (m, n, 3) array
+  alpha: float
+  animated: bool
+  antialiased: bool
+  clip_box: `.Bbox`
+  clip_on: bool
+  clip_path: [(`~matplotlib.path.Path`, `.Transform`) | `.Patch` | None]
+  color: color
+  contains: callable
+  dash_capstyle: {'butt', 'round', 'projecting'}
+  dash_joinstyle: {'miter', 'round', 'bevel'}
+  dashes: sequence of floats (on/off ink in points) or (None, None)
+  drawstyle: {'default', 'steps', 'steps-pre', 'steps-mid', 'steps-post'}
+  figure: `.Figure`
+  fillstyle: {'full', 'left', 'right', 'bottom', 'top', 'none'}
+  gid: str
+  in_layout: bool
+  label: object
+  linestyle: {'-', '--', '-.', ':', '', (offset, on-off-seq), ...}
+  linewidth: float
+  marker: unknown
+  markeredgecolor: color
+  markeredgewidth: float
+  markerfacecolor: color
+  markerfacecoloralt: color
+  markersize: float
+  markevery: unknown
+  path_effects: `.AbstractPathEffect`
+  picker: float or callable[[Artist, Event], Tuple[bool, dict]]
+  pickradius: float
+  rasterized: bool or None
+  sketch_params: (scale: float, length: float, randomness: float)
+  snap: bool or None
+  solid_capstyle: {'butt', 'round', 'projecting'}
+  solid_joinstyle: {'miter', 'round', 'bevel'}
+  transform: matplotlib.transforms.Transform
+  url: str
+  visible: bool
+  xdata: 1D array
+  ydata: 1D array
+  zorder: float
+Line getters
+  agg_filter = None
+  alpha = None
+  animated = False
+  antialiased = True
+  children = []
+  clip_box = TransformedBbox( Bbox(x0=0.0, y0=0.0, x1=1.0, ...
+  clip_on = True
+  clip_path = None
+  color = r
+  contains = None
+  dash_capstyle = butt
+  dash_joinstyle = round
+  data = (array([0. , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, ...
+ drawstyle = default + figure = Figure(640x480) + fillstyle = full + gid = None + in_layout = True + label = _line0 + linestyle = -- + linewidth = 2.0 + marker = None + markeredgecolor = r + markeredgewidth = 1.0 + markerfacecolor = r + markerfacecoloralt = none + markersize = 6.0 + markevery = None + path = Path(array([[ 0.00000000e+00, 0.00000000e+00], ... + path_effects = [] + picker = None + pickradius = 5 + rasterized = None + sketch_params = None + snap = None + solid_capstyle = projecting + solid_joinstyle = round + transform = CompositeGenericTransform( TransformWrapper( ... + transformed_clip_path_and_affine = (None, None) + url = None + visible = True + xdata = [0. 0.01 0.02 0.03 0.04 0.05]... + xydata = [[0. 0. ] [0.01 0.06279052] ... + ydata = [0. 0.06279052 0.12533323 0.18738131 0.248... + zorder = 2 +Rectangle setters + agg_filter: a filter function, which takes a (m, n, 3) float array and a dpi value, and returns a (m, n, 3) array + alpha: float or None + animated: bool + antialiased: unknown + capstyle: {'butt', 'round', 'projecting'} + clip_box: `.Bbox` + clip_on: bool + clip_path: [(`~matplotlib.path.Path`, `.Transform`) | `.Patch` | None] + color: color + contains: callable + edgecolor: color or None or 'auto' + facecolor: color or None + figure: `.Figure` + fill: bool + gid: str + hatch: {'/', '\\', '|', '-', '+', 'x', 'o', 'O', '.', '*'} + height: unknown + in_layout: bool + joinstyle: {'miter', 'round', 'bevel'} + label: object + linestyle: {'-', '--', '-.', ':', '', (offset, on-off-seq), ...} + linewidth: float or None for default + path_effects: `.AbstractPathEffect` + picker: None or bool or float or callable + rasterized: bool or None + sketch_params: (scale: float, length: float, randomness: float) + snap: bool or None + transform: `.Transform` + url: str + visible: bool + width: unknown + x: unknown + xy: (float, float) + y: unknown + zorder: float +Rectangle getters + agg_filter = None + alpha = None + animated = False + antialiased = 
True + bbox = Bbox(x0=0.0, y0=0.0, x1=1.0, y1=1.0) + capstyle = butt + children = [] + clip_box = None + clip_on = True + clip_path = None + contains = None + data_transform = BboxTransformTo( TransformedBbox( Bbox... + edgecolor = (0.0, 0.0, 0.0, 0.0) + extents = Bbox(x0=80.0, y0=52.8, x1=576.0, y1=422.4) + facecolor = (1.0, 1.0, 1.0, 1.0) + figure = Figure(640x480) + fill = True + gid = None + hatch = None + height = 1.0 + in_layout = True + joinstyle = miter + label = + linestyle = solid + linewidth = 0.0 + patch_transform = CompositeGenericTransform( BboxTransformTo( ... + path = Path(array([[0., 0.], [1., 0.], [1.,... + path_effects = [] + picker = None + rasterized = None + sketch_params = None + snap = None + transform = CompositeGenericTransform( CompositeGenericTra... + transformed_clip_path_and_affine = (None, None) + url = None + verts = [[ 80. 52.8] [576. 52.8] [576. 422.4] [ 80... + visible = True + width = 1.0 + window_extent = Bbox(x0=80.0, y0=52.8, x1=576.0, y1=422.4) + x = 0.0 + xy = (0.0, 0.0) + y = 0.0 + zorder = 1 +Text setters + agg_filter: a filter function, which takes a (m, n, 3) float array and a dpi value, and returns a (m, n, 3) array + alpha: float + animated: bool + backgroundcolor: color + bbox: dict with properties for `.patches.FancyBboxPatch` + clip_box: `matplotlib.transforms.Bbox` + clip_on: bool + clip_path: { (`.path.Path`, `.transforms.Transform`), `.patches.Patch`, None } + color: color + contains: callable + figure: `.Figure` + fontfamily: {FONTNAME, 'serif', 'sans-serif', 'cursive', 'fantasy', 'monospace'} + fontname: {FONTNAME, 'serif', 'sans-serif', 'cursive', 'fantasy', 'monospace'} + fontproperties: `.font_manager.FontProperties` + fontsize: {size in points, 'xx-small', 'x-small', 'small', 'medium', 'large', 'x-large', 'xx-large'} + fontstretch: {a numeric value in range 0-1000, 'ultra-condensed', 'extra-condensed', 'condensed', 'semi-condensed', 'normal', 'semi-expanded', 'expanded', 'extra-expanded', 'ultra-expanded'} 
+ fontstyle: {'normal', 'italic', 'oblique'} + fontvariant: {'normal', 'small-caps'} + fontweight: {a numeric value in range 0-1000, 'ultralight', 'light', 'normal', 'regular', 'book', 'medium', 'roman', 'semibold', 'demibold', 'demi', 'bold', 'heavy', 'extra bold', 'black'} + gid: str + horizontalalignment: {'center', 'right', 'left'} + in_layout: bool + label: object + linespacing: float (multiple of font size) + multialignment: {'left', 'right', 'center'} + path_effects: `.AbstractPathEffect` + picker: None or bool or float or callable + position: (float, float) + rasterized: bool or None + rotation: {angle in degrees, 'vertical', 'horizontal'} + rotation_mode: {None, 'default', 'anchor'} + sketch_params: (scale: float, length: float, randomness: float) + snap: bool or None + text: string or object castable to string (but ``None`` becomes ``''``) + transform: `.Transform` + url: str + usetex: bool or None + verticalalignment: {'center', 'top', 'bottom', 'baseline', 'center_baseline'} + visible: bool + wrap: bool + x: float + y: float + zorder: float +Text getters + agg_filter = None + alpha = None + animated = False + bbox_patch = None + children = [] + clip_box = None + clip_on = True + clip_path = None + color = black + contains = None + figure = Figure(640x480) + fontfamily = ['sans-serif'] + fontname = DejaVu Sans + fontproperties = :family=sans-serif:style=normal:variant=normal:wei... + fontsize = 12.0 + fontstyle = normal + fontvariant = normal + fontweight = normal + gid = None + horizontalalignment = center + in_layout = True + label = + path_effects = [] + picker = None + position = (0.5, 1.0) + rasterized = None + rotation = 0.0 + rotation_mode = None + sketch_params = None + snap = None + stretch = normal + text = Hi mom + transform = CompositeGenericTransform( BboxTransformTo( ... 
+ transformed_clip_path_and_affine = (None, None) + unitless_position = (0.5, 1.0) + url = None + usetex = False + verticalalignment = baseline + visible = True + wrap = False + zorder = 3 +``` + +```python +import matplotlib.pyplot as plt +import numpy as np + + +x = np.arange(0, 1.0, 0.01) +y1 = np.sin(2*np.pi*x) +y2 = np.sin(4*np.pi*x) +lines = plt.plot(x, y1, x, y2) +l1, l2 = lines +plt.setp(lines, linestyle='--') # set both to dashed +plt.setp(l1, linewidth=2, color='r') # line1 is thick and red +plt.setp(l2, linewidth=1, color='g') # line2 is thinner and green + + +print('Line setters') +plt.setp(l1) +print('Line getters') +plt.getp(l1) + +print('Rectangle setters') +plt.setp(plt.gca().patch) +print('Rectangle getters') +plt.getp(plt.gca().patch) + +t = plt.title('Hi mom') +print('Text setters') +plt.setp(t) +print('Text getters') +plt.getp(t) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: set_and_get.py](https://matplotlib.org/_downloads/set_and_get.py) +- [下载Jupyter notebook: set_and_get.ipynb](https://matplotlib.org/_downloads/set_and_get.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/svg_filter_line.md b/Python/matplotlab/gallery/misc/svg_filter_line.md new file mode 100644 index 00000000..47bc13a3 --- /dev/null +++ b/Python/matplotlab/gallery/misc/svg_filter_line.md @@ -0,0 +1,95 @@ +# SVG过滤线 + +演示可能与mpl一起使用的SVG过滤效果。 + +请注意,过滤效果仅在您的svg渲染器支持时才有效。 + +![SVG过滤线示例](https://matplotlib.org/_images/sphx_glr_svg_filter_line_001.png) + +输出: + +```python +Saving 'svg_filter_line.svg' +``` + +```python +import matplotlib.pyplot as plt +import matplotlib.transforms as mtransforms + +fig1 = plt.figure() +ax = fig1.add_axes([0.1, 0.1, 0.8, 0.8]) + +# draw lines +l1, = ax.plot([0.1, 0.5, 0.9], [0.1, 0.9, 0.5], "bo-", + mec="b", lw=5, ms=10, label="Line 1") +l2, = ax.plot([0.1, 0.5, 0.9], [0.5, 0.2, 0.7], "rs-", + mec="r", lw=5, ms=10, color="r", label="Line 2") + + +for l in [l1, l2]: + + # draw shadows with same lines with slight 
offset and gray colors.
+
+    xx = l.get_xdata()
+    yy = l.get_ydata()
+    shadow, = ax.plot(xx, yy)
+    shadow.update_from(l)
+
+    # adjust color
+    shadow.set_color("0.2")
+    # adjust zorder of the shadow lines so that they are drawn below the
+    # original lines
+    shadow.set_zorder(l.get_zorder() - 0.5)
+
+    # offset transform
+    ot = mtransforms.offset_copy(l.get_transform(), fig1,
+                                 x=4.0, y=-6.0, units='points')
+
+    shadow.set_transform(ot)
+
+    # set the id for later use
+    shadow.set_gid(l.get_label() + "_shadow")
+
+
+ax.set_xlim(0., 1.)
+ax.set_ylim(0., 1.)
+
+# save the figure as a bytes string in the svg format.
+from io import BytesIO
+f = BytesIO()
+plt.savefig(f, format="svg")
+
+
+import xml.etree.ElementTree as ET
+
+# filter definition for a gaussian blur
+filter_def = """
+  <defs xmlns='http://www.w3.org/2000/svg'
+        xmlns:xlink='http://www.w3.org/1999/xlink'>
+    <filter id='dropshadow' height='1.2' width='1.2'>
+      <feGaussianBlur result='blur' stdDeviation='3'/>
+    </filter>
+  </defs>
+"""
+
+
+# read in the saved svg
+tree, xmlid = ET.XMLID(f.getvalue())
+
+# insert the filter definition in the svg dom tree.
+tree.insert(0, ET.XML(filter_def))
+
+for l in [l1, l2]:
+    # pick up the svg element with the given id
+    shadow = xmlid[l.get_label() + "_shadow"]
+    # apply the shadow filter
+    shadow.set("filter", 'url(#dropshadow)')
+
+fn = "svg_filter_line.svg"
+print("Saving '%s'" % fn)
+ET.ElementTree(tree).write(fn)
+```
+
+## Download this example
+
+- [Download Python source: svg_filter_line.py](https://matplotlib.org/_downloads/svg_filter_line.py)
+- [Download Jupyter notebook: svg_filter_line.ipynb](https://matplotlib.org/_downloads/svg_filter_line.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/misc/svg_filter_pie.md b/Python/matplotlab/gallery/misc/svg_filter_pie.md
new file mode 100644
index 00000000..adeb9870
--- /dev/null
+++ b/Python/matplotlab/gallery/misc/svg_filter_pie.md
@@ -0,0 +1,104 @@
+# SVG Filter Pie
+
+Demonstrates SVG filtering effects which might be used with Matplotlib. The pie chart drawing code is borrowed from pie_demo.py.
+
+Note that the filtering effects are only effective if your SVG renderer supports them.
+
+![SVG filter pie example](https://matplotlib.org/_images/sphx_glr_svg_filter_pie_001.png)
+
+Output:
+
+```python
+Saving 'svg_filter_pie.svg'
+```
+
+```python
+import matplotlib.pyplot as plt
+from matplotlib.patches import Shadow
+
+# make a square figure and axes
+fig1 = plt.figure(1, figsize=(6, 6))
+ax = fig1.add_axes([0.1, 0.1, 0.8, 0.8])
+
+labels = 'Frogs', 'Hogs', 'Dogs', 'Logs'
+fracs = [15, 30, 45, 10]
+
+explode = (0, 0.05, 0, 0)
+
+# We want to draw the shadow for each pie, but we will not use the "shadow"
+# option as it doesn't save the references to the shadow patches.
+pies = ax.pie(fracs, explode=explode, labels=labels, autopct='%1.1f%%')
+
+for w in pies[0]:
+    # set the id with the label.
+    w.set_gid(w.get_label())
+
+    # we don't want to draw the edge of the pie
+    w.set_ec("none")
+
+for w in pies[0]:
+    # create shadow patch
+    s = Shadow(w, -0.01, -0.01)
+    s.set_gid(w.get_gid() + "_shadow")
+    s.set_zorder(w.get_zorder() - 0.1)
+    ax.add_patch(s)
+
+
+# save
+from io import BytesIO
+f = BytesIO()
+plt.savefig(f, format="svg")
+
+import xml.etree.ElementTree as ET
+
+
+# filter definition for a shadow using a gaussian blur
+# and a lightening effect.
+# The lightening filter is copied from http://www.w3.org/TR/SVG/filters.html
+
+# I tested it with Inkscape and Firefox3. "Gaussian blur" is supported
+# in both, but the lightening effect only in Inkscape. Also note
+# that Inkscape's exporting may not support it.
+
+filter_def = """
+  <defs xmlns='http://www.w3.org/2000/svg'
+        xmlns:xlink='http://www.w3.org/1999/xlink'>
+    <filter id='dropshadow' height='1.2' width='1.2'>
+      <feGaussianBlur result='blur' stdDeviation='2'/>
+    </filter>
+
+    <filter id='MyFilter' filterUnits='objectBoundingBox'
+            x='0' y='0' width='1' height='1'>
+      <feGaussianBlur in='SourceAlpha' stdDeviation='4%' result='blur'/>
+      <feOffset in='blur' dx='4%' dy='4%' result='offsetBlur'/>
+      <feSpecularLighting in='blur' surfaceScale='5' specularConstant='.75'
+                          specularExponent='20' lighting-color='#bbbbbb'
+                          result='specOut'>
+        <fePointLight x='-5000%' y='-10000%' z='20000%'/>
+      </feSpecularLighting>
+      <feComposite in='specOut' in2='SourceAlpha' operator='in'
+                   result='specOut'/>
+      <feComposite in='SourceGraphic' in2='specOut' operator='arithmetic'
+                   k1='0' k2='1' k3='1' k4='0'/>
+    </filter>
+  </defs>
+"""
+
+
+tree, xmlid = ET.XMLID(f.getvalue())
+
+# insert the filter definition in the svg dom tree.
+tree.insert(0, ET.XML(filter_def)) + +for i, pie_name in enumerate(labels): + pie = xmlid[pie_name] + pie.set("filter", 'url(#MyFilter)') + + shadow = xmlid[pie_name + "_shadow"] + shadow.set("filter", 'url(#dropshadow)') + +fn = "svg_filter_pie.svg" +print("Saving '%s'" % fn) +ET.ElementTree(tree).write(fn) +``` + +## 下载这个示例 + +- [下载python源码: svg_filter_pie.py](https://matplotlib.org/_downloads/svg_filter_pie.py) +- [下载Jupyter notebook: svg_filter_pie.ipynb](https://matplotlib.org/_downloads/svg_filter_pie.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/table_demo.md b/Python/matplotlab/gallery/misc/table_demo.md new file mode 100644 index 00000000..710984ed --- /dev/null +++ b/Python/matplotlab/gallery/misc/table_demo.md @@ -0,0 +1,65 @@ +# 表格演示 + +演示表函数以在图表中显示表格。 + +![表格演示示例](https://matplotlib.org/_images/sphx_glr_table_demo_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + + +data = [[ 66386, 174296, 75131, 577908, 32015], + [ 58230, 381139, 78045, 99308, 160454], + [ 89135, 80552, 152558, 497981, 603535], + [ 78415, 81858, 150656, 193263, 69638], + [139361, 331509, 343164, 781380, 52269]] + +columns = ('Freeze', 'Wind', 'Flood', 'Quake', 'Hail') +rows = ['%d year' % x for x in (100, 50, 20, 10, 5)] + +values = np.arange(0, 2500, 500) +value_increment = 1000 + +# Get some pastel shades for the colors +colors = plt.cm.BuPu(np.linspace(0, 0.5, len(rows))) +n_rows = len(data) + +index = np.arange(len(columns)) + 0.3 +bar_width = 0.4 + +# Initialize the vertical-offset for the stacked bar chart. +y_offset = np.zeros(len(columns)) + +# Plot bars and create text labels for the table +cell_text = [] +for row in range(n_rows): + plt.bar(index, data[row], bar_width, bottom=y_offset, color=colors[row]) + y_offset = y_offset + data[row] + cell_text.append(['%1.1f' % (x / 1000.0) for x in y_offset]) +# Reverse colors and text labels to display the last value at the top. 
+colors = colors[::-1] +cell_text.reverse() + +# Add a table at the bottom of the axes +the_table = plt.table(cellText=cell_text, + rowLabels=rows, + rowColours=colors, + colLabels=columns, + loc='bottom') + +# Adjust layout to make room for the table: +plt.subplots_adjust(left=0.2, bottom=0.2) + +plt.ylabel("Loss in ${0}'s".format(value_increment)) +plt.yticks(values * value_increment, ['%d' % val for val in values]) +plt.xticks([]) +plt.title('Loss by Disaster') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: table_demo.py](https://matplotlib.org/_downloads/table_demo.py) +- [下载Jupyter notebook: table_demo.ipynb](https://matplotlib.org/_downloads/table_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/tight_bbox_test.md b/Python/matplotlab/gallery/misc/tight_bbox_test.md new file mode 100644 index 00000000..3d174491 --- /dev/null +++ b/Python/matplotlab/gallery/misc/tight_bbox_test.md @@ -0,0 +1,35 @@ +# 严密的Bbox测试 + +![严密的Bbox测试示例](https://matplotlib.org/_images/sphx_glr_tight_bbox_test_001.png) + +输出: + +```python +saving tight_bbox_test.png +saving tight_bbox_test.pdf +saving tight_bbox_test.svg +saving tight_bbox_test.svgz +saving tight_bbox_test.eps +``` + +```python +import matplotlib.pyplot as plt +import numpy as np + +ax = plt.axes([0.1, 0.3, 0.5, 0.5]) + +ax.pcolormesh(np.array([[1, 2], [3, 4]])) +plt.yticks([0.5, 1.5], ["long long tick label", + "tick label"]) +plt.ylabel("My y-label") +plt.title("Check saved figures for their bboxes") +for ext in ["png", "pdf", "svg", "svgz", "eps"]: + print("saving tight_bbox_test.%s" % (ext,)) + plt.savefig("tight_bbox_test.%s" % (ext,), bbox_inches="tight") +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: tight_bbox_test.py](https://matplotlib.org/_downloads/tight_bbox_test.py) +- [下载Jupyter notebook: tight_bbox_test.ipynb](https://matplotlib.org/_downloads/tight_bbox_test.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/misc/transoffset.md 
b/Python/matplotlab/gallery/misc/transoffset.md
new file mode 100644
index 00000000..d713dc42
--- /dev/null
+++ b/Python/matplotlab/gallery/misc/transoffset.md
@@ -0,0 +1,52 @@
+# Transoffset
+
+This illustrates the use of `transforms.offset_copy` to make a transform that positions a drawing element, such as a text string, at a specified offset in screen coordinates (dots or inches) relative to a location given in any coordinates.
+
+Every Artist (the Matplotlib class from which classes such as Text and Line2D derive) has a transform that can be set when the Artist is created, for example by the corresponding pyplot command. By default this is usually the `Axes.transData` transform, going from data units to screen dots. We can use the `offset_copy` function to make a modified copy of this transform, where the modification consists of an offset.
+
+![Transoffset example](https://matplotlib.org/_images/sphx_glr_transoffset_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import matplotlib.transforms as mtransforms
+import numpy as np
+
+
+xs = np.arange(7)
+ys = xs**2
+
+fig = plt.figure(figsize=(5, 10))
+ax = plt.subplot(2, 1, 1)
+
+# If we want the same offset for each text instance,
+# we only need to make one transform. To get the
+# transform argument to offset_copy, we need to make the axes
+# first; the subplot command above is one way to do this.
+trans_offset = mtransforms.offset_copy(ax.transData, fig=fig,
+                                       x=0.05, y=0.10, units='inches')
+
+for x, y in zip(xs, ys):
+    plt.plot((x,), (y,), 'ro')
+    plt.text(x, y, '%d, %d' % (int(x), int(y)), transform=trans_offset)
+
+
+# offset_copy works for polar plots also.
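+# (What offset_copy returns, per the Matplotlib source: for units='dots' it
+# is simply ``trans + Affine2D().translate(x, y)``, i.e. a screen-space
+# translation appended *after* the data transform; units='points' and
+# units='inches' first scale the offsets by fig.dpi/72 or fig.dpi. Because
+# the shift happens in screen space, it applies unchanged to polar axes.)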
+ax = plt.subplot(2, 1, 2, projection='polar')
+
+trans_offset = mtransforms.offset_copy(ax.transData, fig=fig,
+                                       y=6, units='dots')
+
+for x, y in zip(xs, ys):
+    plt.polar((x,), (y,), 'ro')
+    plt.text(x, y, '%d, %d' % (int(x), int(y)),
+             transform=trans_offset,
+             horizontalalignment='center',
+             verticalalignment='bottom')
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source: transoffset.py](https://matplotlib.org/_downloads/transoffset.py)
- [Download Jupyter notebook: transoffset.ipynb](https://matplotlib.org/_downloads/transoffset.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/misc/zorder_demo.md b/Python/matplotlab/gallery/misc/zorder_demo.md
new file mode 100644
index 00000000..18a0a8c2
--- /dev/null
+++ b/Python/matplotlab/gallery/misc/zorder_demo.md
@@ -0,0 +1,71 @@
+# Zorder Demo
+
+The default drawing order for axes is patches, lines, text. This order is determined by the zorder attribute. The following defaults are set:
+
+Artist | Z-order
+---|---
+Patch / PatchCollection | 1
+Line2D / LineCollection | 2
+Text | 3
+
+You can change the order for individual artists by setting their zorder. Any individual plot() call can set a value for the zorder of that particular item.
+
+In the first subplot below, the lines are drawn on top of the scatter points, which is the default.
+
+In the subplot below it, the order is reversed.
+
+The second figure shows how to control the zorder of individual lines.
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+
+# Fixing random state for reproducibility
+np.random.seed(19680801)
+
+
+x = np.random.random(20)
+y = np.random.random(20)
+```
+
+Lines on top of the scatter points:
+
+```python
+plt.figure()
+plt.subplot(211)
+plt.plot(x, y, 'C3', lw=3)
+plt.scatter(x, y, s=120)
+plt.title('Lines on top of dots')
+
+# Scatter plot on top of lines
+plt.subplot(212)
+plt.plot(x, y, 'C3', zorder=1, lw=3)
+plt.scatter(x, y, s=120, zorder=2)
+plt.title('Dots on top of lines')
+plt.tight_layout()
+```
+
+![Zorder Demo](https://matplotlib.org/_images/sphx_glr_zorder_demo_001.png)
+
+A new figure, with individually ordered items:
+
+```python
+x = np.linspace(0, 2*np.pi, 100)
+plt.rcParams['lines.linewidth'] = 10
+plt.figure()
+plt.plot(x, np.sin(x), label='zorder=10', zorder=10)  # on top
+plt.plot(x, np.sin(1.1*x), label='zorder=1', zorder=1)  # bottom
+plt.plot(x, np.sin(1.2*x), 
label='zorder=3', zorder=3) +plt.axhline(0, label='zorder=2', color='grey', zorder=2) +plt.title('Custom order of elements') +l = plt.legend(loc='upper right') +l.set_zorder(20) # put the legend on top +plt.show() +``` + +![Zorder演示2](https://matplotlib.org/_images/sphx_glr_zorder_demo_002.png) + +## 下载这个示例 + +- [下载python源码: zorder_demo.py](https://matplotlib.org/_downloads/zorder_demo.py) +- [下载Jupyter notebook: zorder_demo.ipynb](https://matplotlib.org/_downloads/zorder_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/2dcollections3d.md b/Python/matplotlab/gallery/mplot3d/2dcollections3d.md new file mode 100644 index 00000000..6a36876e --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/2dcollections3d.md @@ -0,0 +1,56 @@ +# 在3D绘图上绘制2D数据 + +演示使用ax.plot的zdir关键字在3D绘图的选择轴上绘制2D数据。 + +![在3D绘图上绘制2D数据示例](https://matplotlib.org/_images/sphx_glr_2dcollections3d_001.png) + +```python +# This import registers the 3D projection, but is otherwise unused. +from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + +import numpy as np +import matplotlib.pyplot as plt + +fig = plt.figure() +ax = fig.gca(projection='3d') + +# Plot a sin curve using the x and y axes. +x = np.linspace(0, 1, 100) +y = np.sin(x * 2 * np.pi) / 2 + 0.5 +ax.plot(x, y, zs=0, zdir='z', label='curve in (x,y)') + +# Plot scatterplot data (20 2D points per colour) on the x and z axes. +colors = ('r', 'g', 'b', 'k') + +# Fixing random state for reproducibility +np.random.seed(19680801) + +x = np.random.sample(20 * len(colors)) +y = np.random.sample(20 * len(colors)) +c_list = [] +for c in colors: + c_list.extend([c] * 20) +# By using zdir='y', the y value of these points is fixed to the zs value 0 +# and the (x,y) points are plotted on the x and z axes. 
+ax.scatter(x, y, zs=0, zdir='y', c=c_list, label='points in (x,z)') + +# Make legend, set axes limits and labels +ax.legend() +ax.set_xlim(0, 1) +ax.set_ylim(0, 1) +ax.set_zlim(0, 1) +ax.set_xlabel('X') +ax.set_ylabel('Y') +ax.set_zlabel('Z') + +# Customize the view angle so it's easier to see that the scatter points lie +# on the plane y=0 +ax.view_init(elev=20., azim=-35) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: 2dcollections3d.py](https://matplotlib.org/_downloads/2dcollections3d.py) +- [下载Jupyter notebook: 2dcollections3d.ipynb](https://matplotlib.org/_downloads/2dcollections3d.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/3d_bars.md b/Python/matplotlab/gallery/mplot3d/3d_bars.md new file mode 100644 index 00000000..a24bd643 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/3d_bars.md @@ -0,0 +1,41 @@ +# 3D条形图演示 + +有关如何使用和不使用着色绘制3D条形图的基本演示。 + +![3D条形图演示](https://matplotlib.org/_images/sphx_glr_3d_bars_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +# This import registers the 3D projection, but is otherwise unused. 
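+# (As of Matplotlib 3.2 this import is no longer needed to make
+# projection='3d' available.)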
+from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + + +# setup the figure and axes +fig = plt.figure(figsize=(8, 3)) +ax1 = fig.add_subplot(121, projection='3d') +ax2 = fig.add_subplot(122, projection='3d') + +# fake data +_x = np.arange(4) +_y = np.arange(5) +_xx, _yy = np.meshgrid(_x, _y) +x, y = _xx.ravel(), _yy.ravel() + +top = x + y +bottom = np.zeros_like(top) +width = depth = 1 + +ax1.bar3d(x, y, bottom, width, depth, top, shade=True) +ax1.set_title('Shaded') + +ax2.bar3d(x, y, bottom, width, depth, top, shade=False) +ax2.set_title('Not Shaded') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: 3d_bars.py](https://matplotlib.org/_downloads/3d_bars.py) +- [下载Jupyter notebook: 3d_bars.ipynb](https://matplotlib.org/_downloads/3d_bars.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/bars3d.md b/Python/matplotlab/gallery/mplot3d/bars3d.md new file mode 100644 index 00000000..7b8bc765 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/bars3d.md @@ -0,0 +1,49 @@ +# 在不同的平面中创建二维条形图 + +演示制作3D绘图,其中2D条形图投影到平面y = 0,y = 1等。 + +![在不同的平面中创建二维条形图示例](https://matplotlib.org/_images/sphx_glr_bars3d_001.png) + +```python +# This import registers the 3D projection, but is otherwise unused. +from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + +import matplotlib.pyplot as plt +import numpy as np + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +fig = plt.figure() +ax = fig.add_subplot(111, projection='3d') + +colors = ['r', 'g', 'b', 'y'] +yticks = [3, 2, 1, 0] +for c, k in zip(colors, yticks): + # Generate the random data for the y=k 'layer'. + xs = np.arange(20) + ys = np.random.rand(20) + + # You can provide either a single color or an array with the same length as + # xs and ys. To demonstrate this, we color the first bar of each set cyan. + cs = [c] * len(xs) + cs[0] = 'c' + + # Plot the bar graph given by xs and ys on the plane y=k with 80% opacity. 
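+    # (zdir='y' declares y as the "flat" direction: the xs/ys pairs are
+    # drawn in the x/z plane, and zs=k fixes the layer's position along y.)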
+ ax.bar(xs, ys, zs=k, zdir='y', color=cs, alpha=0.8) + +ax.set_xlabel('X') +ax.set_ylabel('Y') +ax.set_zlabel('Z') + +# On the y axis let's only label the discrete values that we have data for. +ax.set_yticks(yticks) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: bars3d.py](https://matplotlib.org/_downloads/bars3d.py) +- [下载Jupyter notebook: bars3d.ipynb](https://matplotlib.org/_downloads/bars3d.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/contour3d.md b/Python/matplotlab/gallery/mplot3d/contour3d.md new file mode 100644 index 00000000..0ed4552d --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/contour3d.md @@ -0,0 +1,27 @@ +# 演示在3D中绘制轮廓(水平)曲线 + +这类似于2D中的等高线图,除了f(x,y)= c曲线绘制在平面z = c上。 + +![演示在3D中绘制轮廓(水平)曲线示例](https://matplotlib.org/_images/sphx_glr_contour3d_001.png) + +```python +from mpl_toolkits.mplot3d import axes3d +import matplotlib.pyplot as plt +from matplotlib import cm + +fig = plt.figure() +ax = fig.gca(projection='3d') +X, Y, Z = axes3d.get_test_data(0.05) + +# Plot contour curves +cset = ax.contour(X, Y, Z, cmap=cm.coolwarm) + +ax.clabel(cset, fontsize=9, inline=1) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: contour3d.py](https://matplotlib.org/_downloads/contour3d.py) +- [下载Jupyter notebook: contour3d.ipynb](https://matplotlib.org/_downloads/contour3d.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/contour3d_2.md b/Python/matplotlab/gallery/mplot3d/contour3d_2.md new file mode 100644 index 00000000..dafca3b0 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/contour3d_2.md @@ -0,0 +1,26 @@ +# 演示使用extend3d选项在3D中绘制轮廓(水平)曲线 + +contour3d_demo示例的这种修改使用extend3d = True将曲线垂直扩展为“ribbon”。 + +![演示使用extend3d选项在3D中绘制轮廓(水平)曲线示例](https://matplotlib.org/_images/sphx_glr_contour3d_2_001.png) + +```python +from mpl_toolkits.mplot3d import axes3d +import matplotlib.pyplot as plt +from matplotlib import cm + +fig = plt.figure() +ax = fig.gca(projection='3d') +X, Y, Z = 
axes3d.get_test_data(0.05) + +cset = ax.contour(X, Y, Z, extend3d=True, cmap=cm.coolwarm) + +ax.clabel(cset, fontsize=9, inline=1) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: contour3d_2.py](https://matplotlib.org/_downloads/contour3d_2.py) +- [下载Jupyter notebook: contour3d_2.ipynb](https://matplotlib.org/_downloads/contour3d_2.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/contour3d_3.md b/Python/matplotlab/gallery/mplot3d/contour3d_3.md new file mode 100644 index 00000000..ea9f4c81 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/contour3d_3.md @@ -0,0 +1,42 @@ +# 将轮廓轮廓投影到图形上 + +演示显示3D表面,同时还将轮廓“轮廓”投影到图形的“墙壁”上。 + +有关填充版本,请参见contourf3d_demo2。 + +![将轮廓轮廓投影到图形上](https://matplotlib.org/_images/sphx_glr_contour3d_3_001.png) + +```python +from mpl_toolkits.mplot3d import axes3d +import matplotlib.pyplot as plt +from matplotlib import cm + +fig = plt.figure() +ax = fig.gca(projection='3d') +X, Y, Z = axes3d.get_test_data(0.05) + +# Plot the 3D surface +ax.plot_surface(X, Y, Z, rstride=8, cstride=8, alpha=0.3) + +# Plot projections of the contours for each dimension. 
By choosing offsets +# that match the appropriate axes limits, the projected contours will sit on +# the 'walls' of the graph +cset = ax.contour(X, Y, Z, zdir='z', offset=-100, cmap=cm.coolwarm) +cset = ax.contour(X, Y, Z, zdir='x', offset=-40, cmap=cm.coolwarm) +cset = ax.contour(X, Y, Z, zdir='y', offset=40, cmap=cm.coolwarm) + +ax.set_xlim(-40, 40) +ax.set_ylim(-40, 40) +ax.set_zlim(-100, 100) + +ax.set_xlabel('X') +ax.set_ylabel('Y') +ax.set_zlabel('Z') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: contour3d_3.py](https://matplotlib.org/_downloads/contour3d_3.py) +- [下载Jupyter notebook: contour3d_3.ipynb](https://matplotlib.org/_downloads/contour3d_3.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/contourf3d.md b/Python/matplotlab/gallery/mplot3d/contourf3d.md new file mode 100644 index 00000000..4fdc28bd --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/contourf3d.md @@ -0,0 +1,28 @@ +# 填充轮廓 + +contourf与轮廓的不同之处在于它创建了填充轮廓,即。 离散数量的颜色用于遮蔽域。 + +这类似于2D中的等高线图,除了对应于等级c的阴影区域在平面z = c上绘制图形。 + +![填充轮廓示例](https://matplotlib.org/_images/sphx_glr_contourf3d_001.png) + +```python +from mpl_toolkits.mplot3d import axes3d +import matplotlib.pyplot as plt +from matplotlib import cm + +fig = plt.figure() +ax = fig.gca(projection='3d') +X, Y, Z = axes3d.get_test_data(0.05) + +cset = ax.contourf(X, Y, Z, cmap=cm.coolwarm) + +ax.clabel(cset, fontsize=9, inline=1) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: contourf3d.py](https://matplotlib.org/_downloads/contourf3d.py) +- [下载Jupyter notebook: contourf3d.ipynb](https://matplotlib.org/_downloads/contourf3d.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/contourf3d_2.md b/Python/matplotlab/gallery/mplot3d/contourf3d_2.md new file mode 100644 index 00000000..cf7d9a19 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/contourf3d_2.md @@ -0,0 +1,42 @@ +# 将填充轮廓投影到图形上 + +演示显示3D表面,同时还将填充的轮廓“轮廓”投影到图形的“墙壁”上。 + +有关未填充的版本,请参见contour3d_demo2。 + 
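The core trick in the full listing below is the pair of keywords `zdir` and `offset`: together they flatten a set of filled contours onto a constant plane (a "wall" or the "floor") of the 3D box. A minimal sketch of just the floor projection, reusing the same `axes3d.get_test_data` helper:

```python
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
from matplotlib import cm

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
X, Y, Z = axes3d.get_test_data(0.05)

# Flatten the filled contours of Z onto the plane z=-100 (the "floor")
# instead of drawing them at their true heights.
ax.contourf(X, Y, Z, zdir='z', offset=-100, cmap=cm.coolwarm)
ax.set_zlim(-100, 100)
plt.show()
```

The same call with `zdir='x'` or `zdir='y'` and a matching `offset` paints the side walls, as the full example does.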
+![将填充轮廓投影到图形上](https://matplotlib.org/_images/sphx_glr_contourf3d_2_0011.png) + +```python +from mpl_toolkits.mplot3d import axes3d +import matplotlib.pyplot as plt +from matplotlib import cm + +fig = plt.figure() +ax = fig.gca(projection='3d') +X, Y, Z = axes3d.get_test_data(0.05) + +# Plot the 3D surface +ax.plot_surface(X, Y, Z, rstride=8, cstride=8, alpha=0.3) + +# Plot projections of the contours for each dimension. By choosing offsets +# that match the appropriate axes limits, the projected contours will sit on +# the 'walls' of the graph +cset = ax.contourf(X, Y, Z, zdir='z', offset=-100, cmap=cm.coolwarm) +cset = ax.contourf(X, Y, Z, zdir='x', offset=-40, cmap=cm.coolwarm) +cset = ax.contourf(X, Y, Z, zdir='y', offset=40, cmap=cm.coolwarm) + +ax.set_xlim(-40, 40) +ax.set_ylim(-40, 40) +ax.set_zlim(-100, 100) + +ax.set_xlabel('X') +ax.set_ylabel('Y') +ax.set_zlabel('Z') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: contourf3d_2.py](https://matplotlib.org/_downloads/contourf3d_2.py) +- [下载Jupyter notebook: contourf3d_2.ipynb](https://matplotlib.org/_downloads/contourf3d_2.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/custom_shaded_3d_surface.md b/Python/matplotlab/gallery/mplot3d/custom_shaded_3d_surface.md new file mode 100644 index 00000000..3700d851 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/custom_shaded_3d_surface.md @@ -0,0 +1,45 @@ +# 3D表面图中的自定义山体的阴影 + +演示在3D曲面图中使用自定义山体阴影。 + +![3D表面图中的自定义山体的阴影示例](https://matplotlib.org/_images/sphx_glr_custom_shaded_3d_surface_001.png) + +```python +# This import registers the 3D projection, but is otherwise unused. 
+from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + +from matplotlib import cbook +from matplotlib import cm +from matplotlib.colors import LightSource +import matplotlib.pyplot as plt +import numpy as np + +# Load and format data +filename = cbook.get_sample_data('jacksboro_fault_dem.npz', asfileobj=False) +with np.load(filename) as dem: + z = dem['elevation'] + nrows, ncols = z.shape + x = np.linspace(dem['xmin'], dem['xmax'], ncols) + y = np.linspace(dem['ymin'], dem['ymax'], nrows) + x, y = np.meshgrid(x, y) + +region = np.s_[5:50, 5:50] +x, y, z = x[region], y[region], z[region] + +# Set up plot +fig, ax = plt.subplots(subplot_kw=dict(projection='3d')) + +ls = LightSource(270, 45) +# To use a custom hillshading mode, override the built-in shading and pass +# in the rgb colors of the shaded surface calculated from "shade". +rgb = ls.shade(z, cmap=cm.gist_earth, vert_exag=0.1, blend_mode='soft') +surf = ax.plot_surface(x, y, z, rstride=1, cstride=1, facecolors=rgb, + linewidth=0, antialiased=False, shade=False) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: custom_shaded_3d_surface.py](https://matplotlib.org/_downloads/custom_shaded_3d_surface.py) +- [下载Jupyter notebook: custom_shaded_3d_surface.ipynb](https://matplotlib.org/_downloads/custom_shaded_3d_surface.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/hist3d.md b/Python/matplotlab/gallery/mplot3d/hist3d.md new file mode 100644 index 00000000..c8a33702 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/hist3d.md @@ -0,0 +1,41 @@ +# 创建2D数据的3D直方图 + +将二维数据的直方图演示为3D中的条形图。 + +![创建2D数据的3D直方图示例](https://matplotlib.org/_images/sphx_glr_hist3d_001.png) + +```python +# This import registers the 3D projection, but is otherwise unused. 
+from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + +import matplotlib.pyplot as plt +import numpy as np + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +fig = plt.figure() +ax = fig.add_subplot(111, projection='3d') +x, y = np.random.rand(2, 100) * 4 +hist, xedges, yedges = np.histogram2d(x, y, bins=4, range=[[0, 4], [0, 4]]) + +# Construct arrays for the anchor positions of the 16 bars. +xpos, ypos = np.meshgrid(xedges[:-1] + 0.25, yedges[:-1] + 0.25, indexing="ij") +xpos = xpos.ravel() +ypos = ypos.ravel() +zpos = 0 + +# Construct arrays with the dimensions for the 16 bars. +dx = dy = 0.5 * np.ones_like(zpos) +dz = hist.ravel() + +ax.bar3d(xpos, ypos, zpos, dx, dy, dz, color='b', zsort='average') + +plt.show() +``` + +## Download this example + +- [Download Python source code: hist3d.py](https://matplotlib.org/_downloads/hist3d.py) +- [Download Jupyter notebook: hist3d.ipynb](https://matplotlib.org/_downloads/hist3d.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/lines3d.md b/Python/matplotlab/gallery/mplot3d/lines3d.md new file mode 100644 index 00000000..0ce250ff --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/lines3d.md @@ -0,0 +1,36 @@ +# Parametric curve + +This example demonstrates plotting a parametric curve in 3D. + +![Parametric curve example](https://matplotlib.org/_images/sphx_glr_lines3d_001.png) + +```python +# This import registers the 3D projection, but is otherwise unused.
+from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + +import numpy as np +import matplotlib.pyplot as plt + + +plt.rcParams['legend.fontsize'] = 10 + +fig = plt.figure() +ax = fig.gca(projection='3d') + +# Prepare arrays x, y, z +theta = np.linspace(-4 * np.pi, 4 * np.pi, 100) +z = np.linspace(-2, 2, 100) +r = z**2 + 1 +x = r * np.sin(theta) +y = r * np.cos(theta) + +ax.plot(x, y, z, label='parametric curve') +ax.legend() + +plt.show() +``` + +## Download this example + +- [Download Python source code: lines3d.py](https://matplotlib.org/_downloads/lines3d.py) +- [Download Jupyter notebook: lines3d.ipynb](https://matplotlib.org/_downloads/lines3d.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/lorenz_attractor.md b/Python/matplotlab/gallery/mplot3d/lorenz_attractor.md new file mode 100644 index 00000000..8234547d --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/lorenz_attractor.md @@ -0,0 +1,67 @@ +# Lorenz attractor + +This is an example of plotting Edward Lorenz's 1963 ["Deterministic Nonperiodic Flow"](http://journals.ametsoc.org/doi/abs/10.1175/1520-0469%281963%29020%3C0130%3ADNF%3E2.0.CO%3B2) in a 3-dimensional space using mplot3d. + +Note: Because this is a simple non-linear ODE, it would be more easily done using SciPy's ODE solver, but this approach depends only upon NumPy. + +![Lorenz attractor example](https://matplotlib.org/_images/sphx_glr_lorenz_attractor_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +# This import registers the 3D projection, but is otherwise unused.
+from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + + +def lorenz(x, y, z, s=10, r=28, b=2.667): + ''' + Given: + x, y, z: a point of interest in three dimensional space + s, r, b: parameters defining the lorenz attractor + Returns: + x_dot, y_dot, z_dot: values of the lorenz attractor's partial + derivatives at the point x, y, z + ''' + x_dot = s*(y - x) + y_dot = r*x - y - x*z + z_dot = x*y - b*z + return x_dot, y_dot, z_dot + + +dt = 0.01 +num_steps = 10000 + +# Need one more for the initial values +xs = np.empty((num_steps + 1,)) +ys = np.empty((num_steps + 1,)) +zs = np.empty((num_steps + 1,)) + +# Set initial values +xs[0], ys[0], zs[0] = (0., 1., 1.05) + +# Step through "time", calculating the partial derivatives at the current point +# and using them to estimate the next point +for i in range(num_steps): + x_dot, y_dot, z_dot = lorenz(xs[i], ys[i], zs[i]) + xs[i + 1] = xs[i] + (x_dot * dt) + ys[i + 1] = ys[i] + (y_dot * dt) + zs[i + 1] = zs[i] + (z_dot * dt) + + +# Plot +fig = plt.figure() +ax = fig.gca(projection='3d') + +ax.plot(xs, ys, zs, lw=0.5) +ax.set_xlabel("X Axis") +ax.set_ylabel("Y Axis") +ax.set_zlabel("Z Axis") +ax.set_title("Lorenz Attractor") + +plt.show() +``` + +## Download this example + +- [Download Python source code: lorenz_attractor.py](https://matplotlib.org/_downloads/lorenz_attractor.py) +- [Download Jupyter notebook: lorenz_attractor.ipynb](https://matplotlib.org/_downloads/lorenz_attractor.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/mixed_subplots.md b/Python/matplotlab/gallery/mplot3d/mixed_subplots.md new file mode 100644 index 00000000..0275b411 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/mixed_subplots.md @@ -0,0 +1,56 @@ +# 2D and 3D axes in the same figure + +This example shows how to plot 2D and 3D plots on the same figure. + +![2D and 3D axes in the same figure example](https://matplotlib.org/_images/sphx_glr_mixed_subplots_001.png) + +```python +# This import registers the 3D projection, but is otherwise unused.
+from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + +import matplotlib.pyplot as plt +import numpy as np + + +def f(t): + s1 = np.cos(2*np.pi*t) + e1 = np.exp(-t) + return np.multiply(s1, e1) + + +# Set up a figure twice as tall as it is wide +fig = plt.figure(figsize=plt.figaspect(2.)) +fig.suptitle('A tale of 2 subplots') + +# First subplot +ax = fig.add_subplot(2, 1, 1) + +t1 = np.arange(0.0, 5.0, 0.1) +t2 = np.arange(0.0, 5.0, 0.02) +t3 = np.arange(0.0, 2.0, 0.01) + +ax.plot(t1, f(t1), 'bo', + t2, f(t2), 'k--', markerfacecolor='green') +ax.grid(True) +ax.set_ylabel('Damped oscillation') + +# Second subplot +ax = fig.add_subplot(2, 1, 2, projection='3d') + +X = np.arange(-5, 5, 0.25) +Y = np.arange(-5, 5, 0.25) +X, Y = np.meshgrid(X, Y) +R = np.sqrt(X**2 + Y**2) +Z = np.sin(R) + +surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, + linewidth=0, antialiased=False) +ax.set_zlim(-1, 1) + +plt.show() +``` + +## Download this example + +- [Download Python source code: mixed_subplots.py](https://matplotlib.org/_downloads/mixed_subplots.py) +- [Download Jupyter notebook: mixed_subplots.ipynb](https://matplotlib.org/_downloads/mixed_subplots.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/offset.md b/Python/matplotlab/gallery/mplot3d/offset.md new file mode 100644 index 00000000..77ba3d96 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/offset.md @@ -0,0 +1,77 @@ +# Automatic text offsetting + +Demonstrates using pathpatch_2d_to_3d to 'draw' shapes and text on a 3D plot. + +![Automatic text offsetting example](https://matplotlib.org/_images/sphx_glr_pathpatch3d_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.patches import Circle, PathPatch +from matplotlib.text import TextPath +from matplotlib.transforms import Affine2D +# This import registers the 3D projection, but is otherwise unused.
+from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import +import mpl_toolkits.mplot3d.art3d as art3d + + +def text3d(ax, xyz, s, zdir="z", size=None, angle=0, usetex=False, **kwargs): + ''' + Plots the string 's' on the axes 'ax', with position 'xyz', size 'size', + and rotation angle 'angle'. 'zdir' gives the axis which is to be treated + as the third dimension. usetex is a boolean indicating whether the string + should be interpreted as latex or not. Any additional keyword arguments + are passed on to transform_path. + + Note: zdir affects the interpretation of xyz. + ''' + x, y, z = xyz + if zdir == "y": + xy1, z1 = (x, z), y + elif zdir == "x": + xy1, z1 = (y, z), x + else: + xy1, z1 = (x, y), z + + text_path = TextPath((0, 0), s, size=size, usetex=usetex) + trans = Affine2D().rotate(angle).translate(xy1[0], xy1[1]) + + p1 = PathPatch(trans.transform_path(text_path), **kwargs) + ax.add_patch(p1) + art3d.pathpatch_2d_to_3d(p1, z=z1, zdir=zdir) + + +fig = plt.figure() +ax = fig.add_subplot(111, projection='3d') + +# Draw a circle on the x=0 'wall' +p = Circle((5, 5), 3) +ax.add_patch(p) +art3d.pathpatch_2d_to_3d(p, z=0, zdir="x") + +# Manually label the axes +text3d(ax, (4, -2, 0), "X-axis", zdir="z", size=.5, usetex=False, + ec="none", fc="k") +text3d(ax, (12, 4, 0), "Y-axis", zdir="z", size=.5, usetex=False, + angle=np.pi / 2, ec="none", fc="k") +text3d(ax, (12, 10, 4), "Z-axis", zdir="y", size=.5, usetex=False, + angle=np.pi / 2, ec="none", fc="k") + +# Write a Latex formula on the z=0 'floor' +text3d(ax, (1, 5, 0), + r"$\displaystyle G_{\mu\nu} + \Lambda g_{\mu\nu} = " + r"\frac{8\pi G}{c^4} T_{\mu\nu} $", + zdir="z", size=1, usetex=True, + ec="none", fc="k") + +ax.set_xlim(0, 10) +ax.set_ylim(0, 10) +ax.set_zlim(0, 10) + +plt.show() +``` + +## Download this example + +- [Download Python source code: pathpatch3d.py](https://matplotlib.org/_downloads/pathpatch3d.py) +- [Download Jupyter notebook: pathpatch3d.ipynb](https://matplotlib.org/_downloads/pathpatch3d.ipynb) diff --git
a/Python/matplotlab/gallery/mplot3d/pathpatch3d.md b/Python/matplotlab/gallery/mplot3d/pathpatch3d.md new file mode 100644 index 00000000..09943123 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/pathpatch3d.md @@ -0,0 +1,78 @@ +# Draw flat objects in a 3D plot + +Demonstrates using pathpatch_2d_to_3d to 'draw' shapes and text on a 3D plot. + +![Draw flat objects in a 3D plot example](https://matplotlib.org/_images/sphx_glr_pathpatch3d_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.patches import Circle, PathPatch +from matplotlib.text import TextPath +from matplotlib.transforms import Affine2D +# This import registers the 3D projection, but is otherwise unused. +from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import +import mpl_toolkits.mplot3d.art3d as art3d + + +def text3d(ax, xyz, s, zdir="z", size=None, angle=0, usetex=False, **kwargs): + ''' + Plots the string 's' on the axes 'ax', with position 'xyz', size 'size', + and rotation angle 'angle'. 'zdir' gives the axis which is to be treated + as the third dimension. usetex is a boolean indicating whether the string + should be interpreted as latex or not. Any additional keyword arguments + are passed on to transform_path. + + Note: zdir affects the interpretation of xyz.
+ ''' + x, y, z = xyz + if zdir == "y": + xy1, z1 = (x, z), y + elif zdir == "x": + xy1, z1 = (y, z), x + else: + xy1, z1 = (x, y), z + + text_path = TextPath((0, 0), s, size=size, usetex=usetex) + trans = Affine2D().rotate(angle).translate(xy1[0], xy1[1]) + + p1 = PathPatch(trans.transform_path(text_path), **kwargs) + ax.add_patch(p1) + art3d.pathpatch_2d_to_3d(p1, z=z1, zdir=zdir) + + +fig = plt.figure() +ax = fig.add_subplot(111, projection='3d') + +# Draw a circle on the x=0 'wall' +p = Circle((5, 5), 3) +ax.add_patch(p) +art3d.pathpatch_2d_to_3d(p, z=0, zdir="x") + +# Manually label the axes +text3d(ax, (4, -2, 0), "X-axis", zdir="z", size=.5, usetex=False, + ec="none", fc="k") +text3d(ax, (12, 4, 0), "Y-axis", zdir="z", size=.5, usetex=False, + angle=np.pi / 2, ec="none", fc="k") +text3d(ax, (12, 10, 4), "Z-axis", zdir="y", size=.5, usetex=False, + angle=np.pi / 2, ec="none", fc="k") + +# Write a Latex formula on the z=0 'floor' +text3d(ax, (1, 5, 0), + r"$\displaystyle G_{\mu\nu} + \Lambda g_{\mu\nu} = " + r"\frac{8\pi G}{c^4} T_{\mu\nu} $", + zdir="z", size=1, usetex=True, + ec="none", fc="k") + +ax.set_xlim(0, 10) +ax.set_ylim(0, 10) +ax.set_zlim(0, 10) + +plt.show() +``` + +## Download this example + +- [Download Python source code: pathpatch3d.py](https://matplotlib.org/_downloads/pathpatch3d.py) +- [Download Jupyter notebook: pathpatch3d.ipynb](https://matplotlib.org/_downloads/pathpatch3d.ipynb) + diff --git a/Python/matplotlab/gallery/mplot3d/polys3d.md b/Python/matplotlab/gallery/mplot3d/polys3d.md new file mode 100644 index 00000000..ce687734 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/polys3d.md @@ -0,0 +1,67 @@ +# Generate polygons to fill under a 3D line graph + +Demonstrates how to create polygons which fill the space under a line graph. In this example the polygons are semi-transparent, creating a sort of 'jagged stained glass' effect. + +![Generate polygons to fill under a 3D line graph example](https://matplotlib.org/_images/sphx_glr_polys3d_001.png) + +```python +# This import registers the 3D projection, but is otherwise unused.
+from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + +from matplotlib.collections import PolyCollection +import matplotlib.pyplot as plt +from matplotlib import colors as mcolors +import numpy as np + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +def cc(arg): + ''' + Shorthand to convert 'named' colors to rgba format at 60% opacity. + ''' + return mcolors.to_rgba(arg, alpha=0.6) + + +def polygon_under_graph(xlist, ylist): + ''' + Construct the vertex list which defines the polygon filling the space under + the (xlist, ylist) line graph. Assumes the xs are in ascending order. + ''' + return [(xlist[0], 0.), *zip(xlist, ylist), (xlist[-1], 0.)] + + +fig = plt.figure() +ax = fig.gca(projection='3d') + +# Make verts a list, verts[i] will be a list of (x,y) pairs defining polygon i +verts = [] + +# Set up the x sequence +xs = np.linspace(0., 10., 26) + +# The ith polygon will appear on the plane y = zs[i] +zs = range(4) + +for i in zs: + ys = np.random.rand(len(xs)) + verts.append(polygon_under_graph(xs, ys)) + +poly = PolyCollection(verts, facecolors=[cc('r'), cc('g'), cc('b'), cc('y')]) +ax.add_collection3d(poly, zs=zs, zdir='y') + +ax.set_xlabel('X') +ax.set_ylabel('Y') +ax.set_zlabel('Z') +ax.set_xlim(0, 10) +ax.set_ylim(-1, 4) +ax.set_zlim(0, 1) + +plt.show() +``` + +## Download this example + +- [Download Python source code: polys3d.py](https://matplotlib.org/_downloads/polys3d.py) +- [Download Jupyter notebook: polys3d.ipynb](https://matplotlib.org/_downloads/polys3d.ipynb) diff --git a/Python/matplotlab/gallery/mplot3d/quiver3d.md b/Python/matplotlab/gallery/mplot3d/quiver3d.md new file mode 100644 index 00000000..044f3b66 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/quiver3d.md @@ -0,0 +1,36 @@ +# 3D quiver plot + +Demonstrates plotting directional arrows at points on a 3D meshgrid. + +![3D quiver plot example](https://matplotlib.org/_images/sphx_glr_quiver3d_001.png) + +```python +# This import registers the 3D projection, but is otherwise unused.
+from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + +import matplotlib.pyplot as plt +import numpy as np + +fig = plt.figure() +ax = fig.gca(projection='3d') + +# Make the grid +x, y, z = np.meshgrid(np.arange(-0.8, 1, 0.2), + np.arange(-0.8, 1, 0.2), + np.arange(-0.8, 1, 0.8)) + +# Make the direction data for the arrows +u = np.sin(np.pi * x) * np.cos(np.pi * y) * np.cos(np.pi * z) +v = -np.cos(np.pi * x) * np.sin(np.pi * y) * np.cos(np.pi * z) +w = (np.sqrt(2.0 / 3.0) * np.cos(np.pi * x) * np.cos(np.pi * y) * + np.sin(np.pi * z)) + +ax.quiver(x, y, z, u, v, w, length=0.1, normalize=True) + +plt.show() +``` + +## Download this example + +- [Download Python source code: quiver3d.py](https://matplotlib.org/_downloads/quiver3d.py) +- [Download Jupyter notebook: quiver3d.ipynb](https://matplotlib.org/_downloads/quiver3d.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/rotate_axes3d_sgskip.md b/Python/matplotlab/gallery/mplot3d/rotate_axes3d_sgskip.md new file mode 100644 index 00000000..b6c87893 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/rotate_axes3d_sgskip.md @@ -0,0 +1,30 @@ +# Rotating a 3D plot + +A very simple animation of a rotating 3D plot. + +See wire3d_animation_demo for another simple example of animating a 3D plot. + +(This example is skipped when building the documentation gallery because it intentionally takes a long time to run.) + +```python +from mpl_toolkits.mplot3d import axes3d +import matplotlib.pyplot as plt + +fig = plt.figure() +ax = fig.add_subplot(111, projection='3d') + +# load some test data for demonstration and plot a wireframe +X, Y, Z = axes3d.get_test_data(0.1) +ax.plot_wireframe(X, Y, Z, rstride=5, cstride=5) + +# rotate the axes and update +for angle in range(0, 360): + ax.view_init(30, angle) + plt.draw() + plt.pause(.001) +``` + +## Download this example + +- [Download Python source code: rotate_axes3d_sgskip.py](https://matplotlib.org/_downloads/rotate_axes3d_sgskip.py) +- [Download Jupyter notebook: rotate_axes3d_sgskip.ipynb](https://matplotlib.org/_downloads/rotate_axes3d_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/scatter3d.md
b/Python/matplotlab/gallery/mplot3d/scatter3d.md new file mode 100644 index 00000000..62d059a6 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/scatter3d.md @@ -0,0 +1,48 @@ +# 3D scatterplot + +Demonstration of a basic scatterplot in 3D. + +![3D scatterplot example](https://matplotlib.org/_images/sphx_glr_scatter3d_001.png) + +```python +# This import registers the 3D projection, but is otherwise unused. +from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + +import matplotlib.pyplot as plt +import numpy as np + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +def randrange(n, vmin, vmax): + ''' + Helper function to make an array of random numbers having shape (n, ) + with each number distributed Uniform(vmin, vmax). + ''' + return (vmax - vmin)*np.random.rand(n) + vmin + +fig = plt.figure() +ax = fig.add_subplot(111, projection='3d') + +n = 100 + +# For each set of style and range settings, plot n random points in the box +# defined by x in [23, 32], y in [0, 100], z in [zlow, zhigh]. +for c, m, zlow, zhigh in [('r', 'o', -50, -25), ('b', '^', -30, -5)]: + xs = randrange(n, 23, 32) + ys = randrange(n, 0, 100) + zs = randrange(n, zlow, zhigh) + ax.scatter(xs, ys, zs, c=c, marker=m) + +ax.set_xlabel('X Label') +ax.set_ylabel('Y Label') +ax.set_zlabel('Z Label') + +plt.show() +``` + +## Download this example + +- [Download Python source code: scatter3d.py](https://matplotlib.org/_downloads/scatter3d.py) +- [Download Jupyter notebook: scatter3d.ipynb](https://matplotlib.org/_downloads/scatter3d.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/subplot3d.md b/Python/matplotlab/gallery/mplot3d/subplot3d.md new file mode 100644 index 00000000..12540d1d --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/subplot3d.md @@ -0,0 +1,53 @@ +# 3D plots as subplots + +Demonstrate including 3D plots as subplots. + +![3D plots as subplots example](https://matplotlib.org/_images/sphx_glr_subplot3d_001.png) + +```python +import matplotlib.pyplot as plt +from matplotlib import cm +import numpy as np + +from mpl_toolkits.mplot3d.axes3d import get_test_data +#
This import registers the 3D projection, but is otherwise unused. +from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + + +# set up a figure twice as wide as it is tall +fig = plt.figure(figsize=plt.figaspect(0.5)) + +#=============== +# First subplot +#=============== +# set up the axes for the first plot +ax = fig.add_subplot(1, 2, 1, projection='3d') + +# plot a 3D surface like in the example mplot3d/surface3d_demo +X = np.arange(-5, 5, 0.25) +Y = np.arange(-5, 5, 0.25) +X, Y = np.meshgrid(X, Y) +R = np.sqrt(X**2 + Y**2) +Z = np.sin(R) +surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.coolwarm, + linewidth=0, antialiased=False) +ax.set_zlim(-1.01, 1.01) +fig.colorbar(surf, shrink=0.5, aspect=10) + +#=============== +# Second subplot +#=============== +# set up the axes for the second plot +ax = fig.add_subplot(1, 2, 2, projection='3d') + +# plot a 3D wireframe like in the example mplot3d/wire3d_demo +X, Y, Z = get_test_data(0.05) +ax.plot_wireframe(X, Y, Z, rstride=10, cstride=10) + +plt.show() +``` + +## Download this example + +- [Download Python source code: subplot3d.py](https://matplotlib.org/_downloads/subplot3d.py) +- [Download Jupyter notebook: subplot3d.ipynb](https://matplotlib.org/_downloads/subplot3d.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/surface3d.md b/Python/matplotlab/gallery/mplot3d/surface3d.md new file mode 100644 index 00000000..99719348 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/surface3d.md @@ -0,0 +1,47 @@ +# 3D surface (color map) + +Demonstrates plotting a 3D surface colored with the coolwarm color map. The surface is made opaque by using antialiased=False. + +Also demonstrates using the LinearLocator and custom formatting for the z axis tick labels. + +![3D surface (color map) example](https://matplotlib.org/_images/sphx_glr_surface3d_001.png) + +```python +# This import registers the 3D projection, but is otherwise unused.
+from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + +import matplotlib.pyplot as plt +from matplotlib import cm +from matplotlib.ticker import LinearLocator, FormatStrFormatter +import numpy as np + + +fig = plt.figure() +ax = fig.gca(projection='3d') + +# Make data. +X = np.arange(-5, 5, 0.25) +Y = np.arange(-5, 5, 0.25) +X, Y = np.meshgrid(X, Y) +R = np.sqrt(X**2 + Y**2) +Z = np.sin(R) + +# Plot the surface. +surf = ax.plot_surface(X, Y, Z, cmap=cm.coolwarm, + linewidth=0, antialiased=False) + +# Customize the z axis. +ax.set_zlim(-1.01, 1.01) +ax.zaxis.set_major_locator(LinearLocator(10)) +ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f')) + +# Add a color bar which maps values to colors. +fig.colorbar(surf, shrink=0.5, aspect=5) + +plt.show() +``` + +## Download this example + +- [Download Python source code: surface3d.py](https://matplotlib.org/_downloads/surface3d.py) +- [Download Jupyter notebook: surface3d.ipynb](https://matplotlib.org/_downloads/surface3d.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/surface3d_2.md b/Python/matplotlab/gallery/mplot3d/surface3d_2.md new file mode 100644 index 00000000..0c172df7 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/surface3d_2.md @@ -0,0 +1,34 @@ +# 3D surface (solid color) + +Demonstrates a very basic plot of a 3D surface using a solid color. + +![3D surface (solid color) example](https://matplotlib.org/_images/sphx_glr_surface3d_2_001.png) + +```python +# This import registers the 3D projection, but is otherwise unused.
+from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + +import matplotlib.pyplot as plt +import numpy as np + + +fig = plt.figure() +ax = fig.add_subplot(111, projection='3d') + +# Make data +u = np.linspace(0, 2 * np.pi, 100) +v = np.linspace(0, np.pi, 100) +x = 10 * np.outer(np.cos(u), np.sin(v)) +y = 10 * np.outer(np.sin(u), np.sin(v)) +z = 10 * np.outer(np.ones(np.size(u)), np.cos(v)) + +# Plot the surface +ax.plot_surface(x, y, z, color='b') + +plt.show() +``` + +## Download this example + +- [Download Python source code: surface3d_2.py](https://matplotlib.org/_downloads/surface3d_2.py) +- [Download Jupyter notebook: surface3d_2.ipynb](https://matplotlib.org/_downloads/surface3d_2.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/surface3d_3.md b/Python/matplotlab/gallery/mplot3d/surface3d_3.md new file mode 100644 index 00000000..313c77f8 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/surface3d_3.md @@ -0,0 +1,49 @@ +# 3D surface (checkerboard) + +Demonstrates plotting a 3D surface colored in a checkerboard pattern. + +![3D surface (checkerboard) example](https://matplotlib.org/_images/sphx_glr_surface3d_3_001.png) + +```python +# This import registers the 3D projection, but is otherwise unused. +from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + +import matplotlib.pyplot as plt +from matplotlib.ticker import LinearLocator +import numpy as np + + +fig = plt.figure() +ax = fig.gca(projection='3d') + +# Make data. +X = np.arange(-5, 5, 0.25) +xlen = len(X) +Y = np.arange(-5, 5, 0.25) +ylen = len(Y) +X, Y = np.meshgrid(X, Y) +R = np.sqrt(X**2 + Y**2) +Z = np.sin(R) + +# Create an empty array of strings with the same shape as the meshgrid, and +# populate it with two colors in a checkerboard pattern. +colortuple = ('y', 'b') +colors = np.empty(X.shape, dtype=str) +for y in range(ylen): + for x in range(xlen): + colors[x, y] = colortuple[(x + y) % len(colortuple)] + +# Plot the surface with face colors taken from the array we made.
+surf = ax.plot_surface(X, Y, Z, facecolors=colors, linewidth=0) + +# Customize the z axis. +ax.set_zlim(-1, 1) +ax.w_zaxis.set_major_locator(LinearLocator(6)) + +plt.show() +``` + +## Download this example + +- [Download Python source code: surface3d_3.py](https://matplotlib.org/_downloads/surface3d_3.py) +- [Download Jupyter notebook: surface3d_3.ipynb](https://matplotlib.org/_downloads/surface3d_3.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/surface3d_radial.md b/Python/matplotlab/gallery/mplot3d/surface3d_radial.md new file mode 100644 index 00000000..974bd3df --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/surface3d_radial.md @@ -0,0 +1,44 @@ +# 3D surface with polar coordinates + +Demonstrates plotting a surface defined in polar coordinates. Uses the reversed version of the YlGnBu color map. Also demonstrates writing axis labels with LaTeX math mode. + +Example contributed by Armin Moser. + +![3D surface with polar coordinates example](https://matplotlib.org/_images/sphx_glr_surface3d_radial_001.png) + +```python +# This import registers the 3D projection, but is otherwise unused. +from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + +import matplotlib.pyplot as plt +import numpy as np + + +fig = plt.figure() +ax = fig.add_subplot(111, projection='3d') + +# Create the mesh in polar coordinates and compute corresponding Z. +r = np.linspace(0, 1.25, 50) +p = np.linspace(0, 2*np.pi, 50) +R, P = np.meshgrid(r, p) +Z = ((R**2 - 1)**2) + +# Express the mesh in the cartesian system. +X, Y = R*np.cos(P), R*np.sin(P) + +# Plot the surface. +ax.plot_surface(X, Y, Z, cmap=plt.cm.YlGnBu_r) + +# Tweak the limits and add latex math labels.
+ax.set_zlim(0, 1) +ax.set_xlabel(r'$\phi_\mathrm{real}$') +ax.set_ylabel(r'$\phi_\mathrm{im}$') +ax.set_zlabel(r'$V(\phi)$') + +plt.show() +``` + +## Download this example + +- [Download Python source code: surface3d_radial.py](https://matplotlib.org/_downloads/surface3d_radial.py) +- [Download Jupyter notebook: surface3d_radial.ipynb](https://matplotlib.org/_downloads/surface3d_radial.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/text3d.md b/Python/matplotlab/gallery/mplot3d/text3d.md new file mode 100644 index 00000000..8aa64b27 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/text3d.md @@ -0,0 +1,53 @@ +# Text annotations in 3D + +Demonstrates placing text annotations on a 3D plot. + +Functionality shown: +- Using the text function with three types of 'zdir' values: None, an axis name (e.g. 'x'), or a direction tuple (e.g. (1, 1, 0)). +- Using the text function with the color keyword. +- Using the text2D function to place text at a fixed position on the ax object. + +![Text annotations in 3D example](https://matplotlib.org/_images/sphx_glr_text3d_001.png) + +```python +# This import registers the 3D projection, but is otherwise unused. +from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + +import matplotlib.pyplot as plt + + +fig = plt.figure() +ax = fig.gca(projection='3d') + +# Demo 1: zdir +zdirs = (None, 'x', 'y', 'z', (1, 1, 0), (1, 1, 1)) +xs = (1, 4, 4, 9, 4, 1) +ys = (2, 5, 8, 10, 1, 2) +zs = (10, 3, 8, 9, 1, 8) + +for zdir, x, y, z in zip(zdirs, xs, ys, zs): + label = '(%d, %d, %d), dir=%s' % (x, y, z, zdir) + ax.text(x, y, z, label, zdir) + +# Demo 2: color +ax.text(9, 0, 0, "red", color='red') + +# Demo 3: text2D +# Placement 0, 0 would be the bottom left, 1, 1 would be the top right.
+ax.text2D(0.05, 0.95, "2D Text", transform=ax.transAxes) + +# Tweaking display region and labels +ax.set_xlim(0, 10) +ax.set_ylim(0, 10) +ax.set_zlim(0, 10) +ax.set_xlabel('X axis') +ax.set_ylabel('Y axis') +ax.set_zlabel('Z axis') + +plt.show() +``` + +## Download this example + +- [Download Python source code: text3d.py](https://matplotlib.org/_downloads/text3d.py) +- [Download Jupyter notebook: text3d.ipynb](https://matplotlib.org/_downloads/text3d.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/tricontour3d.md b/Python/matplotlab/gallery/mplot3d/tricontour3d.md new file mode 100644 index 00000000..e6f20ae6 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/tricontour3d.md @@ -0,0 +1,52 @@ +# Triangular 3D contour plot + +Contour plot of an unstructured triangular grid. + +The data used is the same as in the second plot of trisurf3d_demo2. tricontourf3d_demo shows the filled version of this example. + +![Triangular 3D contour plot example](https://matplotlib.org/_images/sphx_glr_tricontour3d_001.png) + +```python +# This import registers the 3D projection, but is otherwise unused. +from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + +import matplotlib.pyplot as plt +import matplotlib.tri as tri +import numpy as np + +n_angles = 48 +n_radii = 8 +min_radius = 0.25 + +# Create the mesh in polar coordinates and compute x, y, z. +radii = np.linspace(min_radius, 0.95, n_radii) +angles = np.linspace(0, 2*np.pi, n_angles, endpoint=False) +angles = np.repeat(angles[..., np.newaxis], n_radii, axis=1) +angles[:, 1::2] += np.pi/n_angles + +x = (radii*np.cos(angles)).flatten() +y = (radii*np.sin(angles)).flatten() +z = (np.cos(radii)*np.cos(3*angles)).flatten() + +# Create a custom triangulation. +triang = tri.Triangulation(x, y) + +# Mask off unwanted triangles. +triang.set_mask(np.hypot(x[triang.triangles].mean(axis=1), + y[triang.triangles].mean(axis=1)) + < min_radius) + +fig = plt.figure() +ax = fig.gca(projection='3d') +ax.tricontour(triang, z, cmap=plt.cm.CMRmap) + +# Customize the view angle so it's easier to understand the plot. +ax.view_init(elev=45.)
+ +plt.show() +``` + +## Download this example + +- [Download Python source code: tricontour3d.py](https://matplotlib.org/_downloads/tricontour3d.py) +- [Download Jupyter notebook: tricontour3d.ipynb](https://matplotlib.org/_downloads/tricontour3d.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/tricontourf3d.md b/Python/matplotlab/gallery/mplot3d/tricontourf3d.md new file mode 100644 index 00000000..0826e6c3 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/tricontourf3d.md @@ -0,0 +1,53 @@ +# Triangular 3D filled contour plot + +Filled contour plot of an unstructured triangular grid. + +The data used is the same as in the second plot of trisurf3d_demo2. tricontour3d_demo shows the unfilled version of this example. + +![Triangular 3D filled contour plot example](https://matplotlib.org/_images/sphx_glr_tricontourf3d_001.png) + +```python +# This import registers the 3D projection, but is otherwise unused. +from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + +import matplotlib.pyplot as plt +import matplotlib.tri as tri +import numpy as np + +# First create the x, y, z coordinates of the points. +n_angles = 48 +n_radii = 8 +min_radius = 0.25 + +# Create the mesh in polar coordinates and compute x, y, z. +radii = np.linspace(min_radius, 0.95, n_radii) +angles = np.linspace(0, 2*np.pi, n_angles, endpoint=False) +angles = np.repeat(angles[..., np.newaxis], n_radii, axis=1) +angles[:, 1::2] += np.pi/n_angles + +x = (radii*np.cos(angles)).flatten() +y = (radii*np.sin(angles)).flatten() +z = (np.cos(radii)*np.cos(3*angles)).flatten() + +# Create a custom triangulation. +triang = tri.Triangulation(x, y) + +# Mask off unwanted triangles. +triang.set_mask(np.hypot(x[triang.triangles].mean(axis=1), + y[triang.triangles].mean(axis=1)) + < min_radius) + +fig = plt.figure() +ax = fig.gca(projection='3d') +ax.tricontourf(triang, z, cmap=plt.cm.CMRmap) + +# Customize the view angle so it's easier to understand the plot. +ax.view_init(elev=45.)
+ +plt.show() +``` + +## Download this example + +- [Download Python source code: tricontourf3d.py](https://matplotlib.org/_downloads/tricontourf3d.py) +- [Download Jupyter notebook: tricontourf3d.ipynb](https://matplotlib.org/_downloads/tricontourf3d.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/trisurf3d.md b/Python/matplotlab/gallery/mplot3d/trisurf3d.md new file mode 100644 index 00000000..81c12968 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/trisurf3d.md @@ -0,0 +1,42 @@ +# Triangular 3D surface + +Plot a 3D surface with a triangular mesh. + +![Triangular 3D surface example](https://matplotlib.org/_images/sphx_glr_trisurf3d_001.png) + +```python +# This import registers the 3D projection, but is otherwise unused. +from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + +import matplotlib.pyplot as plt +import numpy as np + + +n_radii = 8 +n_angles = 36 + +# Make radii and angles spaces (radius r=0 omitted to eliminate duplication). +radii = np.linspace(0.125, 1.0, n_radii) +angles = np.linspace(0, 2*np.pi, n_angles, endpoint=False)[..., np.newaxis] + +# Convert polar (radii, angles) coords to cartesian (x, y) coords. +# (0, 0) is manually added at this stage, so there will be no duplicate +# points in the (x, y) plane. +x = np.append(0, (radii*np.cos(angles)).flatten()) +y = np.append(0, (radii*np.sin(angles)).flatten()) + +# Compute z to make the pringle surface.
+z = np.sin(-x*y) + +fig = plt.figure() +ax = fig.gca(projection='3d') + +ax.plot_trisurf(x, y, z, linewidth=0.2, antialiased=True) + +plt.show() +``` + +## Download this example + +- [Download Python source code: trisurf3d.py](https://matplotlib.org/_downloads/trisurf3d.py) +- [Download Jupyter notebook: trisurf3d.ipynb](https://matplotlib.org/_downloads/trisurf3d.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/trisurf3d_2.md b/Python/matplotlab/gallery/mplot3d/trisurf3d_2.md new file mode 100644 index 00000000..050a8e78 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/trisurf3d_2.md @@ -0,0 +1,85 @@ +# More triangular 3D surfaces + +Two additional examples of plotting surfaces with triangular meshes. + +The first demonstrates use of plot_trisurf's triangles argument, and the second sets a mask on a Triangulation object and passes the object directly to plot_trisurf. + +![More triangular 3D surfaces example](https://matplotlib.org/_images/sphx_glr_trisurf3d_2_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.tri as mtri + +# This import registers the 3D projection, but is otherwise unused. +from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + + +fig = plt.figure(figsize=plt.figaspect(0.5)) + +#============ +# First plot
#============ + +# Make a mesh in the space of parameterisation variables u and v +u = np.linspace(0, 2.0 * np.pi, endpoint=True, num=50) +v = np.linspace(-0.5, 0.5, endpoint=True, num=10) +u, v = np.meshgrid(u, v) +u, v = u.flatten(), v.flatten() + +# This is the Mobius mapping, taking a u, v pair and returning an x, y, z +# triple +x = (1 + 0.5 * v * np.cos(u / 2.0)) * np.cos(u) +y = (1 + 0.5 * v * np.cos(u / 2.0)) * np.sin(u) +z = 0.5 * v * np.sin(u / 2.0) + +# Triangulate parameter space to determine the triangles +tri = mtri.Triangulation(u, v) + +# Plot the surface. The triangles in parameter space determine which x, y, z +# points are connected by an edge.
+ax = fig.add_subplot(1, 2, 1, projection='3d') +ax.plot_trisurf(x, y, z, triangles=tri.triangles, cmap=plt.cm.Spectral) +ax.set_zlim(-1, 1) + + +#============ +# Second plot +#============ + +# Make parameter spaces radii and angles. +n_angles = 36 +n_radii = 8 +min_radius = 0.25 +radii = np.linspace(min_radius, 0.95, n_radii) + +angles = np.linspace(0, 2*np.pi, n_angles, endpoint=False) +angles = np.repeat(angles[..., np.newaxis], n_radii, axis=1) +angles[:, 1::2] += np.pi/n_angles + +# Map radius, angle pairs to x, y, z points. +x = (radii*np.cos(angles)).flatten() +y = (radii*np.sin(angles)).flatten() +z = (np.cos(radii)*np.cos(3*angles)).flatten() + +# Create the Triangulation; no triangles so Delaunay triangulation created. +triang = mtri.Triangulation(x, y) + +# Mask off unwanted triangles. +xmid = x[triang.triangles].mean(axis=1) +ymid = y[triang.triangles].mean(axis=1) +mask = np.where(xmid**2 + ymid**2 < min_radius**2, 1, 0) +triang.set_mask(mask) + +# Plot the surface. +ax = fig.add_subplot(1, 2, 2, projection='3d') +ax.plot_trisurf(triang, z, cmap=plt.cm.CMRmap) + + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: trisurf3d_2.py](https://matplotlib.org/_downloads/trisurf3d_2.py) +- [下载Jupyter notebook: trisurf3d_2.ipynb](https://matplotlib.org/_downloads/trisurf3d_2.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/voxels.md b/Python/matplotlab/gallery/mplot3d/voxels.md new file mode 100644 index 00000000..27aa4dae --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/voxels.md @@ -0,0 +1,43 @@ +# 三维体素/体积绘制 + +演示使用ax.voxels绘制3D体积对象 + +![三维体素/体积绘制示例](https://matplotlib.org/_images/sphx_glr_voxels_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +# This import registers the 3D projection, but is otherwise unused. 
+from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + + +# prepare some coordinates +x, y, z = np.indices((8, 8, 8)) + +# draw cuboids in the top left and bottom right corners, and a link between them +cube1 = (x < 3) & (y < 3) & (z < 3) +cube2 = (x >= 5) & (y >= 5) & (z >= 5) +link = abs(x - y) + abs(y - z) + abs(z - x) <= 2 + +# combine the objects into a single boolean array +voxels = cube1 | cube2 | link + +# set the colors of each object +colors = np.empty(voxels.shape, dtype=object) +colors[link] = 'red' +colors[cube1] = 'blue' +colors[cube2] = 'green' + +# and plot everything +fig = plt.figure() +ax = fig.gca(projection='3d') +ax.voxels(voxels, facecolors=colors, edgecolor='k') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: voxels.py](https://matplotlib.org/_downloads/voxels.py) +- [下载Jupyter notebook: voxels.ipynb](https://matplotlib.org/_downloads/voxels.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/voxels_numpy_logo.md b/Python/matplotlab/gallery/mplot3d/voxels_numpy_logo.md new file mode 100644 index 00000000..78b035b2 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/voxels_numpy_logo.md @@ -0,0 +1,55 @@ +# 三维体素绘制Numpy的Logo + +演示使用坐标不均匀的ax.voxels + +![三维体素绘制Numpy的Logo示例](https://matplotlib.org/_images/sphx_glr_voxels_numpy_logo_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +# This import registers the 3D projection, but is otherwise unused. 
+from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + + +def explode(data): + size = np.array(data.shape)*2 + data_e = np.zeros(size - 1, dtype=data.dtype) + data_e[::2, ::2, ::2] = data + return data_e + +# build up the numpy logo +n_voxels = np.zeros((4, 3, 4), dtype=bool) +n_voxels[0, 0, :] = True +n_voxels[-1, 0, :] = True +n_voxels[1, 0, 2] = True +n_voxels[2, 0, 1] = True +facecolors = np.where(n_voxels, '#FFD65DC0', '#7A88CCC0') +edgecolors = np.where(n_voxels, '#BFAB6E', '#7D84A6') +filled = np.ones(n_voxels.shape) + +# upscale the above voxel image, leaving gaps +filled_2 = explode(filled) +fcolors_2 = explode(facecolors) +ecolors_2 = explode(edgecolors) + +# Shrink the gaps +x, y, z = np.indices(np.array(filled_2.shape) + 1).astype(float) // 2 +x[0::2, :, :] += 0.05 +y[:, 0::2, :] += 0.05 +z[:, :, 0::2] += 0.05 +x[1::2, :, :] += 0.95 +y[:, 1::2, :] += 0.95 +z[:, :, 1::2] += 0.95 + +fig = plt.figure() +ax = fig.gca(projection='3d') +ax.voxels(x, y, z, filled_2, facecolors=fcolors_2, edgecolors=ecolors_2) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: voxels_numpy_logo.py](https://matplotlib.org/_downloads/voxels_numpy_logo.py) +- [下载Jupyter notebook: voxels_numpy_logo.ipynb](https://matplotlib.org/_downloads/voxels_numpy_logo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/voxels_rgb.md b/Python/matplotlab/gallery/mplot3d/voxels_rgb.md new file mode 100644 index 00000000..fbcdaa53 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/voxels_rgb.md @@ -0,0 +1,52 @@ +# 带有rgb颜色的3D体素/体积图 + +演示使用ax.voxels可视化颜色空间的各个部分 + +![带有rgb颜色的3D体素/体积图示例](https://matplotlib.org/_images/sphx_glr_voxels_rgb_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +# This import registers the 3D projection, but is otherwise unused. 
+from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + + +def midpoints(x): + sl = () + for i in range(x.ndim): + x = (x[sl + np.index_exp[:-1]] + x[sl + np.index_exp[1:]]) / 2.0 + sl += np.index_exp[:] + return x + +# prepare some coordinates, and attach rgb values to each +r, g, b = np.indices((17, 17, 17)) / 16.0 +rc = midpoints(r) +gc = midpoints(g) +bc = midpoints(b) + +# define a sphere about [0.5, 0.5, 0.5] +sphere = (rc - 0.5)**2 + (gc - 0.5)**2 + (bc - 0.5)**2 < 0.5**2 + +# combine the color components +colors = np.zeros(sphere.shape + (3,)) +colors[..., 0] = rc +colors[..., 1] = gc +colors[..., 2] = bc + +# and plot everything +fig = plt.figure() +ax = fig.gca(projection='3d') +ax.voxels(r, g, b, sphere, + facecolors=colors, + edgecolors=np.clip(2*colors - 0.5, 0, 1), # brighter + linewidth=0.5) +ax.set(xlabel='r', ylabel='g', zlabel='b') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: voxels_rgb.py](https://matplotlib.org/_downloads/voxels_rgb.py) +- [下载Jupyter notebook: voxels_rgb.ipynb](https://matplotlib.org/_downloads/voxels_rgb.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/voxels_torus.md b/Python/matplotlab/gallery/mplot3d/voxels_torus.md new file mode 100644 index 00000000..3aac9fe3 --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/voxels_torus.md @@ -0,0 +1,54 @@ +# 具有圆柱坐标的3D体素/体积图 + +演示使用ax.voxels的x,y,z参数。 + +![具有圆柱坐标的3D体素/体积图示例](https://matplotlib.org/_images/sphx_glr_voxels_torus_001.png) + +```python +import matplotlib.pyplot as plt +import matplotlib.colors +import numpy as np + +# This import registers the 3D projection, but is otherwise unused. 
+from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + + +def midpoints(x): + sl = () + for i in range(x.ndim): + x = (x[sl + np.index_exp[:-1]] + x[sl + np.index_exp[1:]]) / 2.0 + sl += np.index_exp[:] + return x + +# prepare some coordinates, and attach rgb values to each +r, theta, z = np.mgrid[0:1:11j, 0:np.pi*2:25j, -0.5:0.5:11j] +x = r*np.cos(theta) +y = r*np.sin(theta) + +rc, thetac, zc = midpoints(r), midpoints(theta), midpoints(z) + +# define a wobbly torus about [0.7, *, 0] +sphere = (rc - 0.7)**2 + (zc + 0.2*np.cos(thetac*2))**2 < 0.2**2 + +# combine the color components +hsv = np.zeros(sphere.shape + (3,)) +hsv[..., 0] = thetac / (np.pi*2) +hsv[..., 1] = rc +hsv[..., 2] = zc + 0.5 +colors = matplotlib.colors.hsv_to_rgb(hsv) + +# and plot everything +fig = plt.figure() +ax = fig.gca(projection='3d') +ax.voxels(x, y, z, sphere, + facecolors=colors, + edgecolors=np.clip(2*colors - 0.5, 0, 1), # brighter + linewidth=0.5) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: voxels_torus.py](https://matplotlib.org/_downloads/voxels_torus.py) +- [下载Jupyter notebook: voxels_torus.ipynb](https://matplotlib.org/_downloads/voxels_torus.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/mplot3d/wire3d.md b/Python/matplotlab/gallery/mplot3d/wire3d.md new file mode 100644 index 00000000..636db85c --- /dev/null +++ b/Python/matplotlab/gallery/mplot3d/wire3d.md @@ -0,0 +1,27 @@ +# 3D线框图 + +线框图的一个非常基本的演示。 + +![3D线框图示例](https://matplotlib.org/_images/sphx_glr_wire3d_001.png) + +```python +from mpl_toolkits.mplot3d import axes3d +import matplotlib.pyplot as plt + + +fig = plt.figure() +ax = fig.add_subplot(111, projection='3d') + +# Grab some test data. +X, Y, Z = axes3d.get_test_data(0.05) + +# Plot a basic wireframe. 
+ax.plot_wireframe(X, Y, Z, rstride=10, cstride=10)
+
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: wire3d.py](https://matplotlib.org/_downloads/wire3d.py)
+- [下载Jupyter notebook: wire3d.ipynb](https://matplotlib.org/_downloads/wire3d.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/mplot3d/wire3d_animation_sgskip.md b/Python/matplotlab/gallery/mplot3d/wire3d_animation_sgskip.md
new file mode 100644
index 00000000..1dd4ddb8
--- /dev/null
+++ b/Python/matplotlab/gallery/mplot3d/wire3d_animation_sgskip.md
@@ -0,0 +1,54 @@
+# 旋转3D线框图
+
+一个非常简单的3D绘图“动画”。另请参见 rotate_axes3d_demo。
+
+(构建文档库时会跳过此示例,因为它运行起来需要很长时间。)
+
+```python
+# This import registers the 3D projection, but is otherwise unused.
+from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 unused import
+
+import matplotlib.pyplot as plt
+import numpy as np
+import time
+
+
+def generate(X, Y, phi):
+    '''
+    Generates Z data for the points in the X, Y meshgrid and parameter phi.
+    '''
+    R = 1 - np.sqrt(X**2 + Y**2)
+    return np.cos(2 * np.pi * X + phi) * R
+
+
+fig = plt.figure()
+ax = fig.add_subplot(111, projection='3d')
+
+# Make the X, Y meshgrid.
+xs = np.linspace(-1, 1, 50)
+ys = np.linspace(-1, 1, 50)
+X, Y = np.meshgrid(xs, ys)
+
+# Set the z axis limits so they aren't recalculated each frame.
+ax.set_zlim(-1, 1)
+
+# Begin plotting.
+wframe = None
+tstart = time.time()
+for phi in np.linspace(0, 180. / np.pi, 100):
+    # If a line collection is already present, remove it before drawing.
+    if wframe:
+        ax.collections.remove(wframe)
+
+    # Plot the new wireframe and pause briefly before continuing.
+    Z = generate(X, Y, phi)
+    wframe = ax.plot_wireframe(X, Y, Z, rstride=2, cstride=2)
+    plt.pause(.001)
+
+print('Average FPS: %f' % (100 / (time.time() - tstart)))
+```
+
+## 下载这个示例
+
+- [下载python源码: wire3d_animation_sgskip.py](https://matplotlib.org/_downloads/wire3d_animation_sgskip.py)
+- [下载Jupyter notebook: wire3d_animation_sgskip.ipynb](https://matplotlib.org/_downloads/wire3d_animation_sgskip.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/mplot3d/wire3d_zero_stride.md b/Python/matplotlab/gallery/mplot3d/wire3d_zero_stride.md
new file mode 100644
index 00000000..0d0870f4
--- /dev/null
+++ b/Python/matplotlab/gallery/mplot3d/wire3d_zero_stride.md
@@ -0,0 +1,32 @@
+# 三维线框在一个方向上绘制
+
+演示将 ``rstride`` 或 ``cstride`` 设置为 0 时,相应方向上不会绘制线框线条。
+
+![三维线框在一个方向上绘制示例](https://matplotlib.org/_images/sphx_glr_wire3d_zero_stride_001.png)
+
+```python
+from mpl_toolkits.mplot3d import axes3d
+import matplotlib.pyplot as plt
+
+
+fig, [ax1, ax2] = plt.subplots(2, 1, figsize=(8, 12), subplot_kw={'projection': '3d'})
+
+# Get the test data
+X, Y, Z = axes3d.get_test_data(0.05)
+
+# Give the first plot only wireframes of the type y = c
+ax1.plot_wireframe(X, Y, Z, rstride=10, cstride=0)
+ax1.set_title("Column (x) stride set to 0")
+
+# Give the second plot only wireframes of the type x = c
+ax2.plot_wireframe(X, Y, Z, rstride=0, cstride=10)
+ax2.set_title("Row (y) stride set to 0")
+
+plt.tight_layout()
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: wire3d_zero_stride.py](https://matplotlib.org/_downloads/wire3d_zero_stride.py)
+- [下载Jupyter notebook: wire3d_zero_stride.ipynb](https://matplotlib.org/_downloads/wire3d_zero_stride.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/pie_and_polar_charts/nested_pie.md b/Python/matplotlab/gallery/pie_and_polar_charts/nested_pie.md
new file mode 100644
index 00000000..2318d38d
--- /dev/null
+++ b/Python/matplotlab/gallery/pie_and_polar_charts/nested_pie.md
@@ -0,0 +1,89 @@
+# 嵌套饼图
+
+以下示例显示了在Matplotlib中构建嵌套饼图的两种方法。 这些图表通常被称为空心饼图图表。 + +```python +import matplotlib.pyplot as plt +import numpy as np +``` + +构建饼图最简单的方法是使用饼图方法[(pie method)](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.pie.html#matplotlib.axes.Axes.pie)。 + +在这种情况下,pie获取与组中的计数相对应的值。我们将首先生成一些假数据,对应三组。在内圈中,我们将每个数字视为属于自己的组。 在外圈,我们将它们绘制为原始3组的成员。 + +空心饼图形状的效果是通过``wedgeprops``参数设置馅饼楔形的``宽度``来实现的。 + +```python +fig, ax = plt.subplots() + +size = 0.3 +vals = np.array([[60., 32.], [37., 40.], [29., 10.]]) + +cmap = plt.get_cmap("tab20c") +outer_colors = cmap(np.arange(3)*4) +inner_colors = cmap(np.array([1, 2, 5, 6, 9, 10])) + +ax.pie(vals.sum(axis=1), radius=1, colors=outer_colors, + wedgeprops=dict(width=size, edgecolor='w')) + +ax.pie(vals.flatten(), radius=1-size, colors=inner_colors, + wedgeprops=dict(width=size, edgecolor='w')) + +ax.set(aspect="equal", title='Pie plot with `ax.pie`') +plt.show() +``` + +![嵌套饼图示例](https://matplotlib.org/_images/sphx_glr_nested_pie_001.png) + +但是,您可以通过在具有极坐标系的轴上使用条形图来完成相同的输出。 这可以为绘图的精确设计提供更大的灵活性。 + +在这种情况下,我们需要将条形图的x值映射到圆的弧度。这些值的累积和用作条的边。 + +```python +fig, ax = plt.subplots(subplot_kw=dict(polar=True)) + +size = 0.3 +vals = np.array([[60., 32.], [37., 40.], [29., 10.]]) +#normalize vals to 2 pi +valsnorm = vals/np.sum(vals)*2*np.pi +#obtain the ordinates of the bar edges +valsleft = np.cumsum(np.append(0, valsnorm.flatten()[:-1])).reshape(vals.shape) + +cmap = plt.get_cmap("tab20c") +outer_colors = cmap(np.arange(3)*4) +inner_colors = cmap(np.array([1, 2, 5, 6, 9, 10])) + +ax.bar(x=valsleft[:, 0], + width=valsnorm.sum(axis=1), bottom=1-size, height=size, + color=outer_colors, edgecolor='w', linewidth=1, align="edge") + +ax.bar(x=valsleft.flatten(), + width=valsnorm.flatten(), bottom=1-2*size, height=size, + color=inner_colors, edgecolor='w', linewidth=1, align="edge") + +ax.set(title="Pie plot with `ax.bar` and polar coordinates") +ax.set_axis_off() +plt.show() +``` + +![嵌套饼图示例2](https://matplotlib.org/_images/sphx_glr_nested_pie_002.png) + 
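+上文提到,“这些值的累积和用作条的边”。下面用一个极简的独立片段(仅为示意,并非原示例的一部分)演示这一映射的具体数值:把各数值归一化到 2π 弧度后,对展平后的数组做累积和即得到每个条形的左边缘,而最后一个条形的右边缘恰好回到 2π。
+
+```python
+import numpy as np
+
+# 与上面示例相同的数据:3 个外层组,每组含 2 个子组。
+vals = np.array([[60., 32.], [37., 40.], [29., 10.]])
+
+# 归一化,使所有数值之和对应整个圆周(2*pi 弧度)。
+valsnorm = vals / np.sum(vals) * 2 * np.pi
+
+# 前面补 0、去掉最后一项后做累积和,得到每个条形的左边缘;
+# reshape 回原形状,便于按“组/子组”索引。
+valsleft = np.cumsum(np.append(0, valsnorm.flatten()[:-1])).reshape(vals.shape)
+
+# 每个外层楔形的起点正好是上一组子组的终点,因此内外两圈能对齐;
+# 最后一个条形的右边缘等于 2*pi。
+print(valsleft[:, 0])        # 三个外层楔形的左边缘
+print(valsleft + valsnorm)   # 各条形的右边缘
+```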
+
+## 参考
+
+此示例显示了以下函数、方法、类和模块的使用:
+
+```python
+import matplotlib
+matplotlib.axes.Axes.pie
+matplotlib.pyplot.pie
+matplotlib.axes.Axes.bar
+matplotlib.pyplot.bar
+matplotlib.projections.polar
+matplotlib.axes.Axes.set
+matplotlib.axes.Axes.set_axis_off
+```
+
+## 下载这个示例
+
+- [下载python源码: nested_pie.py](https://matplotlib.org/_downloads/nested_pie.py)
+- [下载Jupyter notebook: nested_pie.ipynb](https://matplotlib.org/_downloads/nested_pie.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/pie_and_polar_charts/pie_and_donut_labels.md b/Python/matplotlab/gallery/pie_and_polar_charts/pie_and_donut_labels.md
new file mode 100644
index 00000000..76a01d48
--- /dev/null
+++ b/Python/matplotlab/gallery/pie_and_polar_charts/pie_and_donut_labels.md
@@ -0,0 +1,110 @@
+# 标记饼图和空心饼图
+
+欢迎来到Matplotlib面包店。我们将通过 pie 方法 [(pie method)](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.pie.html#matplotlib.axes.Axes.pie) 创建一个饼图和一个空心饼图,并展示如何使用[图例](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.legend.html#matplotlib.axes.Axes.legend)和[注释](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.annotate.html#matplotlib.axes.Axes.annotate)来标记它们。
+
+与往常一样,我们将从定义导入开始,并创建一个带有子图的图形。现在是吃派的时候了。从饼图开始,我们从数据和标签列表中创建数据。
+
+我们可以为``autopct``参数提供一个函数,它将在自动百分比标记的基础上再显示绝对值;绝对值由相对数据和已知的所有值之和计算得出。
+
+然后我们创建饼图并存储返回的对象,以备后用。返回元组的第一个元素是楔形列表。这些是 [matplotlib.patches.Wedge](https://matplotlib.org/api/_as_gen/matplotlib.patches.Wedge.html#matplotlib.patches.Wedge) 面片,可以直接用作图例的句柄。我们可以使用图例的 ``bbox_to_anchor`` 参数将图例放置在饼图外部。这里我们使用轴坐标 (1, 0, 0.5, 1) 和位置 ``center left``(中心左);即图例的左中心点位于边界框的左中心点,该边界框在轴坐标中从 (1, 0) 延伸到 (1.5, 1)。
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+fig, ax = 
plt.subplots(figsize=(6, 3), subplot_kw=dict(aspect="equal")) + +recipe = ["375 g flour", + "75 g sugar", + "250 g butter", + "300 g berries"] + +data = [float(x.split()[0]) for x in recipe] +ingredients = [x.split()[-1] for x in recipe] + + +def func(pct, allvals): + absolute = int(pct/100.*np.sum(allvals)) + return "{:.1f}%\n({:d} g)".format(pct, absolute) + + +wedges, texts, autotexts = ax.pie(data, autopct=lambda pct: func(pct, data), + textprops=dict(color="w")) + +ax.legend(wedges, ingredients, + title="Ingredients", + loc="center left", + bbox_to_anchor=(1, 0, 0.5, 1)) + +plt.setp(autotexts, size=8, weight="bold") + +ax.set_title("Matplotlib bakery: A pie") + +plt.show() +``` + +![标记饼图和空心饼图示例](https://matplotlib.org/_images/sphx_glr_pie_and_donut_labels_001.png) + +现在是空心饼图(甜甜圈)。从空心饼图(甜甜圈)开始,我们将数据转录为数字(将1个鸡蛋转换为50克),并直接绘制馅饼。馅饼?等等......这将是甜甜圈,不是吗? 好吧,正如我们在这里看到的,甜甜圈是一个馅饼,有一定的宽度设置到楔形,这与它的半径不同。 这很简单。这是通过wedgeprops参数完成的。 + +然后我们想通过[注释](annotations)标记楔形。我们首先创建一些公共属性的字典,我们稍后可以将其作为关键字参数传递。然后我们迭代所有的楔形和每个楔形 + +- 计算楔形中心的角度, +- 从那里获得圆周上该角度的点的坐标, +- 确定文本的水平对齐方式,具体取决于该点位于圆圈的哪一侧, +- 使用获得的角度更新连接样式,使注释箭头从甜甜圈向外指向, +- 最后,使用所有先前确定的参数创建注释。 + +```python +fig, ax = plt.subplots(figsize=(6, 3), subplot_kw=dict(aspect="equal")) + +recipe = ["225 g flour", + "90 g sugar", + "1 egg", + "60 g butter", + "100 ml milk", + "1/2 package of yeast"] + +data = [225, 90, 50, 60, 100, 5] + +wedges, texts = ax.pie(data, wedgeprops=dict(width=0.5), startangle=-40) + +bbox_props = dict(boxstyle="square,pad=0.3", fc="w", ec="k", lw=0.72) +kw = dict(xycoords='data', textcoords='data', arrowprops=dict(arrowstyle="-"), + bbox=bbox_props, zorder=0, va="center") + +for i, p in enumerate(wedges): + ang = (p.theta2 - p.theta1)/2. 
+ p.theta1 + y = np.sin(np.deg2rad(ang)) + x = np.cos(np.deg2rad(ang)) + horizontalalignment = {-1: "right", 1: "left"}[int(np.sign(x))] + connectionstyle = "angle,angleA=0,angleB={}".format(ang) + kw["arrowprops"].update({"connectionstyle": connectionstyle}) + ax.annotate(recipe[i], xy=(x, y), xytext=(1.35*np.sign(x), 1.4*y), + horizontalalignment=horizontalalignment, **kw) + +ax.set_title("Matplotlib bakery: A donut") + +plt.show() +``` + +![标记饼图和空心饼图2](https://matplotlib.org/_images/sphx_glr_pie_and_donut_labels_002.png) + +这就是空心饼图(甜甜圈)。然而,请注意,如果我们使用这个食谱,材料将足够大约6个甜甜圈-生产一个巨大的甜甜圈是未经测试的,并可能会导致烹饪失败。 + +## 参考 + +此示例显示了以下函数、方法、类和模块的使用: + +```python +import matplotlib +matplotlib.axes.Axes.pie +matplotlib.pyplot.pie +matplotlib.axes.Axes.legend +matplotlib.pyplot.legend +``` + +## 下载这个示例 + +- [下载python源码: pie_and_donut_labels.py](https://matplotlib.org/_downloads/pie_and_donut_labels.py) +- [下载Jupyter notebook: pie_and_donut_labels.ipynb](https://matplotlib.org/_downloads/pie_and_donut_labels.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pie_and_polar_charts/pie_demo2.md b/Python/matplotlab/gallery/pie_and_polar_charts/pie_demo2.md new file mode 100644 index 00000000..0fed28ce --- /dev/null +++ b/Python/matplotlab/gallery/pie_and_polar_charts/pie_demo2.md @@ -0,0 +1,60 @@ +# Pie 绘制饼图演示 + +使用 [pie()](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.pie.html#matplotlib.axes.Axes.pie). 
制作饼图。 + +此示例演示了一些饼图功能,如标签、可变大小、自动标记百分比、偏移切片和添加阴影。 + +```python +import matplotlib.pyplot as plt + +# Some data +labels = 'Frogs', 'Hogs', 'Dogs', 'Logs' +fracs = [15, 30, 45, 10] + +# Make figure and axes +fig, axs = plt.subplots(2, 2) + +# A standard pie plot +axs[0, 0].pie(fracs, labels=labels, autopct='%1.1f%%', shadow=True) + +# Shift the second slice using explode +axs[0, 1].pie(fracs, labels=labels, autopct='%.0f%%', shadow=True, + explode=(0, 0.1, 0, 0)) + +# Adapt radius and text size for a smaller pie +patches, texts, autotexts = axs[1, 0].pie(fracs, labels=labels, + autopct='%.0f%%', + textprops={'size': 'smaller'}, + shadow=True, radius=0.5) +# Make percent texts even smaller +plt.setp(autotexts, size='x-small') +autotexts[0].set_color('white') + +# Use a smaller explode and turn of the shadow for better visibility +patches, texts, autotexts = axs[1, 1].pie(fracs, labels=labels, + autopct='%.0f%%', + textprops={'size': 'smaller'}, + shadow=False, radius=0.5, + explode=(0, 0.05, 0, 0)) +plt.setp(autotexts, size='x-small') +autotexts[0].set_color('white') + +plt.show() +``` + +![Pie 绘制饼图演示](https://matplotlib.org/_images/sphx_glr_pie_demo2_001.png) + +## 参考 + +此示例显示了以下函数、方法、类和模块的使用: + +```python +import matplotlib +matplotlib.axes.Axes.pie +matplotlib.pyplot.pie +``` + +## 下载这个示例 + +- [下载python源码: pie_demo2.py](https://matplotlib.org/_downloads/pie_demo2.py) +- [下载Jupyter notebook: pie_demo2.ipynb](https://matplotlib.org/_downloads/pie_demo2.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pie_and_polar_charts/pie_features.md b/Python/matplotlab/gallery/pie_and_polar_charts/pie_features.md new file mode 100644 index 00000000..15cae594 --- /dev/null +++ b/Python/matplotlab/gallery/pie_and_polar_charts/pie_features.md @@ -0,0 +1,48 @@ +# 基本饼图 + +演示一个基本的饼图和一些额外的功能。 + +除了基本饼图外,此演示还显示了以下几个可选功能: + +- 切片标签。 +- 自动标记百分比。 +- 用 ``explode`` 偏移切片。 +- 投影。 +- 自定义起始角度 + +请注意,自定义起点角度: + +默认的起始``角度(startangle)``为0,这将在正x轴上开始“Frogs”切片。此示例将 
``startangle设置为90`` ,以便将所有对象逆时针旋转90度,并且青蛙切片从正y轴开始。 + +```python +import matplotlib.pyplot as plt + +# Pie chart, where the slices will be ordered and plotted counter-clockwise: +labels = 'Frogs', 'Hogs', 'Dogs', 'Logs' +sizes = [15, 30, 45, 10] +explode = (0, 0.1, 0, 0) # only "explode" the 2nd slice (i.e. 'Hogs') + +fig1, ax1 = plt.subplots() +ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%', + shadow=True, startangle=90) +ax1.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle. + +plt.show() +``` + +![基本饼图示例](https://matplotlib.org/_images/sphx_glr_pie_features_001.png) + +## 参考 + +此示例显示了以下函数、方法、类和模块的使用: + +```python +import matplotlib +matplotlib.axes.Axes.pie +matplotlib.pyplot.pie +``` + +## 下载这个示例 + +- [下载python源码: pie_features.py](https://matplotlib.org/_downloads/pie_features.py) +- [下载Jupyter notebook: pie_features.ipynb](https://matplotlib.org/_downloads/pie_features.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pie_and_polar_charts/polar_bar.md b/Python/matplotlab/gallery/pie_and_polar_charts/polar_bar.md new file mode 100644 index 00000000..c8f00d7d --- /dev/null +++ b/Python/matplotlab/gallery/pie_and_polar_charts/polar_bar.md @@ -0,0 +1,46 @@ +# 极轴上的饼图 + +极轴上的饼状条形图演示。 + +```python +import numpy as np +import matplotlib.pyplot as plt + + +# Fixing random state for reproducibility +np.random.seed(19680801) + +# Compute pie slices +N = 20 +theta = np.linspace(0.0, 2 * np.pi, N, endpoint=False) +radii = 10 * np.random.rand(N) +width = np.pi / 4 * np.random.rand(N) + +ax = plt.subplot(111, projection='polar') +bars = ax.bar(theta, radii, width=width, bottom=0.0) + +# Use custom colors and opacity +for r, bar in zip(radii, bars): + bar.set_facecolor(plt.cm.viridis(r / 10.)) + bar.set_alpha(0.5) + +plt.show() +``` + +![极轴上的饼图示例](https://matplotlib.org/_images/sphx_glr_polar_bar_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.axes.Axes.bar 
+matplotlib.pyplot.bar +matplotlib.projections.polar +``` + +## 下载这个示例 + +- [下载python源码: polar_bar.py](https://matplotlib.org/_downloads/polar_bar.py) +- [下载Jupyter notebook: polar_bar.ipynb](https://matplotlib.org/_downloads/polar_bar.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pie_and_polar_charts/polar_demo.md b/Python/matplotlab/gallery/pie_and_polar_charts/polar_demo.md new file mode 100644 index 00000000..fac692a6 --- /dev/null +++ b/Python/matplotlab/gallery/pie_and_polar_charts/polar_demo.md @@ -0,0 +1,43 @@ +# 极轴上绘制线段 + +在极轴上绘制线图的演示。 + +```python +import numpy as np +import matplotlib.pyplot as plt + + +r = np.arange(0, 2, 0.01) +theta = 2 * np.pi * r + +ax = plt.subplot(111, projection='polar') +ax.plot(theta, r) +ax.set_rmax(2) +ax.set_rticks([0.5, 1, 1.5, 2]) # Less radial ticks +ax.set_rlabel_position(-22.5) # Move radial labels away from plotted line +ax.grid(True) + +ax.set_title("A line plot on a polar axis", va='bottom') +plt.show() +``` + +![极轴上绘制线段示例](https://matplotlib.org/_images/sphx_glr_polar_demo_001.png) + +## 参考 + +本示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.axes.Axes.plot +matplotlib.projections.polar +matplotlib.projections.polar.PolarAxes +matplotlib.projections.polar.PolarAxes.set_rticks +matplotlib.projections.polar.PolarAxes.set_rmax +matplotlib.projections.polar.PolarAxes.set_rlabel_position +``` + +## 下载这个示例 + +- [下载python源码: polar_demo.py](https://matplotlib.org/_downloads/polar_demo.py) +- [下载Jupyter notebook: polar_demo.ipynb](https://matplotlib.org/_downloads/polar_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pie_and_polar_charts/polar_legend.md b/Python/matplotlab/gallery/pie_and_polar_charts/polar_legend.md new file mode 100644 index 00000000..330602b5 --- /dev/null +++ b/Python/matplotlab/gallery/pie_and_polar_charts/polar_legend.md @@ -0,0 +1,45 @@ +# 极轴上的图例 + +极轴图上的图例演示。 + +```python +import matplotlib.pyplot as plt +import numpy as np 
+ +# radar green, solid grid lines +plt.rc('grid', color='#316931', linewidth=1, linestyle='-') +plt.rc('xtick', labelsize=15) +plt.rc('ytick', labelsize=15) + +# force square figure and square axes looks better for polar, IMO +fig = plt.figure(figsize=(8, 8)) +ax = fig.add_axes([0.1, 0.1, 0.8, 0.8], + projection='polar', facecolor='#d5de9c') + +r = np.arange(0, 3.0, 0.01) +theta = 2 * np.pi * r +ax.plot(theta, r, color='#ee8d18', lw=3, label='a line') +ax.plot(0.5 * theta, r, color='blue', ls='--', lw=3, label='another line') +ax.legend() + +plt.show() +``` + +![极轴图上的图例演示](https://matplotlib.org/_images/sphx_glr_polar_legend_001.png) + +## 参考 + +此示例显示了以下函数、方法、类和模块的使用: + +```python +import matplotlib +matplotlib.axes.Axes.plot +matplotlib.axes.Axes.legend +matplotlib.projections.polar +matplotlib.projections.polar.PolarAxes +``` + +## 下载这个示例 + +- [下载python源码: polar_legend.py](https://matplotlib.org/_downloads/polar_legend.py) +- [下载Jupyter notebook: polar_legend.ipynb](https://matplotlib.org/_downloads/polar_legend.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pie_and_polar_charts/polar_scatter.md b/Python/matplotlab/gallery/pie_and_polar_charts/polar_scatter.md new file mode 100644 index 00000000..bdce7c04 --- /dev/null +++ b/Python/matplotlab/gallery/pie_and_polar_charts/polar_scatter.md @@ -0,0 +1,77 @@ +# 极轴上的散点图 + +在这个例子中,尺寸径向增加,颜色随角度增加(只是为了验证符号是否正确分散)。 + +```python +import numpy as np +import matplotlib.pyplot as plt + + +# Fixing random state for reproducibility +np.random.seed(19680801) + +# Compute areas and colors +N = 150 +r = 2 * np.random.rand(N) +theta = 2 * np.pi * np.random.rand(N) +area = 200 * r**2 +colors = theta + +fig = plt.figure() +ax = fig.add_subplot(111, projection='polar') +c = ax.scatter(theta, r, c=colors, s=area, cmap='hsv', alpha=0.75) +``` + +![极轴上的散点图示例](https://matplotlib.org/_images/sphx_glr_polar_scatter_001.png) + +## 极轴上的散点图,具有偏移原点 + +与先前图的主要区别在于原点半径的配置,产生环。 此外,θ零位置设置为旋转图。 + +```python +fig = 
plt.figure() +ax = fig.add_subplot(111, polar=True) +c = ax.scatter(theta, r, c=colors, s=area, cmap='hsv', alpha=0.75) + +ax.set_rorigin(-2.5) +ax.set_theta_zero_location('W', offset=10) +``` + +![极轴上的散点图2](https://matplotlib.org/_images/sphx_glr_polar_scatter_002.png) + +## 极轴上的散点图局限于扇区 + +与之前的图表的主要区别在于theta开始和结束限制的配置,产生扇区而不是整圆。 + +```python +fig = plt.figure() +ax = fig.add_subplot(111, polar=True) +c = ax.scatter(theta, r, c=colors, s=area, cmap='hsv', alpha=0.75) + +ax.set_thetamin(45) +ax.set_thetamax(135) + +plt.show() +``` + +![极轴上的散点图示例3](https://matplotlib.org/_images/sphx_glr_polar_scatter_003.png) + +### 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.axes.Axes.scatter +matplotlib.pyplot.scatter +matplotlib.projections.polar +matplotlib.projections.polar.PolarAxes.set_rorigin +matplotlib.projections.polar.PolarAxes.set_theta_zero_location +matplotlib.projections.polar.PolarAxes.set_thetamin +matplotlib.projections.polar.PolarAxes.set_thetamax +``` + +## 下载这个示例 + +- [下载python源码: polar_scatter.py](https://matplotlib.org/_downloads/polar_scatter.py) +- [下载Jupyter notebook: polar_scatter.ipynb](https://matplotlib.org/_downloads/polar_scatter.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/align_ylabels.md b/Python/matplotlab/gallery/pyplots/align_ylabels.md new file mode 100644 index 00000000..c2099be7 --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/align_ylabels.md @@ -0,0 +1,86 @@ +# 对齐y标签 + +这里显示了两种方法,一种是使用对 [Figure.align_ylabels](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.align_ylabels) 的简短调用,另一种是使用手动方式来对齐标签。 + +```python +import numpy as np +import matplotlib.pyplot as plt + + +def make_plot(axs): + box = dict(facecolor='yellow', pad=5, alpha=0.2) + + # Fixing random state for reproducibility + np.random.seed(19680801) + ax1 = axs[0, 0] + ax1.plot(2000*np.random.rand(10)) + ax1.set_title('ylabels not aligned') + ax1.set_ylabel('misaligned 
1', bbox=box) + ax1.set_ylim(0, 2000) + + ax3 = axs[1, 0] + ax3.set_ylabel('misaligned 2', bbox=box) + ax3.plot(np.random.rand(10)) + + ax2 = axs[0, 1] + ax2.set_title('ylabels aligned') + ax2.plot(2000*np.random.rand(10)) + ax2.set_ylabel('aligned 1', bbox=box) + ax2.set_ylim(0, 2000) + + ax4 = axs[1, 1] + ax4.plot(np.random.rand(10)) + ax4.set_ylabel('aligned 2', bbox=box) + + +# Plot 1: +fig, axs = plt.subplots(2, 2) +fig.subplots_adjust(left=0.2, wspace=0.6) +make_plot(axs) + +# just align the last column of axes: +fig.align_ylabels(axs[:, 1]) +plt.show() +``` + +![对齐y标签示例](https://matplotlib.org/_images/sphx_glr_align_ylabels_0011.png) + +> 另见 Figure.align_ylabels and Figure.align_labels for a direct method of doing the same thing. Also Aligning Labels + +或者,我们可以使用y轴对象的set_label_coords方法手动在子图之间手动对齐轴标签。请注意,这需要我们知道硬编码的良好偏移值。 + +```python +fig, axs = plt.subplots(2, 2) +fig.subplots_adjust(left=0.2, wspace=0.6) + +make_plot(axs) + +labelx = -0.3 # axes coords + +for j in range(2): + axs[j, 1].yaxis.set_label_coords(labelx, 0.5) + +plt.show() +``` + +![对齐y标签示例2](https://matplotlib.org/_images/sphx_glr_align_ylabels_002.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.figure.Figure.align_ylabels +matplotlib.axis.Axis.set_label_coords +matplotlib.axes.Axes.plot +matplotlib.pyplot.plot +matplotlib.axes.Axes.set_title +matplotlib.axes.Axes.set_ylabel +matplotlib.axes.Axes.set_ylim +``` + +## 下载这个示例 + +- [下载python源码: align_ylabels.py](https://matplotlib.org/_downloads/align_ylabels.py) +- [下载Jupyter notebook: align_ylabels.ipynb](https://matplotlib.org/_downloads/align_ylabels.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/annotate_transform.md b/Python/matplotlab/gallery/pyplots/annotate_transform.md new file mode 100644 index 00000000..0bdbdc64 --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/annotate_transform.md @@ -0,0 +1,57 @@ +# 注释变换 + +此示例显示如何使用不同的坐标系进行注释。 
有关注释功能的完整概述,另请参阅[注释教程](https://matplotlib.org/tutorials/text/annotations.html)。 + +```python +import numpy as np +import matplotlib.pyplot as plt + +x = np.arange(0, 10, 0.005) +y = np.exp(-x/2.) * np.sin(2*np.pi*x) + +fig, ax = plt.subplots() +ax.plot(x, y) +ax.set_xlim(0, 10) +ax.set_ylim(-1, 1) + +xdata, ydata = 5, 0 +xdisplay, ydisplay = ax.transData.transform_point((xdata, ydata)) + +bbox = dict(boxstyle="round", fc="0.8") +arrowprops = dict( + arrowstyle = "->", + connectionstyle = "angle,angleA=0,angleB=90,rad=10") + +offset = 72 +ax.annotate('data = (%.1f, %.1f)'%(xdata, ydata), + (xdata, ydata), xytext=(-2*offset, offset), textcoords='offset points', + bbox=bbox, arrowprops=arrowprops) + + +disp = ax.annotate('display = (%.1f, %.1f)'%(xdisplay, ydisplay), + (xdisplay, ydisplay), xytext=(0.5*offset, -offset), + xycoords='figure pixels', + textcoords='offset points', + bbox=bbox, arrowprops=arrowprops) + + +plt.show() +``` + +![注释变换示例](https://matplotlib.org/_images/sphx_glr_annotate_transform_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.transforms.Transform.transform_point +matplotlib.axes.Axes.annotate +matplotlib.pyplot.annotate +``` + +## 下载这个示例 + +- [下载python源码: annotate_transform.py](https://matplotlib.org/_downloads/annotate_transform.py) +- [下载Jupyter notebook: annotate_transform.ipynb](https://matplotlib.org/_downloads/annotate_transform.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/annotation_basic.md b/Python/matplotlab/gallery/pyplots/annotation_basic.md new file mode 100644 index 00000000..6648ce67 --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/annotation_basic.md @@ -0,0 +1,39 @@ +# 注释一个图像 + +此示例显示如何使用指向提供的坐标的箭头注释绘图。我们修改箭头的默认值,以“缩小”它。 + +有关注释功能的完整概述,另请参阅[注释教程](https://matplotlib.org/tutorials/text/annotations.html)。 + +```python +import numpy as np +import matplotlib.pyplot as plt + +fig, ax = plt.subplots() + +t = np.arange(0.0, 5.0, 0.01) +s = 
np.cos(2*np.pi*t) +line, = ax.plot(t, s, lw=2) + +ax.annotate('local max', xy=(2, 1), xytext=(3, 1.5), + arrowprops=dict(facecolor='black', shrink=0.05), + ) +ax.set_ylim(-2, 2) +plt.show() +``` + +![注释图像示例](https://matplotlib.org/_images/sphx_glr_annotation_basic_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.axes.Axes.annotate +matplotlib.pyplot.annotate +``` + +## 下载这个示例 + +- [下载python源码: annotation_basic.py](https://matplotlib.org/_downloads/annotation_basic.py) +- [下载Jupyter notebook: annotation_basic.ipynb](https://matplotlib.org/_downloads/annotation_basic.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/annotation_polar.md b/Python/matplotlab/gallery/pyplots/annotation_polar.md new file mode 100644 index 00000000..f2095579 --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/annotation_polar.md @@ -0,0 +1,47 @@ +# 注释极坐标 + +此示例显示如何在极坐标图上创建注释。 + +有关注释功能的完整概述,另请参阅[注释教程](https://matplotlib.org/tutorials/text/annotations.html)。 + +```python +import numpy as np +import matplotlib.pyplot as plt + +fig = plt.figure() +ax = fig.add_subplot(111, polar=True) +r = np.arange(0,1,0.001) +theta = 2 * 2*np.pi * r +line, = ax.plot(theta, r, color='#ee8d18', lw=3) + +ind = 800 +thisr, thistheta = r[ind], theta[ind] +ax.plot([thistheta], [thisr], 'o') +ax.annotate('a polar annotation', + xy=(thistheta, thisr), # theta, radius + xytext=(0.05, 0.05), # fraction, fraction + textcoords='figure fraction', + arrowprops=dict(facecolor='black', shrink=0.05), + horizontalalignment='left', + verticalalignment='bottom', + ) +plt.show() +``` + +![注释极坐标](https://matplotlib.org/_images/sphx_glr_annotation_polar_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.projections.polar +matplotlib.axes.Axes.annotate +matplotlib.pyplot.annotate +``` + +## 下载这个示例 + +- [下载python源码: annotation_polar.py](https://matplotlib.org/_downloads/annotation_polar.py) +- [下载Jupyter notebook: 
annotation_polar.ipynb](https://matplotlib.org/_downloads/annotation_polar.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/auto_subplots_adjust.md b/Python/matplotlab/gallery/pyplots/auto_subplots_adjust.md new file mode 100644 index 00000000..4d46d0d4 --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/auto_subplots_adjust.md @@ -0,0 +1,58 @@ +# 自动调整子图 + +自动调整子图参数。 此示例显示了一种使用[draw_event](https://matplotlib.org/users/event_handling.html)上的回调从ticklabels范围确定subplot参数的方法。 + +请注意,使用[tight_layout](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.tight_layout)或 ``constrained_layout`` 可以实现类似的结果; 此示例显示了如何自定义子图参数调整。 + +```python +import matplotlib.pyplot as plt +import matplotlib.transforms as mtransforms +fig, ax = plt.subplots() +ax.plot(range(10)) +ax.set_yticks((2,5,7)) +labels = ax.set_yticklabels(('really, really, really', 'long', 'labels')) + +def on_draw(event): + bboxes = [] + for label in labels: + bbox = label.get_window_extent() + # the figure transform goes from relative coords->pixels and we + # want the inverse of that + bboxi = bbox.inverse_transformed(fig.transFigure) + bboxes.append(bboxi) + + # this is the bbox that bounds all the bboxes, again in relative + # figure coords + bbox = mtransforms.Bbox.union(bboxes) + if fig.subplotpars.left < bbox.width: + # we need to move it over + fig.subplots_adjust(left=1.1*bbox.width) # pad a little + fig.canvas.draw() + return False + +fig.canvas.mpl_connect('draw_event', on_draw) + +plt.show() +``` + +![自动调整子图](https://matplotlib.org/_images/sphx_glr_auto_subplots_adjust_0011.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.artist.Artist.get_window_extent +matplotlib.transforms.Bbox +matplotlib.transforms.Bbox.inverse_transformed +matplotlib.transforms.Bbox.union +matplotlib.figure.Figure.subplots_adjust +matplotlib.figure.SubplotParams +matplotlib.backend_bases.FigureCanvasBase.mpl_connect +``` + +## 
下载这个示例 + +- [下载python源码: auto_subplots_adjust.py](https://matplotlib.org/_downloads/auto_subplots_adjust.py) +- [下载Jupyter notebook: auto_subplots_adjust.ipynb](https://matplotlib.org/_downloads/auto_subplots_adjust.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/boxplot_demo_pyplot.md b/Python/matplotlab/gallery/pyplots/boxplot_demo_pyplot.md new file mode 100644 index 00000000..0291623f --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/boxplot_demo_pyplot.md @@ -0,0 +1,108 @@ +# Boxplot 演示 + +boxplot 的代码示例。 + +```python +import numpy as np +import matplotlib.pyplot as plt + +# Fixing random state for reproducibility +np.random.seed(19680801) + +# fake up some data +spread = np.random.rand(50) * 100 +center = np.ones(25) * 50 +flier_high = np.random.rand(10) * 100 + 100 +flier_low = np.random.rand(10) * -100 +data = np.concatenate((spread, center, flier_high, flier_low)) +``` + +```python +fig1, ax1 = plt.subplots() +ax1.set_title('Basic Plot') +ax1.boxplot(data) +``` + +![Boxplot示例](https://matplotlib.org/_images/sphx_glr_boxplot_demo_pyplot_001.png) + +```python +fig2, ax2 = plt.subplots() +ax2.set_title('Notched boxes') +ax2.boxplot(data, notch=True) +``` + +![Boxplot示例2](https://matplotlib.org/_images/sphx_glr_boxplot_demo_pyplot_002.png) + +```python +green_diamond = dict(markerfacecolor='g', marker='D') +fig3, ax3 = plt.subplots() +ax3.set_title('Changed Outlier Symbols') +ax3.boxplot(data, flierprops=green_diamond) +``` + +![Boxplot示例3](https://matplotlib.org/_images/sphx_glr_boxplot_demo_pyplot_003.png) + +```python +fig4, ax4 = plt.subplots() +ax4.set_title('Hide Outlier Points') +ax4.boxplot(data, showfliers=False) +``` + +![Boxplot示例4](https://matplotlib.org/_images/sphx_glr_boxplot_demo_pyplot_004.png) + +```python +red_square = dict(markerfacecolor='r', marker='s') +fig5, ax5 = plt.subplots() +ax5.set_title('Horizontal Boxes') +ax5.boxplot(data, vert=False, flierprops=red_square) +``` + 
+![Boxplot示例5](https://matplotlib.org/_images/sphx_glr_boxplot_demo_pyplot_005.png) + +```python +fig6, ax6 = plt.subplots() +ax6.set_title('Shorter Whisker Length') +ax6.boxplot(data, flierprops=red_square, vert=False, whis=0.75) +``` + +![Boxplot示例6](https://matplotlib.org/_images/sphx_glr_boxplot_demo_pyplot_006.png) + +模拟一些更多的数据。 + +```python +spread = np.random.rand(50) * 100 +center = np.ones(25) * 40 +flier_high = np.random.rand(10) * 100 + 100 +flier_low = np.random.rand(10) * -100 +d2 = np.concatenate((spread, center, flier_high, flier_low)) +data.shape = (-1, 1) +d2.shape = (-1, 1) +``` + +仅当所有列的长度相同时,才能生成二维数组。 如果不是,则使用列表。 这实际上更有效,因为boxplot无论如何都会在内部将2-D数组转换为向量列表。 + +```python +data = [data, d2, d2[::2,0]] +fig7, ax7 = plt.subplots() +ax7.set_title('Multiple Samples with Different sizes') +ax7.boxplot(data) + +plt.show() +``` + +![Boxplot示例7](https://matplotlib.org/_images/sphx_glr_boxplot_demo_pyplot_007.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.axes.Axes.boxplot +matplotlib.pyplot.boxplot +``` + +## 下载这个示例 + +- [下载python源码: boxplot_demo_pyplot.py](https://matplotlib.org/_downloads/boxplot_demo_pyplot.py) +- [下载Jupyter notebook: boxplot_demo_pyplot.ipynb](https://matplotlib.org/_downloads/boxplot_demo_pyplot.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/dollar_ticks.md b/Python/matplotlab/gallery/pyplots/dollar_ticks.md new file mode 100644 index 00000000..081eb6e1 --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/dollar_ticks.md @@ -0,0 +1,45 @@ +# 美元符刻度 + +使用 [FormatStrFormatter](https://matplotlib.org/api/ticker_api.html#matplotlib.ticker.FormatStrFormatter) 在y轴标签上添加美元符号。 + +```python +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.ticker as ticker + +# Fixing random state for reproducibility +np.random.seed(19680801) + +fig, ax = plt.subplots() +ax.plot(100*np.random.rand(20)) + +formatter = ticker.FormatStrFormatter('$%1.2f') 
+ax.yaxis.set_major_formatter(formatter) + +for tick in ax.yaxis.get_major_ticks(): + tick.label1On = False + tick.label2On = True + tick.label2.set_color('green') + +plt.show() +``` + +![美元符号刻度示例](https://matplotlib.org/_images/sphx_glr_dollar_ticks_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.ticker +matplotlib.ticker.FormatStrFormatter +matplotlib.axis.Axis.set_major_formatter +matplotlib.axis.Axis.get_major_ticks +matplotlib.axis.Tick +``` + +## 下载这个示例 + +- [下载python源码: dollar_ticks.py](https://matplotlib.org/_downloads/dollar_ticks.py) +- [下载Jupyter notebook: dollar_ticks.ipynb](https://matplotlib.org/_downloads/dollar_ticks.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/fig_axes_customize_simple.md b/Python/matplotlab/gallery/pyplots/fig_axes_customize_simple.md new file mode 100644 index 00000000..bc0b7814 --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/fig_axes_customize_simple.md @@ -0,0 +1,59 @@ +# 简单的图轴自定义 + +自定义简单绘图的背景,标签和刻度。 + +```python +import matplotlib.pyplot as plt +``` + +用 ``plt.figure`` 创建一个 ``matplotlib.figure.Figure`` 实例 + +```python +fig = plt.figure() +rect = fig.patch # a rectangle instance +rect.set_facecolor('lightgoldenrodyellow') + +ax1 = fig.add_axes([0.1, 0.3, 0.4, 0.4]) +rect = ax1.patch +rect.set_facecolor('lightslategray') + + +for label in ax1.xaxis.get_ticklabels(): + # label is a Text instance + label.set_color('red') + label.set_rotation(45) + label.set_fontsize(16) + +for line in ax1.yaxis.get_ticklines(): + # line is a Line2D instance + line.set_color('green') + line.set_markersize(25) + line.set_markeredgewidth(3) + +plt.show() +``` + +![简单的图轴自定义示例](https://matplotlib.org/_images/sphx_glr_fig_axes_customize_simple_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.axis.Axis.get_ticklabels +matplotlib.axis.Axis.get_ticklines +matplotlib.text.Text.set_rotation +matplotlib.text.Text.set_fontsize 
+matplotlib.text.Text.set_color +matplotlib.lines.Line2D +matplotlib.lines.Line2D.set_color +matplotlib.lines.Line2D.set_markersize +matplotlib.lines.Line2D.set_markeredgewidth +matplotlib.patches.Patch.set_facecolor +``` + +## 下载这个示例 + +- [下载python源码: fig_axes_customize_simple.py](https://matplotlib.org/_downloads/fig_axes_customize_simple.py) +- [下载Jupyter notebook: fig_axes_customize_simple.ipynb](https://matplotlib.org/_downloads/fig_axes_customize_simple.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/fig_axes_labels_simple.md b/Python/matplotlab/gallery/pyplots/fig_axes_labels_simple.md new file mode 100644 index 00000000..9bc47b4e --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/fig_axes_labels_simple.md @@ -0,0 +1,50 @@ +# 简单的图轴标记 + +标记图的轴。 + +```python +import numpy as np +import matplotlib.pyplot as plt + +fig = plt.figure() +fig.subplots_adjust(top=0.8) +ax1 = fig.add_subplot(211) +ax1.set_ylabel('volts') +ax1.set_title('a sine wave') + +t = np.arange(0.0, 1.0, 0.01) +s = np.sin(2*np.pi*t) +line, = ax1.plot(t, s, color='blue', lw=2) + +# Fixing random state for reproducibility +np.random.seed(19680801) + +ax2 = fig.add_axes([0.15, 0.1, 0.7, 0.3]) +n, bins, patches = ax2.hist(np.random.randn(1000), 50, + facecolor='yellow', edgecolor='yellow') +ax2.set_xlabel('time (s)') + +plt.show() +``` + +![简单的图轴标记示例](https://matplotlib.org/_images/sphx_glr_fig_axes_labels_simple_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.axes.Axes.set_xlabel +matplotlib.axes.Axes.set_ylabel +matplotlib.axes.Axes.set_title +matplotlib.axes.Axes.plot +matplotlib.axes.Axes.hist +matplotlib.figure.Figure.add_axes +``` + +## 下载这个示例 + +- [下载python源码: fig_axes_labels_simple.py](https://matplotlib.org/_downloads/fig_axes_labels_simple.py) +- [下载Jupyter notebook: fig_axes_labels_simple.ipynb](https://matplotlib.org/_downloads/fig_axes_labels_simple.ipynb) + diff --git 
a/Python/matplotlab/gallery/pyplots/fig_x.md b/Python/matplotlab/gallery/pyplots/fig_x.md new file mode 100644 index 00000000..ac70f45d --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/fig_x.md @@ -0,0 +1,37 @@ +# X图 + +添加线条到图形(没有轴)。 + +```python +import matplotlib.pyplot as plt +import matplotlib.lines as lines + + +fig = plt.figure() + +l1 = lines.Line2D([0, 1], [0, 1], transform=fig.transFigure, figure=fig) + +l2 = lines.Line2D([0, 1], [1, 0], transform=fig.transFigure, figure=fig) + +fig.lines.extend([l1, l2]) + +plt.show() +``` + +![X图示例](https://matplotlib.org/_images/sphx_glr_fig_x_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.pyplot.figure +matplotlib.lines +matplotlib.lines.Line2D +``` + +## 下载这个示例 + +- [下载python源码: fig_x.py](https://matplotlib.org/_downloads/fig_x.py) +- [下载Jupyter notebook: fig_x.ipynb](https://matplotlib.org/_downloads/fig_x.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/pyplot_formatstr.md b/Python/matplotlab/gallery/pyplots/pyplot_formatstr.md new file mode 100644 index 00000000..9a109278 --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/pyplot_formatstr.md @@ -0,0 +1,27 @@ +# Pyplot 格式字符串(Formatstr) + +使用格式字符串为绘图([plot](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.plot.html#matplotlib.axes.Axes.plot))着色并设置其标记。 + +```python +import matplotlib.pyplot as plt +plt.plot([1,2,3,4], [1,4,9,16], 'ro') +plt.axis([0, 6, 0, 20]) +plt.show() +``` + +![格式字符串示例](https://matplotlib.org/_images/sphx_glr_pyplot_formatstr_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.pyplot.plot +matplotlib.axes.Axes.plot +``` + +## 下载这个示例 + +- [下载python源码: pyplot_formatstr.py](https://matplotlib.org/_downloads/pyplot_formatstr.py) +- [下载Jupyter notebook: pyplot_formatstr.ipynb](https://matplotlib.org/_downloads/pyplot_formatstr.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/pyplot_mathtext.md 
b/Python/matplotlab/gallery/pyplots/pyplot_mathtext.md
new file mode 100644
index 00000000..69860597
--- /dev/null
+++ b/Python/matplotlab/gallery/pyplots/pyplot_mathtext.md
@@ -0,0 +1,36 @@
+# Pyplot 数学文本(Mathtext)
+
+在文本标签中使用数学表达式。有关MathText的概述,请参阅[编写数学表达式](https://matplotlib.org/tutorials/text/mathtext.html)。
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+t = np.arange(0.0, 2.0, 0.01)
+s = np.sin(2*np.pi*t)
+
+plt.plot(t,s)
+plt.title(r'$\alpha_i > \beta_i$', fontsize=20)
+plt.text(1, -0.6, r'$\sum_{i=0}^\infty x_i$', fontsize=20)
+plt.text(0.6, 0.6, r'$\mathcal{A}\mathrm{sin}(2 \omega t)$',
+         fontsize=20)
+plt.xlabel('time (s)')
+plt.ylabel('volts (mV)')
+plt.show()
+```
+
+![数学文本示例](https://matplotlib.org/_images/sphx_glr_pyplot_mathtext_001.png)
+
+## 参考
+
+此示例显示了以下函数、方法、类和模块的使用:
+
+```python
+import matplotlib
+matplotlib.pyplot.text
+matplotlib.axes.Axes.text
+```
+
+## 下载这个示例
+
+- [下载python源码: pyplot_mathtext.py](https://matplotlib.org/_downloads/pyplot_mathtext.py)
+- [下载Jupyter notebook: pyplot_mathtext.ipynb](https://matplotlib.org/_downloads/pyplot_mathtext.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/pyplots/pyplot_scales.md b/Python/matplotlab/gallery/pyplots/pyplot_scales.md
new file mode 100644
index 00000000..d0a04d62
--- /dev/null
+++ b/Python/matplotlab/gallery/pyplots/pyplot_scales.md
@@ -0,0 +1,82 @@
+# Pyplot 比例尺(Scales)
+
+在不同的标度上创建图。这里显示了线性、对数、对称对数(symlog)和 logit 标度。有关更多示例,请参阅库的[“缩放”](https://matplotlib.org/gallery/index.html#scales-examples)部分。
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+from matplotlib.ticker import NullFormatter  # useful for `logit` scale
+
+# Fixing random state for reproducibility
+np.random.seed(19680801)
+
+# make up some data in the interval ]0, 1[
+y = np.random.normal(loc=0.5, scale=0.4, size=1000)
+y = y[(y > 0) & (y < 1)]
+y.sort()
+x = np.arange(len(y))
+
+# plot with various axes scales
+plt.figure(1)
+
+# linear
+plt.subplot(221)
+plt.plot(x, y)
+plt.yscale('linear') +plt.title('linear') +plt.grid(True) + + +# log +plt.subplot(222) +plt.plot(x, y) +plt.yscale('log') +plt.title('log') +plt.grid(True) + + +# symmetric log +plt.subplot(223) +plt.plot(x, y - y.mean()) +plt.yscale('symlog', linthreshy=0.01) +plt.title('symlog') +plt.grid(True) + +# logit +plt.subplot(224) +plt.plot(x, y) +plt.yscale('logit') +plt.title('logit') +plt.grid(True) +# Format the minor tick labels of the y-axis into empty strings with +# `NullFormatter`, to avoid cumbering the axis with too many labels. +plt.gca().yaxis.set_minor_formatter(NullFormatter()) +# Adjust the subplot layout, because the logit one may take more space +# than usual, due to y-tick labels like "1 - 10^{-3}" +plt.subplots_adjust(top=0.92, bottom=0.08, left=0.10, right=0.95, hspace=0.25, + wspace=0.35) + +plt.show() +``` + +![Pyplot 比例尺示例](https://matplotlib.org/_images/sphx_glr_pyplot_scales_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.pyplot.subplot +matplotlib.pyplot.subplots_adjust +matplotlib.pyplot.gca +matplotlib.pyplot.yscale +matplotlib.ticker.NullFormatter +matplotlib.axis.Axis.set_minor_formatter +``` + +## 下载这个示例 + +- [下载python源码: pyplot_scales.py](https://matplotlib.org/_downloads/pyplot_scales.py) +- [下载Jupyter notebook: pyplot_scales.ipynb](https://matplotlib.org/_downloads/pyplot_scales.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/pyplot_simple.md b/Python/matplotlab/gallery/pyplots/pyplot_simple.md new file mode 100644 index 00000000..0df02664 --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/pyplot_simple.md @@ -0,0 +1,28 @@ +# Pyplot 简单图(Simple) + +A most simple plot, where a list of numbers is plotted against their index. 
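
当只向 plot 传入一个序列时,Matplotlib 会自动以元素下标 0 到 N-1 作为 x 值。下面的小片段是一个示意性的补充(非官方示例),通过读取线对象的 x 数据验证这一点:

```python
import matplotlib
matplotlib.use("Agg")  # 无界面后端,便于在脚本环境中运行(假设环境没有显示设备)
import matplotlib.pyplot as plt

y = [1, 2, 3, 4]
line, = plt.plot(y)            # 只给出 y 值
print(list(line.get_xdata()))  # x 值即下标:[0, 1, 2, 3]
```

因此 ``plt.plot([1, 2, 3, 4])`` 与 ``plt.plot(range(4), [1, 2, 3, 4])`` 等价。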
+ +```python +import matplotlib.pyplot as plt +plt.plot([1,2,3,4]) +plt.ylabel('some numbers') +plt.show() +``` + +![简单图示例](https://matplotlib.org/_images/sphx_glr_pyplot_simple_001.png) + +## 参考 + +此示例显示了以下函数、方法、类和模块的使用: + +```python +import matplotlib +matplotlib.pyplot.plot +matplotlib.pyplot.ylabel +matplotlib.pyplot.show +``` + +## 下载这个示例 + +- [下载python源码: pyplot_simple.py](https://matplotlib.org/_downloads/pyplot_simple.py) +- [下载Jupyter notebook: pyplot_simple.ipynb](https://matplotlib.org/_downloads/pyplot_simple.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/pyplot_text.md b/Python/matplotlab/gallery/pyplots/pyplot_text.md new file mode 100644 index 00000000..4b9143d7 --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/pyplot_text.md @@ -0,0 +1,45 @@ +# Pyplot 文本(Text) + +```python +import numpy as np +import matplotlib.pyplot as plt + +# Fixing random state for reproducibility +np.random.seed(19680801) + +mu, sigma = 100, 15 +x = mu + sigma * np.random.randn(10000) + +# the histogram of the data +n, bins, patches = plt.hist(x, 50, density=True, facecolor='g', alpha=0.75) + + +plt.xlabel('Smarts') +plt.ylabel('Probability') +plt.title('Histogram of IQ') +plt.text(60, .025, r'$\mu=100,\ \sigma=15$') +plt.axis([40, 160, 0, 0.03]) +plt.grid(True) +plt.show() +``` + +![Pyplot 文本示例](https://matplotlib.org/_images/sphx_glr_pyplot_text_001.png) + +## 参考 + +此示例显示了以下函数、方法、类和模块的使用: + +```python +import matplotlib +matplotlib.pyplot.hist +matplotlib.pyplot.xlabel +matplotlib.pyplot.ylabel +matplotlib.pyplot.text +matplotlib.pyplot.grid +matplotlib.pyplot.show +``` + +## 下载这个示例 + +- [下载python源码: pyplot_text.py](https://matplotlib.org/_downloads/pyplot_text.py) +- [下载Jupyter notebook: pyplot_text.ipynb](https://matplotlib.org/_downloads/pyplot_text.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/pyplot_three.md b/Python/matplotlab/gallery/pyplots/pyplot_three.md new file mode 100644 index 
00000000..c9228595 --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/pyplot_three.md @@ -0,0 +1,32 @@ +# Pyplot 绘制三条线 + +在一次调用 [plot](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html#matplotlib.pyplot.plot) 绘图中绘制三个线图。 + +```python +import numpy as np +import matplotlib.pyplot as plt + +# evenly sampled time at 200ms intervals +t = np.arange(0., 5., 0.2) + +# red dashes, blue squares and green triangles +plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^') +plt.show() +``` + +![Pyplot 绘制三条线示例](https://matplotlib.org/_images/sphx_glr_pyplot_three_001.png) + +## 参考 + +此示例显示了以下函数、方法、类和模块的使用: + +```python +import matplotlib +matplotlib.pyplot.plot +matplotlib.axes.Axes.plot +``` + +## 下载这个示例 + +- [下载python源码: pyplot_three.py](https://matplotlib.org/_downloads/pyplot_three.py) +- [下载Jupyter notebook: pyplot_three.ipynb](https://matplotlib.org/_downloads/pyplot_three.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/pyplot_two_subplots.md b/Python/matplotlab/gallery/pyplots/pyplot_two_subplots.md new file mode 100644 index 00000000..d467b400 --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/pyplot_two_subplots.md @@ -0,0 +1,39 @@ +# Pyplot 绘制两个子图 + +使用pyplot.subplot创建带有两个子图的图形。 + +```python +import numpy as np +import matplotlib.pyplot as plt + +def f(t): + return np.exp(-t) * np.cos(2*np.pi*t) + +t1 = np.arange(0.0, 5.0, 0.1) +t2 = np.arange(0.0, 5.0, 0.02) + +plt.figure(1) +plt.subplot(211) +plt.plot(t1, f(t1), 'bo', t2, f(t2), 'k') + +plt.subplot(212) +plt.plot(t2, np.cos(2*np.pi*t2), 'r--') +plt.show() +``` + +![Pyplot 绘制两个子图示例](https://matplotlib.org/_images/sphx_glr_pyplot_two_subplots_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.pyplot.figure +matplotlib.pyplot.subplot +``` + +## 下载这个示例 + +- [下载python源码: pyplot_two_subplots.py](https://matplotlib.org/_downloads/pyplot_two_subplots.py) +- [下载Jupyter notebook: 
pyplot_two_subplots.ipynb](https://matplotlib.org/_downloads/pyplot_two_subplots.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/text_commands.md b/Python/matplotlab/gallery/pyplots/text_commands.md new file mode 100644 index 00000000..93da98a8 --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/text_commands.md @@ -0,0 +1,61 @@ +# 绘制不同的文本 + +绘制许多不同种类的文本。 + +```python +import matplotlib.pyplot as plt + +fig = plt.figure() +fig.suptitle('bold figure suptitle', fontsize=14, fontweight='bold') + +ax = fig.add_subplot(111) +fig.subplots_adjust(top=0.85) +ax.set_title('axes title') + +ax.set_xlabel('xlabel') +ax.set_ylabel('ylabel') + +ax.text(3, 8, 'boxed italics text in data coords', style='italic', + bbox={'facecolor':'red', 'alpha':0.5, 'pad':10}) + +ax.text(2, 6, r'an equation: $E=mc^2$', fontsize=15) + +ax.text(3, 2, 'unicode: Institut f\374r Festk\366rperphysik') + +ax.text(0.95, 0.01, 'colored text in axes coords', + verticalalignment='bottom', horizontalalignment='right', + transform=ax.transAxes, + color='green', fontsize=15) + + +ax.plot([2], [1], 'o') +ax.annotate('annotate', xy=(2, 1), xytext=(3, 4), + arrowprops=dict(facecolor='black', shrink=0.05)) + +ax.axis([0, 10, 0, 10]) + +plt.show() +``` + +![绘制不同的文本示例](https://matplotlib.org/_images/sphx_glr_text_commands_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.figure.Figure.suptitle +matplotlib.figure.Figure.add_subplot +matplotlib.figure.Figure.subplots_adjust +matplotlib.axes.Axes.set_title +matplotlib.axes.Axes.set_xlabel +matplotlib.axes.Axes.set_ylabel +matplotlib.axes.Axes.text +matplotlib.axes.Axes.annotate +``` + +## 下载这个示例 + +- [下载python源码: text_commands.py](https://matplotlib.org/_downloads/text_commands.py) +- [下载Jupyter notebook: text_commands.ipynb](https://matplotlib.org/_downloads/text_commands.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/text_layout.md 
b/Python/matplotlab/gallery/pyplots/text_layout.md new file mode 100644 index 00000000..cd888293 --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/text_layout.md @@ -0,0 +1,100 @@ +# 不同文本的布局 + +创建具有不同对齐和旋转的文本。 + +```python +import matplotlib.pyplot as plt +import matplotlib.patches as patches + +# build a rectangle in axes coords +left, width = .25, .5 +bottom, height = .25, .5 +right = left + width +top = bottom + height + +fig = plt.figure() +ax = fig.add_axes([0,0,1,1]) + +# axes coordinates are 0,0 is bottom left and 1,1 is upper right +p = patches.Rectangle( + (left, bottom), width, height, + fill=False, transform=ax.transAxes, clip_on=False + ) + +ax.add_patch(p) + +ax.text(left, bottom, 'left top', + horizontalalignment='left', + verticalalignment='top', + transform=ax.transAxes) + +ax.text(left, bottom, 'left bottom', + horizontalalignment='left', + verticalalignment='bottom', + transform=ax.transAxes) + +ax.text(right, top, 'right bottom', + horizontalalignment='right', + verticalalignment='bottom', + transform=ax.transAxes) + +ax.text(right, top, 'right top', + horizontalalignment='right', + verticalalignment='top', + transform=ax.transAxes) + +ax.text(right, bottom, 'center top', + horizontalalignment='center', + verticalalignment='top', + transform=ax.transAxes) + +ax.text(left, 0.5*(bottom+top), 'right center', + horizontalalignment='right', + verticalalignment='center', + rotation='vertical', + transform=ax.transAxes) + +ax.text(left, 0.5*(bottom+top), 'left center', + horizontalalignment='left', + verticalalignment='center', + rotation='vertical', + transform=ax.transAxes) + +ax.text(0.5*(left+right), 0.5*(bottom+top), 'middle', + horizontalalignment='center', + verticalalignment='center', + fontsize=20, color='red', + transform=ax.transAxes) + +ax.text(right, 0.5*(bottom+top), 'centered', + horizontalalignment='center', + verticalalignment='center', + rotation='vertical', + transform=ax.transAxes) + +ax.text(left, top, 'rotated\nwith newlines', 
+ horizontalalignment='center', + verticalalignment='center', + rotation=45, + transform=ax.transAxes) + +ax.set_axis_off() +plt.show() +``` + +![不同文本的布局](https://matplotlib.org/_images/sphx_glr_text_layout_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.axes.Axes.text +matplotlib.pyplot.text +``` + +## 下载这个示例 + +- [下载python源码: text_layout.py](https://matplotlib.org/_downloads/text_layout.py) +- [下载Jupyter notebook: text_layout.ipynb](https://matplotlib.org/_downloads/text_layout.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/whats_new_1_subplot3d.md b/Python/matplotlab/gallery/pyplots/whats_new_1_subplot3d.md new file mode 100644 index 00000000..1ad22302 --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/whats_new_1_subplot3d.md @@ -0,0 +1,57 @@ +# 1.0版本新特性:3d子图 + +在同一图中创建两个三维图。 + +```python +# This import registers the 3D projection, but is otherwise unused. +from mpl_toolkits.mplot3d import Axes3D # noqa: F401 unused import + +from matplotlib import cm +#from matplotlib.ticker import LinearLocator, FixedLocator, FormatStrFormatter +import matplotlib.pyplot as plt +import numpy as np + +fig = plt.figure() + +ax = fig.add_subplot(1, 2, 1, projection='3d') +X = np.arange(-5, 5, 0.25) +Y = np.arange(-5, 5, 0.25) +X, Y = np.meshgrid(X, Y) +R = np.sqrt(X**2 + Y**2) +Z = np.sin(R) +surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet, + linewidth=0, antialiased=False) +ax.set_zlim3d(-1.01, 1.01) + +#ax.w_zaxis.set_major_locator(LinearLocator(10)) +#ax.w_zaxis.set_major_formatter(FormatStrFormatter('%.03f')) + +fig.colorbar(surf, shrink=0.5, aspect=5) + +from mpl_toolkits.mplot3d.axes3d import get_test_data +ax = fig.add_subplot(1, 2, 2, projection='3d') +X, Y, Z = get_test_data(0.05) +ax.plot_wireframe(X, Y, Z, rstride=10, cstride=10) + +plt.show() +``` + +![3d子图示例](https://matplotlib.org/_images/sphx_glr_whats_new_1_subplot3d_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + 
+```python +import matplotlib +import mpl_toolkits +matplotlib.figure.Figure.add_subplot +mpl_toolkits.mplot3d.axes3d.Axes3D.plot_surface +mpl_toolkits.mplot3d.axes3d.Axes3D.plot_wireframe +mpl_toolkits.mplot3d.axes3d.Axes3D.set_zlim3d +``` + +## 下载这个示例 + +- [下载python源码: whats_new_1_subplot3d.py](https://matplotlib.org/_downloads/whats_new_1_subplot3d.py) +- [下载Jupyter notebook: whats_new_1_subplot3d.ipynb](https://matplotlib.org/_downloads/whats_new_1_subplot3d.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/whats_new_98_4_fancy.md b/Python/matplotlab/gallery/pyplots/whats_new_98_4_fancy.md new file mode 100644 index 00000000..d340cdd0 --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/whats_new_98_4_fancy.md @@ -0,0 +1,82 @@ +# 0.98.4版本新的炫酷特性 + +创建精美的盒子和箭头样式。 + +```python +import matplotlib.patches as mpatch +import matplotlib.pyplot as plt + +figheight = 8 +fig = plt.figure(1, figsize=(9, figheight), dpi=80) +fontsize = 0.4 * fig.dpi + +def make_boxstyles(ax): + styles = mpatch.BoxStyle.get_styles() + + for i, (stylename, styleclass) in enumerate(sorted(styles.items())): + ax.text(0.5, (float(len(styles)) - 0.5 - i)/len(styles), stylename, + ha="center", + size=fontsize, + transform=ax.transAxes, + bbox=dict(boxstyle=stylename, fc="w", ec="k")) + +def make_arrowstyles(ax): + styles = mpatch.ArrowStyle.get_styles() + + ax.set_xlim(0, 4) + ax.set_ylim(0, figheight) + + for i, (stylename, styleclass) in enumerate(sorted(styles.items())): + y = (float(len(styles)) -0.25 - i) # /figheight + p = mpatch.Circle((3.2, y), 0.2, fc="w") + ax.add_patch(p) + + ax.annotate(stylename, (3.2, y), + (2., y), + #xycoords="figure fraction", textcoords="figure fraction", + ha="right", va="center", + size=fontsize, + arrowprops=dict(arrowstyle=stylename, + patchB=p, + shrinkA=5, + shrinkB=5, + fc="w", ec="k", + connectionstyle="arc3,rad=-0.05", + ), + bbox=dict(boxstyle="square", fc="w")) + + ax.xaxis.set_visible(False) + 
ax.yaxis.set_visible(False) + + +ax1 = fig.add_subplot(121, frameon=False, xticks=[], yticks=[]) +make_boxstyles(ax1) + +ax2 = fig.add_subplot(122, frameon=False, xticks=[], yticks=[]) +make_arrowstyles(ax2) + + +plt.show() +``` + +![新特性示例](https://matplotlib.org/_images/sphx_glr_whats_new_98_4_fancy_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.patches +matplotlib.patches.BoxStyle +matplotlib.patches.BoxStyle.get_styles +matplotlib.patches.ArrowStyle +matplotlib.patches.ArrowStyle.get_styles +matplotlib.axes.Axes.text +matplotlib.axes.Axes.annotate +``` + +## 下载这个示例 + +- [下载python源码: whats_new_98_4_fancy.py](https://matplotlib.org/_downloads/whats_new_98_4_fancy.py) +- [下载Jupyter notebook: whats_new_98_4_fancy.ipynb](https://matplotlib.org/_downloads/whats_new_98_4_fancy.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/whats_new_98_4_fill_between.md b/Python/matplotlab/gallery/pyplots/whats_new_98_4_fill_between.md new file mode 100644 index 00000000..86565426 --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/whats_new_98_4_fill_between.md @@ -0,0 +1,36 @@ +# 填充交叉区域 + +填充两条曲线之间的区域。 + +```python +import matplotlib.pyplot as plt +import numpy as np + +x = np.arange(-5, 5, 0.01) +y1 = -5*x*x + x + 10 +y2 = 5*x*x + x + +fig, ax = plt.subplots() +ax.plot(x, y1, x, y2, color='black') +ax.fill_between(x, y1, y2, where=y2 >y1, facecolor='yellow', alpha=0.5) +ax.fill_between(x, y1, y2, where=y2 <=y1, facecolor='red', alpha=0.5) +ax.set_title('Fill Between') + +plt.show() +``` + +![填充示例](https://matplotlib.org/_images/sphx_glr_whats_new_98_4_fill_between_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.axes.Axes.fill_between +``` + +## 下载这个示例 + +- [下载python源码: whats_new_98_4_fill_between.py](https://matplotlib.org/_downloads/whats_new_98_4_fill_between.py) +- [下载Jupyter notebook: 
whats_new_98_4_fill_between.ipynb](https://matplotlib.org/_downloads/whats_new_98_4_fill_between.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/whats_new_98_4_legend.md b/Python/matplotlab/gallery/pyplots/whats_new_98_4_legend.md new file mode 100644 index 00000000..c06d96bd --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/whats_new_98_4_legend.md @@ -0,0 +1,39 @@ +# 0.98.4版本图例新特性 + +创建图例并使用阴影和长方体对其进行调整。 + +```python +import matplotlib.pyplot as plt +import numpy as np + + +ax = plt.subplot(111) +t1 = np.arange(0.0, 1.0, 0.01) +for n in [1, 2, 3, 4]: + plt.plot(t1, t1**n, label="n=%d"%(n,)) + +leg = plt.legend(loc='best', ncol=2, mode="expand", shadow=True, fancybox=True) +leg.get_frame().set_alpha(0.5) + + +plt.show() +``` + +![新特性图例示例](https://matplotlib.org/_images/sphx_glr_whats_new_98_4_legend_001.png) + +## 参考 + +此示例显示了以下函数、方法、类和模块的使用: + +```python +import matplotlib +matplotlib.axes.Axes.legend +matplotlib.pyplot.legend +matplotlib.legend.Legend +matplotlib.legend.Legend.get_frame +``` + +## 下载这个示例 + +- [下载python源码: whats_new_98_4_legend.py](https://matplotlib.org/_downloads/whats_new_98_4_legend.py) +- [下载Jupyter notebook: whats_new_98_4_legend.ipynb](https://matplotlib.org/_downloads/whats_new_98_4_legend.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/whats_new_99_axes_grid.md b/Python/matplotlab/gallery/pyplots/whats_new_99_axes_grid.md new file mode 100644 index 00000000..a9056db2 --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/whats_new_99_axes_grid.md @@ -0,0 +1,67 @@ +# 0.99版本轴网格新特性 + +创建RGB合成图像。 + +```python +import numpy as np +import matplotlib.pyplot as plt +from mpl_toolkits.axes_grid1.axes_rgb import RGBAxes + + +def get_demo_image(): + # prepare image + delta = 0.5 + + extent = (-3, 4, -4, 3) + x = np.arange(-3.0, 4.001, delta) + y = np.arange(-4.0, 3.001, delta) + X, Y = np.meshgrid(x, y) + Z1 = np.exp(-X**2 - Y**2) + Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2) + Z = 
(Z1 - Z2) * 2 + + return Z, extent + + +def get_rgb(): + Z, extent = get_demo_image() + + Z[Z < 0] = 0. + Z = Z / Z.max() + + R = Z[:13, :13] + G = Z[2:, 2:] + B = Z[:13, 2:] + + return R, G, B + + +fig = plt.figure(1) +ax = RGBAxes(fig, [0.1, 0.1, 0.8, 0.8]) + +r, g, b = get_rgb() +kwargs = dict(origin="lower", interpolation="nearest") +ax.imshow_rgb(r, g, b, **kwargs) + +ax.RGB.set_xlim(0., 9.5) +ax.RGB.set_ylim(0.9, 10.6) + +plt.show() +``` + +![创建RGB合成图像示例](https://matplotlib.org/_images/sphx_glr_whats_new_99_axes_grid_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import mpl_toolkits +mpl_toolkits.axes_grid1.axes_rgb.RGBAxes +mpl_toolkits.axes_grid1.axes_rgb.RGBAxes.imshow_rgb +``` + +## 下载这个示例 + +- [下载python源码: whats_new_99_axes_grid.py](https://matplotlib.org/_downloads/whats_new_99_axes_grid.py) +- [下载Jupyter notebook: whats_new_99_axes_grid.ipynb](https://matplotlib.org/_downloads/whats_new_99_axes_grid.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/whats_new_99_mplot3d.md b/Python/matplotlab/gallery/pyplots/whats_new_99_mplot3d.md new file mode 100644 index 00000000..d45a7f14 --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/whats_new_99_mplot3d.md @@ -0,0 +1,39 @@ +# 0.99版本新增Mplot3d对象 + +创建3D曲面图。 + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib import cm +from mpl_toolkits.mplot3d import Axes3D + +X = np.arange(-5, 5, 0.25) +Y = np.arange(-5, 5, 0.25) +X, Y = np.meshgrid(X, Y) +R = np.sqrt(X**2 + Y**2) +Z = np.sin(R) + +fig = plt.figure() +ax = Axes3D(fig) +ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.viridis) + +plt.show() +``` + +![3D曲面图示例](https://matplotlib.org/_images/sphx_glr_whats_new_99_mplot3d_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import mpl_toolkits +mpl_toolkits.mplot3d.Axes3D +mpl_toolkits.mplot3d.Axes3D.plot_surface +``` + +## 下载这个示例 + +- [下载python源码: 
whats_new_99_mplot3d.py](https://matplotlib.org/_downloads/whats_new_99_mplot3d.py) +- [下载Jupyter notebook: whats_new_99_mplot3d.ipynb](https://matplotlib.org/_downloads/whats_new_99_mplot3d.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/pyplots/whats_new_99_spines.md b/Python/matplotlab/gallery/pyplots/whats_new_99_spines.md new file mode 100644 index 00000000..c722af0e --- /dev/null +++ b/Python/matplotlab/gallery/pyplots/whats_new_99_spines.md @@ -0,0 +1,72 @@ +# 0.99版本新增Spines对象 + +```python +import matplotlib.pyplot as plt +import numpy as np + + +def adjust_spines(ax,spines): + for loc, spine in ax.spines.items(): + if loc in spines: + spine.set_position(('outward',10)) # outward by 10 points + else: + spine.set_color('none') # don't draw spine + + # turn off ticks where there is no spine + if 'left' in spines: + ax.yaxis.set_ticks_position('left') + else: + # no yaxis ticks + ax.yaxis.set_ticks([]) + + if 'bottom' in spines: + ax.xaxis.set_ticks_position('bottom') + else: + # no xaxis ticks + ax.xaxis.set_ticks([]) + +fig = plt.figure() + +x = np.linspace(0,2*np.pi,100) +y = 2*np.sin(x) + +ax = fig.add_subplot(2,2,1) +ax.plot(x,y) +adjust_spines(ax,['left']) + +ax = fig.add_subplot(2,2,2) +ax.plot(x,y) +adjust_spines(ax,[]) + +ax = fig.add_subplot(2,2,3) +ax.plot(x,y) +adjust_spines(ax,['left','bottom']) + +ax = fig.add_subplot(2,2,4) +ax.plot(x,y) +adjust_spines(ax,['bottom']) + +plt.show() +``` + +![Spines对象绘图示例](https://matplotlib.org/_images/sphx_glr_whats_new_99_spines_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.axis.Axis.set_ticks +matplotlib.axis.XAxis.set_ticks_position +matplotlib.axis.YAxis.set_ticks_position +matplotlib.spines +matplotlib.spines.Spine +matplotlib.spines.Spine.set_color +matplotlib.spines.Spine.set_position +``` + +## 下载这个示例 + +- [下载python源码: whats_new_99_spines.py](https://matplotlib.org/_downloads/whats_new_99_spines.py) +- [下载Jupyter notebook: 
whats_new_99_spines.ipynb](https://matplotlib.org/_downloads/whats_new_99_spines.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/recipes/common_date_problems.md b/Python/matplotlab/gallery/recipes/common_date_problems.md
new file mode 100644
index 00000000..19c73eb5
--- /dev/null
+++ b/Python/matplotlab/gallery/recipes/common_date_problems.md
@@ -0,0 +1,89 @@
# 修复常见的日期困扰

Matplotlib允许您原生地绘制python日期时间实例，并且在大多数情况下可以很好地选择刻度位置和字符串格式。有一些事情没有得到如此优雅的处理，这里有一些技巧可以帮助你解决它们。我们将在numpy记录数组中加载一些包含datetime.date对象的样本日期数据：

```python
In [63]: datafile = cbook.get_sample_data('goog.npz')

In [64]: r = np.load(datafile)['price_data'].view(np.recarray)

In [65]: r.dtype
Out[65]: dtype([('date', '<M8[D]'), ...])
```

如果直接用默认方式绘制这些数据，你会看到x刻度标签都被压扁了。

```python
import matplotlib.cbook as cbook
import matplotlib.dates as mdates
import numpy as np
import matplotlib.pyplot as plt

with cbook.get_sample_data('goog.npz') as datafile:
    r = np.load(datafile)['price_data'].view(np.recarray)

# Matplotlib prefers datetime instead of np.datetime64.
date = r.date.astype('O')
fig, ax = plt.subplots()
ax.plot(date, r.close)
ax.set_title('Default date handling can cause overlapping labels')
```

![修复常见的日期困扰示例](https://matplotlib.org/_images/sphx_glr_common_date_problems_001.png)

另一个烦恼是，如果您将鼠标悬停在窗口上，并查看matplotlib工具栏（[交互式导航](https://matplotlib.org/users/navigation_toolbar.html#navigation-toolbar)）右下角显示的x和y坐标，您会看到x位置的格式与刻度标签的格式相同，例如"2004年12月"。

我们想要的是工具栏中的位置具有更高的精确度，例如给出鼠标悬停处的确切日期。要解决第一个问题，我们可以使用[matplotlib.figure.Figure.autofmt_xdate()](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.autofmt_xdate)；要修复第二个问题，我们可以使用ax.fmt_xdata属性，该属性可以设置为任何接受标量并返回字符串的函数。matplotlib内置了许多日期格式化程序，因此我们将使用其中之一。

```python
fig, ax = plt.subplots()
ax.plot(date, r.close)

# rotate and align the tick labels so they look better
fig.autofmt_xdate()

# use a more precise date string for the x axis locations in the
# toolbar
ax.fmt_xdata = mdates.DateFormatter('%Y-%m-%d')
ax.set_title('fig.autofmt_xdate fixes the labels')
```

![修复常见的日期困扰2](https://matplotlib.org/_images/sphx_glr_common_date_problems_002.png)

现在，当您将鼠标悬停在绘制的数据上时，您将在工具栏中看到日期格式字符串，如2004-12-01。

```python
plt.show()
```

## 下载这个示例

- [下载python源码: common_date_problems.py](https://matplotlib.org/_downloads/common_date_problems.py)
- [下载Jupyter notebook: common_date_problems.ipynb](https://matplotlib.org/_downloads/common_date_problems.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/recipes/create_subplots.md b/Python/matplotlab/gallery/recipes/create_subplots.md
new file mode 100644
index 00000000..8f9df5b8
--- /dev/null
+++ b/Python/matplotlab/gallery/recipes/create_subplots.md
@@ -0,0 +1,46 @@
# 轻松创建子图

在matplotlib的早期版本中，如果你想使用pythonic API，先创建一个图形实例，再从它创建一组子图网格（可能还带共享轴），需要相当数量的样板代码。例如：

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.random.randn(50)

# old style
fig = plt.figure()
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222,
                        sharex=ax1, sharey=ax1)
ax3 = fig.add_subplot(223, sharex=ax1, sharey=ax1)
ax4 = fig.add_subplot(224, sharex=ax1, sharey=ax1)
```

![轻松创建子图示例](https://matplotlib.org/_images/sphx_glr_create_subplots_001.png)

Fernando Perez（费尔南多·佩雷斯）提供了一个很好的顶层方法``subplots()``（注意结尾的"s"），可以一次性创建全部子图，并为整组子图开启x和y共享。您可以单独解包各个轴……

```python
# new style method 1; unpack the axes
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, sharex=True, sharey=True)
ax1.plot(x)
```

![轻松创建子图示例2](https://matplotlib.org/_images/sphx_glr_create_subplots_002.png)

或者将它们作为一个支持numpy索引的 numrows x numcolumns 对象数组返回：

```python
# new style method 2; use an axes array
fig, axs = plt.subplots(2, 2, sharex=True, sharey=True)
axs[0, 0].plot(x)

plt.show()
```

![轻松创建子图示例3](https://matplotlib.org/_images/sphx_glr_create_subplots_003.png)

## 下载这个示例

- [下载python源码: create_subplots.py](https://matplotlib.org/_downloads/create_subplots.py)
- [下载Jupyter notebook: create_subplots.ipynb](https://matplotlib.org/_downloads/create_subplots.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/recipes/fill_between_alpha.md b/Python/matplotlab/gallery/recipes/fill_between_alpha.md
new file mode 100644
index 00000000..343912c8
--- /dev/null
+++ b/Python/matplotlab/gallery/recipes/fill_between_alpha.md
@@ -0,0 +1,121 @@
# 填充与Alpha

[fill_between()](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.fill_between.html#matplotlib.axes.Axes.fill_between)函数在最小和最大边界之间生成阴影区域，这对于说明范围很有用。它有一个非常方便的where参数，可以把填充和逻辑条件组合起来，例如只在曲线超过某个阈值时填充。

在最基本的层面上，``fill_between``可用于增强图形的视觉外观。让我们比较两幅财务时间序列图：左边是简单的线图，右边是填充的线图。

```python
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.cbook as cbook

# load up some sample financial data
with cbook.get_sample_data('goog.npz') as datafile:
    r = np.load(datafile)['price_data'].view(np.recarray)
# Matplotlib prefers datetime instead of np.datetime64.
+date = r.date.astype('O') +# create two subplots with the shared x and y axes +fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True, sharey=True) + +pricemin = r.close.min() + +ax1.plot(date, r.close, lw=2) +ax2.fill_between(date, pricemin, r.close, facecolor='blue', alpha=0.5) + +for ax in ax1, ax2: + ax.grid(True) + +ax1.set_ylabel('price') +for label in ax2.get_yticklabels(): + label.set_visible(False) + +fig.suptitle('Google (GOOG) daily closing price') +fig.autofmt_xdate() +``` + +![在和Alpha之间填充示例](https://matplotlib.org/_images/sphx_glr_fill_between_alpha_001.png) + +此处不需要Alpha通道,但它可以用于软化颜色以获得更具视觉吸引力的图形。在其他示例中,正如我们将在下面看到的,alpha通道在功能上非常有用,因为阴影区域可以重叠,alpha允许您查看两者。请注意,postscript格式不支持alpha(这是postscript限制,而不是matplotlib限制),因此在使用alpha时保存PNG,PDF或SVG中的数字。 + +我们的下一个例子计算两个随机游走者群体,它们具有不同的正态分布的均值和标准差,从中得出步骤。我们使用共享区域绘制人口平均位置的+/-一个标准偏差。 这里的alpha通道非常有用,而不仅仅是审美。 + +```python +Nsteps, Nwalkers = 100, 250 +t = np.arange(Nsteps) + +# an (Nsteps x Nwalkers) array of random walk steps +S1 = 0.002 + 0.01*np.random.randn(Nsteps, Nwalkers) +S2 = 0.004 + 0.02*np.random.randn(Nsteps, Nwalkers) + +# an (Nsteps x Nwalkers) array of random walker positions +X1 = S1.cumsum(axis=0) +X2 = S2.cumsum(axis=0) + + +# Nsteps length arrays empirical means and standard deviations of both +# populations over time +mu1 = X1.mean(axis=1) +sigma1 = X1.std(axis=1) +mu2 = X2.mean(axis=1) +sigma2 = X2.std(axis=1) + +# plot it! 
+fig, ax = plt.subplots(1) +ax.plot(t, mu1, lw=2, label='mean population 1', color='blue') +ax.plot(t, mu2, lw=2, label='mean population 2', color='yellow') +ax.fill_between(t, mu1+sigma1, mu1-sigma1, facecolor='blue', alpha=0.5) +ax.fill_between(t, mu2+sigma2, mu2-sigma2, facecolor='yellow', alpha=0.5) +ax.set_title(r'random walkers empirical $\mu$ and $\pm \sigma$ interval') +ax.legend(loc='upper left') +ax.set_xlabel('num steps') +ax.set_ylabel('position') +ax.grid() +``` + +![在和Alpha之间填充示例2](https://matplotlib.org/_images/sphx_glr_fill_between_alpha_002.png) + +where关键字参数非常便于突出显示图形的某些区域。其中布尔掩码的长度与x,ymin和ymax参数的长度相同,并且仅填充布尔掩码为True的区域。在下面的示例中,我们模拟单个随机游走者并计算人口位置的分析平均值和标准差。总体平均值显示为黑色虚线,并且与平均值的正/负一西格玛偏差显示为黄色填充区域。我们使用where掩码X> upper_bound来找到walker在一个sigma边界之上的区域,并将该区域遮蔽为蓝色。 + +```python +Nsteps = 500 +t = np.arange(Nsteps) + +mu = 0.002 +sigma = 0.01 + +# the steps and position +S = mu + sigma*np.random.randn(Nsteps) +X = S.cumsum() + +# the 1 sigma upper and lower analytic population bounds +lower_bound = mu*t - sigma*np.sqrt(t) +upper_bound = mu*t + sigma*np.sqrt(t) + +fig, ax = plt.subplots(1) +ax.plot(t, X, lw=2, label='walker position', color='blue') +ax.plot(t, mu*t, lw=1, label='population mean', color='black', ls='--') +ax.fill_between(t, lower_bound, upper_bound, facecolor='yellow', alpha=0.5, + label='1 sigma range') +ax.legend(loc='upper left') + +# here we use the where argument to only fill the region where the +# walker is above the population 1 sigma boundary +ax.fill_between(t, upper_bound, X, where=X > upper_bound, facecolor='blue', + alpha=0.5) +ax.set_xlabel('num steps') +ax.set_ylabel('position') +ax.grid() +``` + +![在和Alpha之间填充示例3](https://matplotlib.org/_images/sphx_glr_fill_between_alpha_003.png) + +填充区域的另一个方便用途是突出显示轴的水平或垂直跨度 - 因为matplotlib具有一些辅助函数 [axhspan()](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.axhspan.html#matplotlib.axes.Axes.axhspan) 
和[axvspan()](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.axvspan.html#matplotlib.axes.Axes.axvspan) 以及示例[axhspan Demo](https://matplotlib.org/gallery/subplots_axes_and_figures/axhspan_demo.html)。 + +```python +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: fill_between_alpha.py](https://matplotlib.org/_downloads/fill_between_alpha.py) +- [下载Jupyter notebook: fill_between_alpha.ipynb](https://matplotlib.org/_downloads/fill_between_alpha.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/recipes/index.md b/Python/matplotlab/gallery/recipes/index.md new file mode 100644 index 00000000..d6d8ec62 --- /dev/null +++ b/Python/matplotlab/gallery/recipes/index.md @@ -0,0 +1,3 @@ +# 我们最喜欢的技巧 + +这是一个简短的教程,示例和代码片段的集合,说明了一些有用的惯例和技巧,以制作更流畅的图形和克服一些matplotlib缺陷。 diff --git a/Python/matplotlab/gallery/recipes/placing_text_boxes.md b/Python/matplotlab/gallery/recipes/placing_text_boxes.md new file mode 100644 index 00000000..98c5ff47 --- /dev/null +++ b/Python/matplotlab/gallery/recipes/placing_text_boxes.md @@ -0,0 +1,37 @@ +# 放置文本框 + +使用文本框装饰轴时,有两个有用的技巧是将文本放在轴坐标中(请参阅[转换教程](https://matplotlib.org/tutorials/advanced/transforms_tutorial.html)),因此文本不会随着x或y限制的变化而移动。 您还可以使用文本的bbox属性用[Patch](https://matplotlib.org/api/_as_gen/matplotlib.patches.Patch.html#matplotlib.patches.Patch)实例包围文本 - bbox关键字参数使用带有Patch属性的键的字典。 + +![放置文本框](https://matplotlib.org/_images/sphx_glr_placing_text_boxes_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + +np.random.seed(19680801) + +fig, ax = plt.subplots() +x = 30*np.random.randn(10000) +mu = x.mean() +median = np.median(x) +sigma = x.std() +textstr = '\n'.join(( + r'$\mu=%.2f$' % (mu, ), + r'$\mathrm{median}=%.2f$' % (median, ), + r'$\sigma=%.2f$' % (sigma, ))) + +ax.hist(x, 50) +# these are matplotlib.patch.Patch properties +props = dict(boxstyle='round', facecolor='wheat', alpha=0.5) + +# place a text box in upper left in axes coords +ax.text(0.05, 0.95, textstr, transform=ax.transAxes, 
fontsize=14,
        verticalalignment='top', bbox=props)

plt.show()
```

## 下载这个示例

- [下载python源码: placing_text_boxes.py](https://matplotlib.org/_downloads/placing_text_boxes.py)
- [下载Jupyter notebook: placing_text_boxes.ipynb](https://matplotlib.org/_downloads/placing_text_boxes.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/recipes/transparent_legends.md b/Python/matplotlab/gallery/recipes/transparent_legends.md
new file mode 100644
index 00000000..4a3cddba
--- /dev/null
+++ b/Python/matplotlab/gallery/recipes/transparent_legends.md
@@ -0,0 +1,38 @@
# 透明、花式的图例

有时您在绘制数据之前就知道数据的样子，并且可能知道例如右上角没有太多数据。这时，您可以放心地把图例放在那个位置，而不会遮挡数据：

```python
ax.legend(loc='upper right')
```

其他时候你不知道你的数据在哪里，默认的loc='best'会尝试自动放置图例：

```python
ax.legend()
```
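下面是一个可以独立运行的小草图（数据是此处假设的随机序列，并非本文示例的数据），把上述两种调用并排对比：

```python
import matplotlib
matplotlib.use("Agg")  # 使用非交互后端，便于在无显示环境的脚本中运行
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
fig, (ax1, ax2) = plt.subplots(1, 2)
for ax, loc in ((ax1, "upper right"), (ax2, "best")):
    # 每个子图各画一条带标签的曲线，然后用不同的 loc 放置图例
    ax.plot(np.random.randn(50), label="data")
    leg = ax.legend(loc=loc)
plt.close(fig)
```

其中 'best' 是默认值，因此 ax.legend() 与 ax.legend(loc='best') 等价。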
+但是,您的图例可能会与您的数据重叠,在这些情况下,使图例框架透明是很好的。 + +![透明、花式的图形示例](https://matplotlib.org/_images/sphx_glr_transparent_legends_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +np.random.seed(1234) +fig, ax = plt.subplots(1) +ax.plot(np.random.randn(300), 'o-', label='normal distribution') +ax.plot(np.random.rand(300), 's-', label='uniform distribution') +ax.set_ylim(-3, 3) + +ax.legend(fancybox=True, framealpha=0.5) +ax.set_title('fancy, transparent legends') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: transparent_legends.py](https://matplotlib.org/_downloads/transparent_legends.py) +- [下载Jupyter notebook: transparent_legends.ipynb](https://matplotlib.org/_downloads/transparent_legends.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/scales/aspect_loglog.md b/Python/matplotlab/gallery/scales/aspect_loglog.md new file mode 100644 index 00000000..6ad0c7a2 --- /dev/null +++ b/Python/matplotlab/gallery/scales/aspect_loglog.md @@ -0,0 +1,31 @@ +# 双对数 + +![双对数示例](https://matplotlib.org/_images/sphx_glr_aspect_loglog_001.png) + +```python +import matplotlib.pyplot as plt + +fig, (ax1, ax2) = plt.subplots(1, 2) +ax1.set_xscale("log") +ax1.set_yscale("log") +ax1.set_xlim(1e1, 1e3) +ax1.set_ylim(1e2, 1e3) +ax1.set_aspect(1) +ax1.set_title("adjustable = box") + +ax2.set_xscale("log") +ax2.set_yscale("log") +ax2.set_adjustable("datalim") +ax2.plot([1, 3, 10], [1, 9, 100], "o-") +ax2.set_xlim(1e-1, 1e2) +ax2.set_ylim(1e-1, 1e3) +ax2.set_aspect(1) +ax2.set_title("adjustable = datalim") + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: aspect_loglog.py](https://matplotlib.org/_downloads/aspect_loglog.py) +- [下载Jupyter notebook: aspect_loglog.ipynb](https://matplotlib.org/_downloads/aspect_loglog.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/scales/custom_scale.md b/Python/matplotlab/gallery/scales/custom_scale.md new file mode 100644 index 00000000..e2cfda0c --- /dev/null +++ 
b/Python/matplotlab/gallery/scales/custom_scale.md @@ -0,0 +1,187 @@ +# 自定义比例尺 + +通过在墨卡托投影中实现纬度数据的缩放用途来创建自定义比例。 + +![自定义比例尺示例](https://matplotlib.org/_images/sphx_glr_custom_scale_001.png) + +```python +import numpy as np +from numpy import ma +from matplotlib import scale as mscale +from matplotlib import transforms as mtransforms +from matplotlib.ticker import Formatter, FixedLocator +from matplotlib import rcParams + + +# BUG: this example fails with any other setting of axisbelow +rcParams['axes.axisbelow'] = False + + +class MercatorLatitudeScale(mscale.ScaleBase): + """ + Scales data in range -pi/2 to pi/2 (-90 to 90 degrees) using + the system used to scale latitudes in a Mercator projection. + + The scale function: + ln(tan(y) + sec(y)) + + The inverse scale function: + atan(sinh(y)) + + Since the Mercator scale tends to infinity at +/- 90 degrees, + there is user-defined threshold, above and below which nothing + will be plotted. This defaults to +/- 85 degrees. + + source: + http://en.wikipedia.org/wiki/Mercator_projection + """ + + # The scale class must have a member ``name`` that defines the + # string used to select the scale. For example, + # ``gca().set_yscale("mercator")`` would be used to select this + # scale. + name = 'mercator' + + def __init__(self, axis, *, thresh=np.deg2rad(85), **kwargs): + """ + Any keyword arguments passed to ``set_xscale`` and + ``set_yscale`` will be passed along to the scale's + constructor. + + thresh: The degree above which to crop the data. + """ + mscale.ScaleBase.__init__(self) + if thresh >= np.pi / 2: + raise ValueError("thresh must be less than pi/2") + self.thresh = thresh + + def get_transform(self): + """ + Override this method to return a new instance that does the + actual transformation of the data. + + The MercatorLatitudeTransform class is defined below as a + nested class of this one. 
+ """ + return self.MercatorLatitudeTransform(self.thresh) + + def set_default_locators_and_formatters(self, axis): + """ + Override to set up the locators and formatters to use with the + scale. This is only required if the scale requires custom + locators and formatters. Writing custom locators and + formatters is rather outside the scope of this example, but + there are many helpful examples in ``ticker.py``. + + In our case, the Mercator example uses a fixed locator from + -90 to 90 degrees and a custom formatter class to put convert + the radians to degrees and put a degree symbol after the + value:: + """ + class DegreeFormatter(Formatter): + def __call__(self, x, pos=None): + return "%d\N{DEGREE SIGN}" % np.degrees(x) + + axis.set_major_locator(FixedLocator( + np.radians(np.arange(-90, 90, 10)))) + axis.set_major_formatter(DegreeFormatter()) + axis.set_minor_formatter(DegreeFormatter()) + + def limit_range_for_scale(self, vmin, vmax, minpos): + """ + Override to limit the bounds of the axis to the domain of the + transform. In the case of Mercator, the bounds should be + limited to the threshold that was passed in. Unlike the + autoscaling provided by the tick locators, this range limiting + will always be adhered to, whether the axis range is set + manually, determined automatically or changed through panning + and zooming. + """ + return max(vmin, -self.thresh), min(vmax, self.thresh) + + class MercatorLatitudeTransform(mtransforms.Transform): + # There are two value members that must be defined. + # ``input_dims`` and ``output_dims`` specify number of input + # dimensions and output dimensions to the transformation. + # These are used by the transformation framework to do some + # error checking and prevent incompatible transformations from + # being connected together. When defining transforms for a + # scale, which are, by definition, separable and have only one + # dimension, these members should always be set to 1. 
+ input_dims = 1 + output_dims = 1 + is_separable = True + has_inverse = True + + def __init__(self, thresh): + mtransforms.Transform.__init__(self) + self.thresh = thresh + + def transform_non_affine(self, a): + """ + This transform takes an Nx1 ``numpy`` array and returns a + transformed copy. Since the range of the Mercator scale + is limited by the user-specified threshold, the input + array must be masked to contain only valid values. + ``matplotlib`` will handle masked arrays and remove the + out-of-range data from the plot. Importantly, the + ``transform`` method *must* return an array that is the + same shape as the input array, since these values need to + remain synchronized with values in the other dimension. + """ + masked = ma.masked_where((a < -self.thresh) | (a > self.thresh), a) + if masked.mask.any(): + return ma.log(np.abs(ma.tan(masked) + 1.0 / ma.cos(masked))) + else: + return np.log(np.abs(np.tan(a) + 1.0 / np.cos(a))) + + def inverted(self): + """ + Override this method so matplotlib knows how to get the + inverse transform for this transform. + """ + return MercatorLatitudeScale.InvertedMercatorLatitudeTransform( + self.thresh) + + class InvertedMercatorLatitudeTransform(mtransforms.Transform): + input_dims = 1 + output_dims = 1 + is_separable = True + has_inverse = True + + def __init__(self, thresh): + mtransforms.Transform.__init__(self) + self.thresh = thresh + + def transform_non_affine(self, a): + return np.arctan(np.sinh(a)) + + def inverted(self): + return MercatorLatitudeScale.MercatorLatitudeTransform(self.thresh) + +# Now that the Scale class has been defined, it must be registered so +# that ``matplotlib`` can find it. +mscale.register_scale(MercatorLatitudeScale) + + +if __name__ == '__main__': + import matplotlib.pyplot as plt + + t = np.arange(-180.0, 180.0, 0.1) + s = np.radians(t)/2. 
+ + plt.plot(t, s, '-', lw=2) + plt.gca().set_yscale('mercator') + + plt.xlabel('Longitude') + plt.ylabel('Latitude') + plt.title('Mercator: Projection of the Oppressor') + plt.grid(True) + + plt.show() +``` + +## 下载这个示例 + +- [下载python源码: custom_scale.py](https://matplotlib.org/_downloads/custom_scale.py) +- [下载Jupyter notebook: custom_scale.ipynb](https://matplotlib.org/_downloads/custom_scale.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/scales/index.md b/Python/matplotlab/gallery/scales/index.md new file mode 100644 index 00000000..d9e1629c --- /dev/null +++ b/Python/matplotlab/gallery/scales/index.md @@ -0,0 +1,3 @@ +# 刻度、比例尺 + +这些示例介绍了如何在Matplotlib中处理不同的比例。 \ No newline at end of file diff --git a/Python/matplotlab/gallery/scales/log_bar.md b/Python/matplotlab/gallery/scales/log_bar.md new file mode 100644 index 00000000..a9b2e1b1 --- /dev/null +++ b/Python/matplotlab/gallery/scales/log_bar.md @@ -0,0 +1,35 @@ +# 对数条形图 + +绘制具有对数y轴的条形图。 + +![对数条形图示例](https://matplotlib.org/_images/sphx_glr_log_bar_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +data = ((3, 1000), (10, 3), (100, 30), (500, 800), (50, 1)) + +dim = len(data[0]) +w = 0.75 +dimw = w / dim + +fig, ax = plt.subplots() +x = np.arange(len(data)) +for i in range(len(data[0])): + y = [d[i] for d in data] + b = ax.bar(x + i * dimw, y, dimw, bottom=0.001) + +ax.set_xticks(x + dimw / 2, map(str, x)) +ax.set_yscale('log') + +ax.set_xlabel('x') +ax.set_ylabel('y') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: log_bar.py](https://matplotlib.org/_downloads/log_bar.py) +- [下载Jupyter notebook: log_bar.ipynb](https://matplotlib.org/_downloads/log_bar.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/scales/log_demo.md b/Python/matplotlab/gallery/scales/log_demo.md new file mode 100644 index 00000000..d7162ac6 --- /dev/null +++ b/Python/matplotlab/gallery/scales/log_demo.md @@ -0,0 +1,51 @@ +# 对数演示 + +具有对数轴的图的示例。 + 
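提示：下面的示例代码沿用了写作时的旧版参数名（``basex``、``nonposx``、``nonposy``）；在 Matplotlib 3.3 及以后的版本中，它们分别改名为 ``base`` 和 ``nonpositive``。这是一个按新参数名改写的最小草图（假设运行环境为 Matplotlib >= 3.3）：

```python
import matplotlib
matplotlib.use("Agg")  # 使用非交互后端，便于脚本运行
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(0.01, 20.0, 0.01)
fig, ax = plt.subplots()
# 旧写法 ax.loglog(..., basex=2) 在 3.3+ 中写作 base=2
ax.loglog(t, 20 * np.exp(-t / 10.0), base=2)
# 旧写法 ax.set_xscale("log", nonposx='clip') 写作 nonpositive='clip'
ax.set_xscale("log", nonpositive="clip")
plt.close(fig)
```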
+![对数演示](https://matplotlib.org/_images/sphx_glr_log_demo_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + +# Data for plotting +t = np.arange(0.01, 20.0, 0.01) + +# Create figure +fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2) + +# log y axis +ax1.semilogy(t, np.exp(-t / 5.0)) +ax1.set(title='semilogy') +ax1.grid() + +# log x axis +ax2.semilogx(t, np.sin(2 * np.pi * t)) +ax2.set(title='semilogx') +ax2.grid() + +# log x and y axis +ax3.loglog(t, 20 * np.exp(-t / 10.0), basex=2) +ax3.set(title='loglog base 2 on x') +ax3.grid() + +# With errorbars: clip non-positive values +# Use new data for plotting +x = 10.0**np.linspace(0.0, 2.0, 20) +y = x**2.0 + +ax4.set_xscale("log", nonposx='clip') +ax4.set_yscale("log", nonposy='clip') +ax4.set(title='Errorbars go negative') +ax4.errorbar(x, y, xerr=0.1 * x, yerr=5.0 + 0.75 * y) +# ylim must be set after errorbar to allow errorbar to autoscale limits +ax4.set_ylim(bottom=0.1) + +fig.tight_layout() +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: log_demo.py](https://matplotlib.org/_downloads/log_demo.py) +- [下载Jupyter notebook: log_demo.ipynb](https://matplotlib.org/_downloads/log_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/scales/log_test.md b/Python/matplotlab/gallery/scales/log_test.md new file mode 100644 index 00000000..a06c3517 --- /dev/null +++ b/Python/matplotlab/gallery/scales/log_test.md @@ -0,0 +1,25 @@ +# 对数轴 + +这是使用semilogx为x轴分配对数刻度的示例。 + +![对数轴示例](https://matplotlib.org/_images/sphx_glr_log_test_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +fig, ax = plt.subplots() + +dt = 0.01 +t = np.arange(dt, 20.0, dt) + +ax.semilogx(t, np.exp(-t / 5.0)) +ax.grid() + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: log_test.py](https://matplotlib.org/_downloads/log_test.py) +- [下载Jupyter notebook: log_test.ipynb](https://matplotlib.org/_downloads/log_test.ipynb) \ No newline at end of file diff --git 
a/Python/matplotlab/gallery/scales/power_norm.md b/Python/matplotlab/gallery/scales/power_norm.md new file mode 100644 index 00000000..522a98d2 --- /dev/null +++ b/Python/matplotlab/gallery/scales/power_norm.md @@ -0,0 +1,50 @@ +# 探索规范化 + +多元正态分布的各种归一化。 + +```python +import matplotlib.pyplot as plt +import matplotlib.colors as mcolors +import numpy as np +from numpy.random import multivariate_normal + +data = np.vstack([ + multivariate_normal([10, 10], [[3, 2], [2, 3]], size=100000), + multivariate_normal([30, 20], [[2, 3], [1, 3]], size=1000) +]) + +gammas = [0.8, 0.5, 0.3] + +fig, axes = plt.subplots(nrows=2, ncols=2) + +axes[0, 0].set_title('Linear normalization') +axes[0, 0].hist2d(data[:, 0], data[:, 1], bins=100) + +for ax, gamma in zip(axes.flat[1:], gammas): + ax.set_title(r'Power law $(\gamma=%1.1f)$' % gamma) + ax.hist2d(data[:, 0], data[:, 1], + bins=100, norm=mcolors.PowerNorm(gamma)) + +fig.tight_layout() + +plt.show() +``` + +![探索规范化示例](https://matplotlib.org/_images/sphx_glr_power_norm_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.colors +matplotlib.colors.PowerNorm +matplotlib.axes.Axes.hist2d +matplotlib.pyplot.hist2d +``` + +## 下载这个示例 + +- [下载python源码: power_norm.py](https://matplotlib.org/_downloads/power_norm.py) +- [下载Jupyter notebook: power_norm.ipynb](https://matplotlib.org/_downloads/power_norm.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/scales/scales.md b/Python/matplotlab/gallery/scales/scales.md new file mode 100644 index 00000000..c6936fca --- /dev/null +++ b/Python/matplotlab/gallery/scales/scales.md @@ -0,0 +1,63 @@ +# 比例尺 + +说明应用于轴的比例变换,例如: log,symlog,logit。 + +![比例尺示例](https://matplotlib.org/_images/sphx_glr_scales_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.ticker import NullFormatter + +# Fixing random state for reproducibility +np.random.seed(19680801) + +# make up some data in the interval ]0, 1[ +y = 
np.random.normal(loc=0.5, scale=0.4, size=1000) +y = y[(y > 0) & (y < 1)] +y.sort() +x = np.arange(len(y)) + +# plot with various axes scales +fig, axs = plt.subplots(2, 2, sharex=True) +fig.subplots_adjust(left=0.08, right=0.98, wspace=0.3) + +# linear +ax = axs[0, 0] +ax.plot(x, y) +ax.set_yscale('linear') +ax.set_title('linear') +ax.grid(True) + + +# log +ax = axs[0, 1] +ax.plot(x, y) +ax.set_yscale('log') +ax.set_title('log') +ax.grid(True) + + +# symmetric log +ax = axs[1, 1] +ax.plot(x, y - y.mean()) +ax.set_yscale('symlog', linthreshy=0.02) +ax.set_title('symlog') +ax.grid(True) + +# logit +ax = axs[1, 0] +ax.plot(x, y) +ax.set_yscale('logit') +ax.set_title('logit') +ax.grid(True) +ax.yaxis.set_minor_formatter(NullFormatter()) + + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: scales.py](https://matplotlib.org/_downloads/scales.py) +- [下载Jupyter notebook: scales.ipynb](https://matplotlib.org/_downloads/scales.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/scales/symlog_demo.md b/Python/matplotlab/gallery/scales/symlog_demo.md new file mode 100644 index 00000000..52d2a133 --- /dev/null +++ b/Python/matplotlab/gallery/scales/symlog_demo.md @@ -0,0 +1,41 @@ +# Symlog演示 + +示例使用symlog(对称对数)轴缩放。 + +![Symlog演示](https://matplotlib.org/_images/sphx_glr_symlog_demo_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +dt = 0.01 +x = np.arange(-50.0, 50.0, dt) +y = np.arange(0, 100.0, dt) + +plt.subplot(311) +plt.plot(x, y) +plt.xscale('symlog') +plt.ylabel('symlogx') +plt.grid(True) +plt.gca().xaxis.grid(True, which='minor') # minor grid on too + +plt.subplot(312) +plt.plot(y, x) +plt.yscale('symlog') +plt.ylabel('symlogy') + +plt.subplot(313) +plt.plot(x, np.sin(x / 3.0)) +plt.xscale('symlog') +plt.yscale('symlog', linthreshy=0.015) +plt.grid(True) +plt.ylabel('symlog both') + +plt.tight_layout() +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: symlog_demo.py](https://matplotlib.org/_downloads/symlog_demo.py) +- 
[下载Jupyter notebook: symlog_demo.ipynb](https://matplotlib.org/_downloads/symlog_demo.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/shapes_and_collections/arrow_guide.md b/Python/matplotlab/gallery/shapes_and_collections/arrow_guide.md
new file mode 100644
index 00000000..97dee18e
--- /dev/null
+++ b/Python/matplotlab/gallery/shapes_and_collections/arrow_guide.md
@@ -0,0 +1,101 @@
# 箭头指南

向图表添加箭头。

箭头通常用于注释图表。本教程介绍如何绘制几种在数据范围（data limits）变化时表现各不相同的箭头。通常，图上的元素可以固定在"数据空间"或"显示空间"中。数据范围改变时，在数据空间中绘制的东西会随之移动——散点图中的点就是一个例子；而在显示空间中绘制的东西保持不动——例如图形标题或轴标签。

箭头由头部（可能还有尾部）以及在起点和终点（以下称"锚点"）之间绘制的杆组成。这里我们展示绘制箭头的三个用例，取决于头部或锚点需要固定在数据空间还是显示空间：

1. 头部形状固定在显示空间中，锚点固定在数据空间中。
1. 头部形状和锚点都固定在显示空间中。
1. 整个补丁（patch）固定在数据空间中。

下面依次介绍每个用例。

```python
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
x_tail = 0.1
y_tail = 0.1
x_head = 0.9
y_head = 0.9
dx = x_head - x_tail
dy = y_head - y_tail
```

## 头部形状固定在显示空间中，锚点固定在数据空间中

如果要注释绘图，并且不希望在平移或缩放绘图时箭头改变形状或位置，这种方式非常有用。

在这种情况下，我们使用 [patches.FancyArrowPatch](https://matplotlib.org/api/_as_gen/matplotlib.patches.FancyArrowPatch.html#matplotlib.patches.FancyArrowPatch)。

请注意，更改轴限制时，箭头形状保持不变，但锚点会移动。

```python
fig, axs = plt.subplots(nrows=2)
arrow = mpatches.FancyArrowPatch((x_tail, y_tail), (dx, dy),
                                 mutation_scale=100)
axs[0].add_patch(arrow)

arrow = mpatches.FancyArrowPatch((x_tail, y_tail), (dx, dy),
                                 mutation_scale=100)
axs[1].add_patch(arrow)
axs[1].set_xlim(0, 2)
axs[1].set_ylim(0, 2)
```

![箭头指南示例](https://matplotlib.org/_images/sphx_glr_arrow_guide_001.png)

## 头部形状和锚点固定在显示空间中

如果要注释绘图，并且不希望在平移或缩放绘图时箭头改变形状或位置，这种方式同样有用。

在这种情况下，我们使用 [patches.FancyArrowPatch](https://matplotlib.org/api/_as_gen/matplotlib.patches.FancyArrowPatch.html#matplotlib.patches.FancyArrowPatch)，并传递关键字参数transform=ax.transAxes，其中ax是我们添加补丁的轴。

请注意，更改轴限制时，箭头形状和位置保持不变。

```python
fig, axs = plt.subplots(nrows=2)
arrow = 
mpatches.FancyArrowPatch((x_tail, y_tail), (x_head, y_head), + mutation_scale=100, + transform=axs[0].transAxes) +axs[0].add_patch(arrow) + +arrow = mpatches.FancyArrowPatch((x_tail, y_tail), (x_head, y_head), + mutation_scale=100, + transform=axs[1].transAxes) +axs[1].add_patch(arrow) +axs[1].set_xlim(0, 2) +axs[1].set_ylim(0, 2) +``` + +![箭头指南2](https://matplotlib.org/_images/sphx_glr_arrow_guide_002.png) + +## 头部形状和锚点固定在数据空间中 + +在这种情况下,我们使用 [patches.Arrow](https://matplotlib.org/api/_as_gen/matplotlib.patches.Arrow.html#matplotlib.patches.Arrow)。 + +请注意,更改轴范围时,箭头形状和位置都会发生变化。 + +```python +fig, axs = plt.subplots(nrows=2) + +arrow = mpatches.Arrow(x_tail, y_tail, dx, dy) +axs[0].add_patch(arrow) + +arrow = mpatches.Arrow(x_tail, y_tail, dx, dy) +axs[1].add_patch(arrow) +axs[1].set_xlim(0, 2) +axs[1].set_ylim(0, 2) +``` + +![箭头指南3](https://matplotlib.org/_images/sphx_glr_arrow_guide_003.png) + +```python +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: arrow_guide.py](https://matplotlib.org/_downloads/arrow_guide.py) +- [下载Jupyter notebook: arrow_guide.ipynb](https://matplotlib.org/_downloads/arrow_guide.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/shapes_and_collections/artist_reference.md b/Python/matplotlab/gallery/shapes_and_collections/artist_reference.md new file mode 100644 index 00000000..6aef964a --- /dev/null +++ b/Python/matplotlab/gallery/shapes_and_collections/artist_reference.md @@ -0,0 +1,132 @@ +# matplotlib艺术家对象的参考 + +此示例展示了使用 Matplotlib API 绘制的几种图形基元(艺术家对象)。[艺术家对象API](https://matplotlib.org/api/artist_api.html#artist-api)提供完整的艺术家对象列表和文档。 + +Copyright (c) 2010, Bartosz Telenczuk BSD License + +```python +import matplotlib.pyplot as plt +import numpy as np +import matplotlib.path as mpath +import matplotlib.lines as mlines +import matplotlib.patches as mpatches +from matplotlib.collections import PatchCollection + + +def label(xy, text): + y = xy[1] - 0.15  # shift y-value for label so that it's below the artist + plt.text(xy[0], y, 
text, ha="center", family='sans-serif', size=14) + + +fig, ax = plt.subplots() +# create 3x3 grid to plot the artists +grid = np.mgrid[0.2:0.8:3j, 0.2:0.8:3j].reshape(2, -1).T + +patches = [] + +# add a circle +circle = mpatches.Circle(grid[0], 0.1, ec="none") +patches.append(circle) +label(grid[0], "Circle") + +# add a rectangle +rect = mpatches.Rectangle(grid[1] - [0.025, 0.05], 0.05, 0.1, ec="none") +patches.append(rect) +label(grid[1], "Rectangle") + +# add a wedge +wedge = mpatches.Wedge(grid[2], 0.1, 30, 270, ec="none") +patches.append(wedge) +label(grid[2], "Wedge") + +# add a Polygon +polygon = mpatches.RegularPolygon(grid[3], 5, 0.1) +patches.append(polygon) +label(grid[3], "Polygon") + +# add an ellipse +ellipse = mpatches.Ellipse(grid[4], 0.2, 0.1) +patches.append(ellipse) +label(grid[4], "Ellipse") + +# add an arrow +arrow = mpatches.Arrow(grid[5, 0] - 0.05, grid[5, 1] - 0.05, 0.1, 0.1, + width=0.1) +patches.append(arrow) +label(grid[5], "Arrow") + +# add a path patch +Path = mpath.Path +path_data = [ + (Path.MOVETO, [0.018, -0.11]), + (Path.CURVE4, [-0.031, -0.051]), + (Path.CURVE4, [-0.115, 0.073]), + (Path.CURVE4, [-0.03, 0.073]), + (Path.LINETO, [-0.011, 0.039]), + (Path.CURVE4, [0.043, 0.121]), + (Path.CURVE4, [0.075, -0.005]), + (Path.CURVE4, [0.035, -0.027]), + (Path.CLOSEPOLY, [0.018, -0.11])] +codes, verts = zip(*path_data) +path = mpath.Path(verts + grid[6], codes) +patch = mpatches.PathPatch(path) +patches.append(patch) +label(grid[6], "PathPatch") + +# add a fancy box +fancybox = mpatches.FancyBboxPatch( + grid[7] - [0.025, 0.05], 0.05, 0.1, + boxstyle=mpatches.BoxStyle("Round", pad=0.02)) +patches.append(fancybox) +label(grid[7], "FancyBboxPatch") + +# add a line +x, y = np.array([[-0.06, 0.0, 0.1], [0.05, -0.05, 0.05]]) +line = mlines.Line2D(x + grid[8, 0], y + grid[8, 1], lw=5., alpha=0.3) +label(grid[8], "Line2D") + +colors = np.linspace(0, 1, len(patches)) +collection = PatchCollection(patches, cmap=plt.cm.hsv, alpha=0.3) 
+collection.set_array(np.array(colors)) +ax.add_collection(collection) +ax.add_line(line) + +plt.axis('equal') +plt.axis('off') +plt.tight_layout() + +plt.show() +``` + +![matplotlib艺术家对象的参考示例](https://matplotlib.org/_images/sphx_glr_artist_reference_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.path +matplotlib.path.Path +matplotlib.lines +matplotlib.lines.Line2D +matplotlib.patches +matplotlib.patches.Circle +matplotlib.patches.Ellipse +matplotlib.patches.Wedge +matplotlib.patches.Rectangle +matplotlib.patches.Arrow +matplotlib.patches.PathPatch +matplotlib.patches.FancyBboxPatch +matplotlib.patches.RegularPolygon +matplotlib.collections +matplotlib.collections.PatchCollection +matplotlib.cm.ScalarMappable.set_array +matplotlib.axes.Axes.add_collection +matplotlib.axes.Axes.add_line +``` + +## 下载这个示例 + +- [下载python源码: artist_reference.py](https://matplotlib.org/_downloads/artist_reference.py) +- [下载Jupyter notebook: artist_reference.ipynb](https://matplotlib.org/_downloads/artist_reference.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/shapes_and_collections/collections.md b/Python/matplotlab/gallery/shapes_and_collections/collections.md new file mode 100644 index 00000000..cb7dba93 --- /dev/null +++ b/Python/matplotlab/gallery/shapes_and_collections/collections.md @@ -0,0 +1,140 @@ +# 具有自动缩放功能的Line,Poly和RegularPoly Collection + +对于前两个子图,我们将使用螺旋。它们的大小将以图表单位设置,而不是数据单位。它们的位置将通过使用LineCollection和PolyCollection的“偏移”和“transOffset”kwargs以数据单位设置。 + +第三个子图将生成正多边形,具有与前两个相同类型的缩放和定位。 + +最后一个子图说明了使用 “offsets =(xo,yo)”,即单个元组而不是元组列表来生成连续的偏移曲线,其中偏移量以数据单位给出。 此行为仅适用于LineCollection。 + +```python +import matplotlib.pyplot as plt +from matplotlib import collections, colors, transforms +import numpy as np + +nverts = 50 +npts = 100 + +# Make some spirals +r = np.arange(nverts) +theta = np.linspace(0, 2*np.pi, nverts) +xx = r * np.sin(theta) +yy = r * np.cos(theta) +spiral = np.column_stack([xx, yy]) + +# Fixing 
random state for reproducibility +rs = np.random.RandomState(19680801) + +# Make some offsets +xyo = rs.randn(npts, 2) + +# Make a list of colors cycling through the default series. +colors = [colors.to_rgba(c) + for c in plt.rcParams['axes.prop_cycle'].by_key()['color']] + +fig, axes = plt.subplots(2, 2) +fig.subplots_adjust(top=0.92, left=0.07, right=0.97, + hspace=0.3, wspace=0.3) +((ax1, ax2), (ax3, ax4)) = axes # unpack the axes + + +col = collections.LineCollection([spiral], offsets=xyo, + transOffset=ax1.transData) +trans = fig.dpi_scale_trans + transforms.Affine2D().scale(1.0/72.0) +col.set_transform(trans) # the points to pixels transform +# Note: the first argument to the collection initializer +# must be a list of sequences of x,y tuples; we have only +# one sequence, but we still have to put it in a list. +ax1.add_collection(col, autolim=True) +# autolim=True enables autoscaling. For collections with +# offsets like this, it is neither efficient nor accurate, +# but it is good enough to generate a plot that you can use +# as a starting point. If you know beforehand the range of +# x and y that you want to show, it is better to set them +# explicitly, leave out the autolim kwarg (or set it to False), +# and omit the 'ax1.autoscale_view()' call below. + +# Make a transform for the line segments such that their size is +# given in points: +col.set_color(colors) + +ax1.autoscale_view() # See comment above, after ax1.add_collection. +ax1.set_title('LineCollection using offsets') + + +# The same data as above, but fill the curves. 
+col = collections.PolyCollection([spiral], offsets=xyo, + transOffset=ax2.transData) +trans = transforms.Affine2D().scale(fig.dpi/72.0) +col.set_transform(trans) # the points to pixels transform +ax2.add_collection(col, autolim=True) +col.set_color(colors) + + +ax2.autoscale_view() +ax2.set_title('PolyCollection using offsets') + +# 7-sided regular polygons + +col = collections.RegularPolyCollection( + 7, sizes=np.abs(xx) * 10.0, offsets=xyo, transOffset=ax3.transData) +trans = transforms.Affine2D().scale(fig.dpi / 72.0) +col.set_transform(trans) # the points to pixels transform +ax3.add_collection(col, autolim=True) +col.set_color(colors) +ax3.autoscale_view() +ax3.set_title('RegularPolyCollection using offsets') + + +# Simulate a series of ocean current profiles, successively +# offset by 0.1 m/s so that they form what is sometimes called +# a "waterfall" plot or a "stagger" plot. + +nverts = 60 +ncurves = 20 +offs = (0.1, 0.0) + +yy = np.linspace(0, 2*np.pi, nverts) +ym = np.max(yy) +xx = (0.2 + (ym - yy) / ym) ** 2 * np.cos(yy - 0.4) * 0.5 +segs = [] +for i in range(ncurves): + xxx = xx + 0.02*rs.randn(nverts) + curve = np.column_stack([xxx, yy * 100]) + segs.append(curve) + +col = collections.LineCollection(segs, offsets=offs) +ax4.add_collection(col, autolim=True) +col.set_color(colors) +ax4.autoscale_view() +ax4.set_title('Successive data offsets') +ax4.set_xlabel('Zonal velocity component (m/s)') +ax4.set_ylabel('Depth (m)') +# Reverse the y-axis so depth increases downward +ax4.set_ylim(ax4.get_ylim()[::-1]) + + +plt.show() +``` + +![缩放功能示例](https://matplotlib.org/_images/sphx_glr_collections_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.figure.Figure +matplotlib.collections +matplotlib.collections.LineCollection +matplotlib.collections.RegularPolyCollection +matplotlib.axes.Axes.add_collection +matplotlib.axes.Axes.autoscale_view +matplotlib.transforms.Affine2D +matplotlib.transforms.Affine2D.scale +``` + +## 
下载这个示例 + +- [下载python源码: collections.py](https://matplotlib.org/_downloads/collections.py) +- [下载Jupyter notebook: collections.ipynb](https://matplotlib.org/_downloads/collections.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/shapes_and_collections/compound_path.md b/Python/matplotlab/gallery/shapes_and_collections/compound_path.md new file mode 100644 index 00000000..d48d851f --- /dev/null +++ b/Python/matplotlab/gallery/shapes_and_collections/compound_path.md @@ -0,0 +1,54 @@ +# 复合路径 + +制作复合路径 - 在这种情况下是两个简单的多边形,一个矩形和一个三角形。使用 ``CLOSEPOLY`` 和 ``MOVETO`` 作为复合路径的不同部分。 + +```python +import numpy as np +from matplotlib.path import Path +from matplotlib.patches import PathPatch +import matplotlib.pyplot as plt + + +vertices = [] +codes = [] + +codes = [Path.MOVETO] + [Path.LINETO]*3 + [Path.CLOSEPOLY] +vertices = [(1, 1), (1, 2), (2, 2), (2, 1), (0, 0)] + +codes += [Path.MOVETO] + [Path.LINETO]*2 + [Path.CLOSEPOLY] +vertices += [(4, 4), (5, 5), (5, 4), (0, 0)] + +vertices = np.array(vertices, float) +path = Path(vertices, codes) + +pathpatch = PathPatch(path, facecolor='None', edgecolor='green') + +fig, ax = plt.subplots() +ax.add_patch(pathpatch) +ax.set_title('A compound path') + +ax.autoscale_view() + +plt.show() +``` + +![复合路径示例](https://matplotlib.org/_images/sphx_glr_compound_path_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.path +matplotlib.path.Path +matplotlib.patches +matplotlib.patches.PathPatch +matplotlib.axes.Axes.add_patch +matplotlib.axes.Axes.autoscale_view +``` + +## 下载这个示例 + +- [下载python源码: compound_path.py](https://matplotlib.org/_downloads/compound_path.py) +- [下载Jupyter notebook: compound_path.ipynb](https://matplotlib.org/_downloads/compound_path.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/shapes_and_collections/dolphin.md b/Python/matplotlab/gallery/shapes_and_collections/dolphin.md new file mode 100644 index 00000000..4e9ed676 --- /dev/null +++ 
b/Python/matplotlab/gallery/shapes_and_collections/dolphin.md @@ -0,0 +1,123 @@ +# 绘制海豚 + +此示例显示如何使用[Path](https://matplotlib.org/api/path_api.html#matplotlib.path.Path),[PathPatch](https://matplotlib.org/api/_as_gen/matplotlib.patches.PathPatch.html#matplotlib.patches.PathPatch)和[transforms](https://matplotlib.org/api/transformations.html#module-matplotlib.transforms)类绘制和操作给定顶点和节点的形状。 + +```python +import matplotlib.cm as cm +import matplotlib.pyplot as plt +from matplotlib.patches import Circle, PathPatch +from matplotlib.path import Path +from matplotlib.transforms import Affine2D +import numpy as np + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +r = np.random.rand(50) +t = np.random.rand(50) * np.pi * 2.0 +x = r * np.cos(t) +y = r * np.sin(t) + +fig, ax = plt.subplots(figsize=(6, 6)) +circle = Circle((0, 0), 1, facecolor='none', + edgecolor=(0, 0.8, 0.8), linewidth=3, alpha=0.5) +ax.add_patch(circle) + +im = plt.imshow(np.random.random((100, 100)), + origin='lower', cmap=cm.winter, + interpolation='spline36', + extent=([-1, 1, -1, 1])) +im.set_clip_path(circle) + +plt.plot(x, y, 'o', color=(0.9, 0.9, 1.0), alpha=0.8) + +# Dolphin from OpenClipart library by Andy Fitzsimon +# +# +# +# +# + +dolphin = """ +M -0.59739425,160.18173 C -0.62740401,160.18885 -0.57867129,160.11183 +-0.57867129,160.11183 C -0.57867129,160.11183 -0.5438361,159.89315 +-0.39514638,159.81496 C -0.24645668,159.73678 -0.18316813,159.71981 +-0.18316813,159.71981 C -0.18316813,159.71981 -0.10322971,159.58124 +-0.057804323,159.58725 C -0.029723983,159.58913 -0.061841603,159.60356 +-0.071265813,159.62815 C -0.080250183,159.65325 -0.082918513,159.70554 +-0.061841203,159.71248 C -0.040763903,159.7194 -0.0066711426,159.71091 +0.077336307,159.73612 C 0.16879567,159.76377 0.28380306,159.86448 +0.31516668,159.91533 C 0.3465303,159.96618 0.5011127,160.1771 +0.5011127,160.1771 C 0.63668998,160.19238 0.67763022,160.31259 +0.66556395,160.32668 C 0.65339985,160.34212 
0.66350443,160.33642 +0.64907098,160.33088 C 0.63463742,160.32533 0.61309688,160.297 +0.5789627,160.29339 C 0.54348657,160.28968 0.52329693,160.27674 +0.50728856,160.27737 C 0.49060916,160.27795 0.48965803,160.31565 +0.46114204,160.33673 C 0.43329696,160.35786 0.4570711,160.39871 +0.43309565,160.40685 C 0.4105108,160.41442 0.39416631,160.33027 +0.3954995,160.2935 C 0.39683269,160.25672 0.43807996,160.21522 +0.44567915,160.19734 C 0.45327833,160.17946 0.27946869,159.9424 +-0.061852613,159.99845 C -0.083965233,160.0427 -0.26176109,160.06683 +-0.26176109,160.06683 C -0.30127962,160.07028 -0.21167141,160.09731 +-0.24649368,160.1011 C -0.32642366,160.11569 -0.34521187,160.06895 +-0.40622293,160.0819 C -0.467234,160.09485 -0.56738444,160.17461 +-0.59739425,160.18173 +""" + +vertices = [] +codes = [] +parts = dolphin.split() +i = 0 +code_map = { + 'M': (Path.MOVETO, 1), + 'C': (Path.CURVE4, 3), + 'L': (Path.LINETO, 1)} + +while i < len(parts): + code = parts[i] + path_code, npoints = code_map[code] + codes.extend([path_code] * npoints) + vertices.extend([[float(x) for x in y.split(',')] for y in + parts[i + 1:i + npoints + 1]]) + i += npoints + 1 +vertices = np.array(vertices, float) +vertices[:, 1] -= 160 + +dolphin_path = Path(vertices, codes) +dolphin_patch = PathPatch(dolphin_path, facecolor=(0.6, 0.6, 0.6), + edgecolor=(0.0, 0.0, 0.0)) +ax.add_patch(dolphin_patch) + +vertices = Affine2D().rotate_deg(60).transform(vertices) +dolphin_path2 = Path(vertices, codes) +dolphin_patch2 = PathPatch(dolphin_path2, facecolor=(0.5, 0.5, 0.5), + edgecolor=(0.0, 0.0, 0.0)) +ax.add_patch(dolphin_patch2) + +plt.show() +``` + +![绘制海豚示例](https://matplotlib.org/_images/sphx_glr_dolphin_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.path +matplotlib.path.Path +matplotlib.patches +matplotlib.patches.PathPatch +matplotlib.patches.Circle +matplotlib.axes.Axes.add_patch +matplotlib.transforms +matplotlib.transforms.Affine2D 
+matplotlib.transforms.Affine2D.rotate_deg +``` + +## 下载这个示例 + +- [下载python源码: dolphin.py](https://matplotlib.org/_downloads/dolphin.py) +- [下载Jupyter notebook: dolphin.ipynb](https://matplotlib.org/_downloads/dolphin.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/shapes_and_collections/donut.md b/Python/matplotlab/gallery/shapes_and_collections/donut.md new file mode 100644 index 00000000..94c94701 --- /dev/null +++ b/Python/matplotlab/gallery/shapes_and_collections/donut.md @@ -0,0 +1,86 @@ +# 绘制甜甜圈 + +Draw donuts (miam!) using [Path](https://matplotlib.org/api/path_api.html#matplotlib.path.Path)s and [PathPatches](https://matplotlib.org/api/_as_gen/matplotlib.patches.PathPatch.html#matplotlib.patches.PathPatch). This example shows the effect of the path's orientations in a compound path. + +```python +import numpy as np +import matplotlib.path as mpath +import matplotlib.patches as mpatches +import matplotlib.pyplot as plt + + +def wise(v): + if v == 1: + return "CCW" + else: + return "CW" + + +def make_circle(r): + t = np.arange(0, np.pi * 2.0, 0.01) + t = t.reshape((len(t), 1)) + x = r * np.cos(t) + y = r * np.sin(t) + return np.hstack((x, y)) + +Path = mpath.Path + +fig, ax = plt.subplots() + +inside_vertices = make_circle(0.5) +outside_vertices = make_circle(1.0) +codes = np.ones( + len(inside_vertices), dtype=mpath.Path.code_type) * mpath.Path.LINETO +codes[0] = mpath.Path.MOVETO + +for i, (inside, outside) in enumerate(((1, 1), (1, -1), (-1, 1), (-1, -1))): + # Concatenate the inside and outside subpaths together, changing their + # order as needed + vertices = np.concatenate((outside_vertices[::outside], + inside_vertices[::inside])) + # Shift the path + vertices[:, 0] += i * 2.5 + # The codes will be all "LINETO" commands, except for "MOVETO"s at the + # beginning of each subpath + all_codes = np.concatenate((codes, codes)) + # Create the Path object + path = mpath.Path(vertices, all_codes) + # Add plot it + patch = 
mpatches.PathPatch(path, facecolor='#885500', edgecolor='black') + ax.add_patch(patch) + + ax.annotate("Outside %s,\nInside %s" % (wise(outside), wise(inside)), + (i * 2.5, -1.5), va="top", ha="center") + +ax.set_xlim(-2, 10) +ax.set_ylim(-3, 2) +ax.set_title('Mmm, donuts!') +ax.set_aspect(1.0) +plt.show() +``` + +![绘制甜甜圈示例](https://matplotlib.org/_images/sphx_glr_donut_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.path +matplotlib.path.Path +matplotlib.patches +matplotlib.patches.PathPatch +matplotlib.patches.Circle +matplotlib.axes.Axes.add_patch +matplotlib.axes.Axes.annotate +matplotlib.axes.Axes.set_aspect +matplotlib.axes.Axes.set_xlim +matplotlib.axes.Axes.set_ylim +matplotlib.axes.Axes.set_title +``` + +## 下载这个示例 + +- [下载python源码: donut.py](https://matplotlib.org/_downloads/donut.py) +- [下载Jupyter notebook: donut.ipynb](https://matplotlib.org/_downloads/donut.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/shapes_and_collections/ellipse_collection.md b/Python/matplotlab/gallery/shapes_and_collections/ellipse_collection.md new file mode 100644 index 00000000..a2a9a36a --- /dev/null +++ b/Python/matplotlab/gallery/shapes_and_collections/ellipse_collection.md @@ -0,0 +1,53 @@ +# 椭圆集合 + +绘制椭圆的集合。虽然使用 [PathCollection](https://matplotlib.org/api/collections_api.html#matplotlib.collections.PathCollection) 同样可以实现,但使用 [EllipseCollection](https://matplotlib.org/api/collections_api.html#matplotlib.collections.EllipseCollection) 可以使代码简短得多。 + +```python +import matplotlib.pyplot as plt +import numpy as np +from matplotlib.collections import EllipseCollection + +x = np.arange(10) +y = np.arange(15) +X, Y = np.meshgrid(x, y) + +XY = np.column_stack((X.ravel(), Y.ravel())) + +ww = X / 10.0 +hh = Y / 15.0 +aa = X * 9 + + +fig, ax = plt.subplots() + +ec = EllipseCollection(ww, hh, aa, units='x', 
offsets=XY, + transOffset=ax.transData) +ec.set_array((X + Y).ravel()) +ax.add_collection(ec) +ax.autoscale_view() +ax.set_xlabel('X') +ax.set_ylabel('y') +cbar = plt.colorbar(ec) +cbar.set_label('X+Y') +plt.show() +``` + +![椭圆集合示例](https://matplotlib.org/_images/sphx_glr_ellipse_collection_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.collections +matplotlib.collections.EllipseCollection +matplotlib.axes.Axes.add_collection +matplotlib.axes.Axes.autoscale_view +matplotlib.cm.ScalarMappable.set_array +``` + +## 下载这个示例 + +- [下载python源码: ellipse_collection.py](https://matplotlib.org/_downloads/ellipse_collection.py) +- [下载Jupyter notebook: ellipse_collection.ipynb](https://matplotlib.org/_downloads/ellipse_collection.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/shapes_and_collections/ellipse_demo.md b/Python/matplotlab/gallery/shapes_and_collections/ellipse_demo.md new file mode 100644 index 00000000..c580e7b1 --- /dev/null +++ b/Python/matplotlab/gallery/shapes_and_collections/ellipse_demo.md @@ -0,0 +1,78 @@ +# 椭圆演示 + +绘制多个椭圆。此处绘制单个椭圆。将其与[Ellipse集合示例](https://matplotlib.org/gallery/shapes_and_collections/ellipse_collection.html)进行比较。 + +```python +import matplotlib.pyplot as plt +import numpy as np +from matplotlib.patches import Ellipse + +NUM = 250 + +ells = [Ellipse(xy=np.random.rand(2) * 10, + width=np.random.rand(), height=np.random.rand(), + angle=np.random.rand() * 360) + for i in range(NUM)] + +fig, ax = plt.subplots(subplot_kw={'aspect': 'equal'}) +for e in ells: + ax.add_artist(e) + e.set_clip_box(ax.bbox) + e.set_alpha(np.random.rand()) + e.set_facecolor(np.random.rand(3)) + +ax.set_xlim(0, 10) +ax.set_ylim(0, 10) + +plt.show() +``` + +![椭圆演示](https://matplotlib.org/_images/sphx_glr_ellipse_demo_001.png) + +# 椭圆旋转 + +绘制许多不同角度的椭圆。 + +```python +import matplotlib.pyplot as plt +import numpy as np +from matplotlib.patches import Ellipse + +delta = 45.0 # degrees + +angles = np.arange(0, 
360 + delta, delta) +ells = [Ellipse((1, 1), 4, 2, a) for a in angles] + +a = plt.subplot(111, aspect='equal') + +for e in ells: + e.set_clip_box(a.bbox) + e.set_alpha(0.1) + a.add_artist(e) + +plt.xlim(-2, 4) +plt.ylim(-1, 3) + +plt.show() +``` + +![椭圆演示2](https://matplotlib.org/_images/sphx_glr_ellipse_demo_002.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.patches +matplotlib.patches.Ellipse +matplotlib.axes.Axes.add_artist +matplotlib.artist.Artist.set_clip_box +matplotlib.artist.Artist.set_alpha +matplotlib.patches.Patch.set_facecolor +``` + +## 下载这个示例 + +- [下载python源码: ellipse_demo.py](https://matplotlib.org/_downloads/ellipse_demo.py) +- [下载Jupyter notebook: ellipse_demo.ipynb](https://matplotlib.org/_downloads/ellipse_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/shapes_and_collections/fancybox_demo.md b/Python/matplotlab/gallery/shapes_and_collections/fancybox_demo.md new file mode 100644 index 00000000..41b3ac83 --- /dev/null +++ b/Python/matplotlab/gallery/shapes_and_collections/fancybox_demo.md @@ -0,0 +1,217 @@ +# Fancybox演示 + +使用Matplotlib绘制精美的盒子。 + +以下示例显示如何绘制具有不同视觉属性的框。 + +```python +import matplotlib.pyplot as plt +import matplotlib.transforms as mtransforms +import matplotlib.patches as mpatch +from matplotlib.patches import FancyBboxPatch +``` + +首先,我们将展示一些带有fancybox的样本盒。 + +```python +styles = mpatch.BoxStyle.get_styles() +spacing = 1.2 + +figheight = (spacing * len(styles) + .5) +fig1 = plt.figure(1, (4 / 1.5, figheight / 1.5)) +fontsize = 0.3 * 72 + +for i, stylename in enumerate(sorted(styles)): + fig1.text(0.5, (spacing * (len(styles) - i) - 0.5) / figheight, stylename, + ha="center", + size=fontsize, + transform=fig1.transFigure, + bbox=dict(boxstyle=stylename, fc="w", ec="k")) + +plt.show() +``` + +![Fancybox演示示例](https://matplotlib.org/_images/sphx_glr_fancybox_demo_001.png) + +接下来,我们将同时展示多个精美的盒子。 + +```python +# Bbox object around which the fancy box will be drawn. 
+bb = mtransforms.Bbox([[0.3, 0.4], [0.7, 0.6]]) + + +def draw_bbox(ax, bb): + # boxstyle=square with pad=0, i.e. bbox itself. + p_bbox = FancyBboxPatch((bb.xmin, bb.ymin), + abs(bb.width), abs(bb.height), + boxstyle="square,pad=0.", + ec="k", fc="none", zorder=10., + ) + ax.add_patch(p_bbox) + + +def test1(ax): + + # a fancy box with round corners. pad=0.1 + p_fancy = FancyBboxPatch((bb.xmin, bb.ymin), + abs(bb.width), abs(bb.height), + boxstyle="round,pad=0.1", + fc=(1., .8, 1.), + ec=(1., 0.5, 1.)) + + ax.add_patch(p_fancy) + + ax.text(0.1, 0.8, + r' boxstyle="round,pad=0.1"', + size=10, transform=ax.transAxes) + + # draws control points for the fancy box. + # l = p_fancy.get_path().vertices + # ax.plot(l[:,0], l[:,1], ".") + + # draw the original bbox in black + draw_bbox(ax, bb) + + +def test2(ax): + + # bbox=round has two optional argument. pad and rounding_size. + # They can be set during the initialization. + p_fancy = FancyBboxPatch((bb.xmin, bb.ymin), + abs(bb.width), abs(bb.height), + boxstyle="round,pad=0.1", + fc=(1., .8, 1.), + ec=(1., 0.5, 1.)) + + ax.add_patch(p_fancy) + + # boxstyle and its argument can be later modified with + # set_boxstyle method. Note that the old attributes are simply + # forgotten even if the boxstyle name is same. + + p_fancy.set_boxstyle("round,pad=0.1, rounding_size=0.2") + # or + # p_fancy.set_boxstyle("round", pad=0.1, rounding_size=0.2) + + ax.text(0.1, 0.8, + ' boxstyle="round,pad=0.1\n rounding_size=0.2"', + size=10, transform=ax.transAxes) + + # draws control points for the fancy box. + # l = p_fancy.get_path().vertices + # ax.plot(l[:,0], l[:,1], ".") + + draw_bbox(ax, bb) + + +def test3(ax): + + # mutation_scale determine overall scale of the mutation, + # i.e. both pad and rounding_size is scaled according to this + # value. 
+ p_fancy = FancyBboxPatch((bb.xmin, bb.ymin), + abs(bb.width), abs(bb.height), + boxstyle="round,pad=0.1", + mutation_scale=2., + fc=(1., .8, 1.), + ec=(1., 0.5, 1.)) + + ax.add_patch(p_fancy) + + ax.text(0.1, 0.8, + ' boxstyle="round,pad=0.1"\n mutation_scale=2', + size=10, transform=ax.transAxes) + + # draws control points for the fancy box. + # l = p_fancy.get_path().vertices + # ax.plot(l[:,0], l[:,1], ".") + + draw_bbox(ax, bb) + + +def test4(ax): + + # When the aspect ratio of the axes is not 1, the fancy box may + # not be what you expected (green) + + p_fancy = FancyBboxPatch((bb.xmin, bb.ymin), + abs(bb.width), abs(bb.height), + boxstyle="round,pad=0.2", + fc="none", + ec=(0., .5, 0.), zorder=4) + + ax.add_patch(p_fancy) + + # You can compensate this by setting the mutation_aspect (pink). + p_fancy = FancyBboxPatch((bb.xmin, bb.ymin), + abs(bb.width), abs(bb.height), + boxstyle="round,pad=0.3", + mutation_aspect=.5, + fc=(1., 0.8, 1.), + ec=(1., 0.5, 1.)) + + ax.add_patch(p_fancy) + + ax.text(0.1, 0.8, + ' boxstyle="round,pad=0.3"\n mutation_aspect=.5', + size=10, transform=ax.transAxes) + + draw_bbox(ax, bb) + + +def test_all(): + plt.clf() + + ax = plt.subplot(2, 2, 1) + test1(ax) + ax.set_xlim(0., 1.) + ax.set_ylim(0., 1.) + ax.set_title("test1") + ax.set_aspect(1.) + + ax = plt.subplot(2, 2, 2) + ax.set_title("test2") + test2(ax) + ax.set_xlim(0., 1.) + ax.set_ylim(0., 1.) + ax.set_aspect(1.) + + ax = plt.subplot(2, 2, 3) + ax.set_title("test3") + test3(ax) + ax.set_xlim(0., 1.) + ax.set_ylim(0., 1.) + ax.set_aspect(1) + + ax = plt.subplot(2, 2, 4) + ax.set_title("test4") + test4(ax) + ax.set_xlim(-0.5, 1.5) + ax.set_ylim(0., 1.) + ax.set_aspect(2.) 
+ + plt.show() + + +test_all() +``` + +![Fancybox演示2](https://matplotlib.org/_images/sphx_glr_fancybox_demo_002.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.patches +matplotlib.patches.FancyBboxPatch +matplotlib.patches.BoxStyle +matplotlib.patches.BoxStyle.get_styles +matplotlib.transforms.Bbox +``` + +## 下载这个示例 + +- [下载python源码: fancybox_demo.py](https://matplotlib.org/_downloads/fancybox_demo.py) +- [下载Jupyter notebook: fancybox_demo.ipynb](https://matplotlib.org/_downloads/fancybox_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/shapes_and_collections/hatch_demo.md b/Python/matplotlab/gallery/shapes_and_collections/hatch_demo.md new file mode 100644 index 00000000..6b19c350 --- /dev/null +++ b/Python/matplotlab/gallery/shapes_and_collections/hatch_demo.md @@ -0,0 +1,55 @@ +# Hatch演示 + +目前仅在PS,PDF,SVG和Agg后端支持阴影线(图案填充多边形)。 + +```python +import matplotlib.pyplot as plt +from matplotlib.patches import Ellipse, Polygon + +fig, (ax1, ax2, ax3) = plt.subplots(3) + +ax1.bar(range(1, 5), range(1, 5), color='red', edgecolor='black', hatch="/") +ax1.bar(range(1, 5), [6] * 4, bottom=range(1, 5), + color='blue', edgecolor='black', hatch='//') +ax1.set_xticks([1.5, 2.5, 3.5, 4.5]) + +bars = ax2.bar(range(1, 5), range(1, 5), color='yellow', ecolor='black') + \ + ax2.bar(range(1, 5), [6] * 4, bottom=range(1, 5), + color='green', ecolor='black') +ax2.set_xticks([1.5, 2.5, 3.5, 4.5]) + +patterns = ('-', '+', 'x', '\\', '*', 'o', 'O', '.') +for bar, pattern in zip(bars, patterns): + bar.set_hatch(pattern) + +ax3.fill([1, 3, 3, 1], [1, 1, 2, 2], fill=False, hatch='\\') +ax3.add_patch(Ellipse((4, 1.5), 4, 0.5, fill=False, hatch='*')) +ax3.add_patch(Polygon([[0, 0], [4, 1.1], [6, 2.5], [2, 1.4]], closed=True, + fill=False, hatch='/')) +ax3.set_xlim((0, 6)) +ax3.set_ylim((0, 2.5)) + +plt.show() +``` + +![Hatch演示](https://matplotlib.org/_images/sphx_glr_hatch_demo_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + 
+```python +import matplotlib +matplotlib.patches +matplotlib.patches.Ellipse +matplotlib.patches.Polygon +matplotlib.axes.Axes.add_patch +matplotlib.patches.Patch.set_hatch +matplotlib.axes.Axes.bar +matplotlib.pyplot.bar +``` + +## 下载这个示例 + +- [下载python源码: hatch_demo.py](https://matplotlib.org/_downloads/hatch_demo.py) +- [下载Jupyter notebook: hatch_demo.ipynb](https://matplotlib.org/_downloads/hatch_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/shapes_and_collections/line_collection.md b/Python/matplotlab/gallery/shapes_and_collections/line_collection.md new file mode 100644 index 00000000..9d0603fb --- /dev/null +++ b/Python/matplotlab/gallery/shapes_and_collections/line_collection.md @@ -0,0 +1,105 @@ +# 线段集合 + +使用Matplotlib绘制线条。 + +[LineCollection](https://matplotlib.org/api/collections_api.html#matplotlib.collections.LineCollection) 允许在图上绘制多条线。 下面我们展示它的一些属性。 + +```python +import matplotlib.pyplot as plt +from matplotlib.collections import LineCollection +from matplotlib import colors as mcolors + +import numpy as np + +# In order to efficiently plot many lines in a single set of axes, +# Matplotlib has the ability to add the lines all at once. Here is a +# simple example showing how it is done. + +x = np.arange(100) +# Here are many sets of y to plot vs x +ys = x[:50, np.newaxis] + x[np.newaxis, :] + +segs = np.zeros((50, 100, 2)) +segs[:, :, 1] = ys +segs[:, :, 0] = x + +# Mask some values to test masked array support: +segs = np.ma.masked_where((segs > 50) & (segs < 60), segs) + +# We need to set the plot limits. +fig, ax = plt.subplots() +ax.set_xlim(x.min(), x.max()) +ax.set_ylim(ys.min(), ys.max()) + +# colors is sequence of rgba tuples +# linestyle is a string or dash tuple. Legal string values are +# solid|dashed|dashdot|dotted. The dash tuple is (offset, onoffseq) +# where onoffseq is an even length tuple of on and off ink in points. 
+# If linestyle is omitted, 'solid' is used +# See :class:`matplotlib.collections.LineCollection` for more information +colors = [mcolors.to_rgba(c) + for c in plt.rcParams['axes.prop_cycle'].by_key()['color']] + +line_segments = LineCollection(segs, linewidths=(0.5, 1, 1.5, 2), + colors=colors, linestyle='solid') +ax.add_collection(line_segments) +ax.set_title('Line collection with masked arrays') +plt.show() +``` + +![线段集合示例](https://matplotlib.org/_images/sphx_glr_line_collection_001.png) + +为了在一组轴中有效地绘制多条线,Matplotlib能够一次性添加所有线。 这是一个简单的例子,展示了它是如何完成的。 + +```python +N = 50 +x = np.arange(N) +# Here are many sets of y to plot vs x +ys = [x + i for i in x] + +# We need to set the plot limits, they will not autoscale +fig, ax = plt.subplots() +ax.set_xlim(np.min(x), np.max(x)) +ax.set_ylim(np.min(ys), np.max(ys)) + +# colors is sequence of rgba tuples +# linestyle is a string or dash tuple. Legal string values are +# solid|dashed|dashdot|dotted. The dash tuple is (offset, onoffseq) +# where onoffseq is an even length tuple of on and off ink in points. +# If linestyle is omitted, 'solid' is used +# See :class:`matplotlib.collections.LineCollection` for more information + +# Make a sequence of x,y pairs +line_segments = LineCollection([np.column_stack([x, y]) for y in ys], + linewidths=(0.5, 1, 1.5, 2), + linestyles='solid') +line_segments.set_array(x) +ax.add_collection(line_segments) +axcb = fig.colorbar(line_segments) +axcb.set_label('Line Number') +ax.set_title('Line Collection with mapped colors') +plt.sci(line_segments) # This allows interactive changing of the colormap. 
+plt.show() +``` + +![线段集合示例2](https://matplotlib.org/_images/sphx_glr_line_collection_002.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.collections +matplotlib.collections.LineCollection +matplotlib.cm.ScalarMappable.set_array +matplotlib.axes.Axes.add_collection +matplotlib.figure.Figure.colorbar +matplotlib.pyplot.colorbar +matplotlib.pyplot.sci +``` + +## 下载这个示例 + +- [下载python源码: line_collection.py](https://matplotlib.org/_downloads/line_collection.py) +- [下载Jupyter notebook: line_collection.ipynb](https://matplotlib.org/_downloads/line_collection.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/shapes_and_collections/marker_path.md b/Python/matplotlab/gallery/shapes_and_collections/marker_path.md new file mode 100644 index 00000000..fcf34740 --- /dev/null +++ b/Python/matplotlab/gallery/shapes_and_collections/marker_path.md @@ -0,0 +1,43 @@ +# 标记路径 + +使用路径([path](https://matplotlib.org/api/path_api.html#matplotlib.path.Path))作为绘图的标记([plot](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.plot.html#matplotlib.axes.Axes.plot))。 + +```python +import matplotlib.pyplot as plt +import matplotlib.path as mpath +import numpy as np + + +star = mpath.Path.unit_regular_star(6) +circle = mpath.Path.unit_circle() +# concatenate the circle with an internal cutout of the star +verts = np.concatenate([circle.vertices, star.vertices[::-1, ...]]) +codes = np.concatenate([circle.codes, star.codes]) +cut_star = mpath.Path(verts, codes) + + +plt.plot(np.arange(10)**2, '--r', marker=cut_star, markersize=15) + +plt.show() +``` + +![标记路径示例](https://matplotlib.org/_images/sphx_glr_marker_path_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.path +matplotlib.path.Path +matplotlib.path.Path.unit_regular_star +matplotlib.path.Path.unit_circle +matplotlib.axes.Axes.plot +matplotlib.pyplot.plot +``` + +## 下载这个示例 + +- [下载python源码: 
marker_path.py](https://matplotlib.org/_downloads/marker_path.py) +- [下载Jupyter notebook: marker_path.ipynb](https://matplotlib.org/_downloads/marker_path.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/shapes_and_collections/patch_collection.md b/Python/matplotlab/gallery/shapes_and_collections/patch_collection.md new file mode 100644 index 00000000..3ab0be4d --- /dev/null +++ b/Python/matplotlab/gallery/shapes_and_collections/patch_collection.md @@ -0,0 +1,78 @@ +# 圆,楔和多边形 + +此示例演示如何使用修补程序集合。 + +```python +import numpy as np +from matplotlib.patches import Circle, Wedge, Polygon +from matplotlib.collections import PatchCollection +import matplotlib.pyplot as plt + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +fig, ax = plt.subplots() + +resolution = 50 # the number of vertices +N = 3 +x = np.random.rand(N) +y = np.random.rand(N) +radii = 0.1*np.random.rand(N) +patches = [] +for x1, y1, r in zip(x, y, radii): + circle = Circle((x1, y1), r) + patches.append(circle) + +x = np.random.rand(N) +y = np.random.rand(N) +radii = 0.1*np.random.rand(N) +theta1 = 360.0*np.random.rand(N) +theta2 = 360.0*np.random.rand(N) +for x1, y1, r, t1, t2 in zip(x, y, radii, theta1, theta2): + wedge = Wedge((x1, y1), r, t1, t2) + patches.append(wedge) + +# Some limiting conditions on Wedge +patches += [ + Wedge((.3, .7), .1, 0, 360), # Full circle + Wedge((.7, .8), .2, 0, 360, width=0.05), # Full ring + Wedge((.8, .3), .2, 0, 45), # Full sector + Wedge((.8, .3), .2, 45, 90, width=0.10), # Ring sector +] + +for i in range(N): + polygon = Polygon(np.random.rand(N, 2), True) + patches.append(polygon) + +colors = 100*np.random.rand(len(patches)) +p = PatchCollection(patches, alpha=0.4) +p.set_array(np.array(colors)) +ax.add_collection(p) +fig.colorbar(p, ax=ax) + +plt.show() +``` + +![圆,楔和多边形示例](https://matplotlib.org/_images/sphx_glr_patch_collection_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib 
+matplotlib.patches +matplotlib.patches.Circle +matplotlib.patches.Wedge +matplotlib.patches.Polygon +matplotlib.collections.PatchCollection +matplotlib.collections.Collection.set_array +matplotlib.axes.Axes.add_collection +matplotlib.figure.Figure.colorbar +``` + +## 下载这个示例 + +- [下载python源码: patch_collection.py](https://matplotlib.org/_downloads/patch_collection.py) +- [下载Jupyter notebook: patch_collection.ipynb](https://matplotlib.org/_downloads/patch_collection.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/shapes_and_collections/path_patch.md b/Python/matplotlab/gallery/shapes_and_collections/path_patch.md new file mode 100644 index 00000000..520e741b --- /dev/null +++ b/Python/matplotlab/gallery/shapes_and_collections/path_patch.md @@ -0,0 +1,57 @@ +# PathPatch对象 + +此示例显示如何通过Matplotlib的API创建 [Path](https://matplotlib.org/api/path_api.html#matplotlib.path.Path) 和 [PathPatch](https://matplotlib.org/api/_as_gen/matplotlib.patches.PathPatch.html#matplotlib.patches.PathPatch) 对象。 + +```python +import matplotlib.path as mpath +import matplotlib.patches as mpatches +import matplotlib.pyplot as plt + + +fig, ax = plt.subplots() + +Path = mpath.Path +path_data = [ + (Path.MOVETO, (1.58, -2.57)), + (Path.CURVE4, (0.35, -1.1)), + (Path.CURVE4, (-1.75, 2.0)), + (Path.CURVE4, (0.375, 2.0)), + (Path.LINETO, (0.85, 1.15)), + (Path.CURVE4, (2.2, 3.2)), + (Path.CURVE4, (3, 0.05)), + (Path.CURVE4, (2.0, -0.5)), + (Path.CLOSEPOLY, (1.58, -2.57)), + ] +codes, verts = zip(*path_data) +path = mpath.Path(verts, codes) +patch = mpatches.PathPatch(path, facecolor='r', alpha=0.5) +ax.add_patch(patch) + +# plot control points and connecting lines +x, y = zip(*path.vertices) +line, = ax.plot(x, y, 'go-') + +ax.grid() +ax.axis('equal') +plt.show() +``` + +![PathPatch对象](https://matplotlib.org/_images/sphx_glr_path_patch_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.path +matplotlib.path.Path +matplotlib.patches 
+matplotlib.patches.PathPatch +matplotlib.axes.Axes.add_patch +``` + +## 下载这个示例 + +- [下载python源码: path_patch.py](https://matplotlib.org/_downloads/path_patch.py) +- [下载Jupyter notebook: path_patch.ipynb](https://matplotlib.org/_downloads/path_patch.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/shapes_and_collections/quad_bezier.md b/Python/matplotlab/gallery/shapes_and_collections/quad_bezier.md new file mode 100644 index 00000000..8c2392d8 --- /dev/null +++ b/Python/matplotlab/gallery/shapes_and_collections/quad_bezier.md @@ -0,0 +1,43 @@ +# Bezier曲线 + +此示例展示 [PathPatch](https://matplotlib.org/api/_as_gen/matplotlib.patches.PathPatch.html#matplotlib.patches.PathPatch) 对象以创建Bezier多曲线路径修补程序。 + +```python +import matplotlib.path as mpath +import matplotlib.patches as mpatches +import matplotlib.pyplot as plt + +Path = mpath.Path + +fig, ax = plt.subplots() +pp1 = mpatches.PathPatch( + Path([(0, 0), (1, 0), (1, 1), (0, 0)], + [Path.MOVETO, Path.CURVE3, Path.CURVE3, Path.CLOSEPOLY]), + fc="none", transform=ax.transData) + +ax.add_patch(pp1) +ax.plot([0.75], [0.25], "ro") +ax.set_title('The red point should be on the path') + +plt.show() +``` + +![Bezier曲线示例](https://matplotlib.org/_images/sphx_glr_quad_bezier_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.path +matplotlib.path.Path +matplotlib.patches +matplotlib.patches.PathPatch +matplotlib.axes.Axes.add_patch +``` + +## 下载这个示例 + +- [下载python源码: quad_bezier.py](https://matplotlib.org/_downloads/quad_bezier.py) +- [下载Jupyter notebook: quad_bezier.ipynb](https://matplotlib.org/_downloads/quad_bezier.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/shapes_and_collections/scatter.md b/Python/matplotlab/gallery/shapes_and_collections/scatter.md new file mode 100644 index 00000000..574581b4 --- /dev/null +++ b/Python/matplotlab/gallery/shapes_and_collections/scatter.md @@ -0,0 +1,39 @@ +# 散点图 + +此示例展示了一个简单的散点图。 + +```python 
+import numpy as np +import matplotlib.pyplot as plt + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +N = 50 +x = np.random.rand(N) +y = np.random.rand(N) +colors = np.random.rand(N) +area = (30 * np.random.rand(N))**2 # 0 to 15 point radii + +plt.scatter(x, y, s=area, c=colors, alpha=0.5) +plt.show() +``` + +![散点图示例](https://matplotlib.org/_images/sphx_glr_scatter_001.png) + +## 参考 + +此示例中显示了以下函数和方法的用法: + +```python +import matplotlib + +matplotlib.axes.Axes.scatter +matplotlib.pyplot.scatter +``` + +## 下载这个示例 + +- [下载python源码: scatter.py](https://matplotlib.org/_downloads/scatter.py) +- [下载Jupyter notebook: scatter.ipynb](https://matplotlib.org/_downloads/scatter.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/showcase/anatomy.md b/Python/matplotlab/gallery/showcase/anatomy.md new file mode 100644 index 00000000..b4797295 --- /dev/null +++ b/Python/matplotlab/gallery/showcase/anatomy.md @@ -0,0 +1,150 @@ +# 解剖图 + +下图显示了组成一个图的几个matplotlib元素的名称。 + +![解剖图](https://matplotlib.org/_images/sphx_glr_anatomy_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.ticker import AutoMinorLocator, MultipleLocator, FuncFormatter + +np.random.seed(19680801) + +X = np.linspace(0.5, 3.5, 100) +Y1 = 3+np.cos(X) +Y2 = 1+np.cos(1+X/0.75)/2 +Y3 = np.random.uniform(Y1, Y2, len(X)) + +fig = plt.figure(figsize=(8, 8)) +ax = fig.add_subplot(1, 1, 1, aspect=1) + + +def minor_tick(x, pos): + if not x % 1.0: + return "" + return "%.2f" % x + +ax.xaxis.set_major_locator(MultipleLocator(1.000)) +ax.xaxis.set_minor_locator(AutoMinorLocator(4)) +ax.yaxis.set_major_locator(MultipleLocator(1.000)) +ax.yaxis.set_minor_locator(AutoMinorLocator(4)) +ax.xaxis.set_minor_formatter(FuncFormatter(minor_tick)) + +ax.set_xlim(0, 4) +ax.set_ylim(0, 4) + +ax.tick_params(which='major', width=1.0) +ax.tick_params(which='major', length=10) +ax.tick_params(which='minor', width=1.0, labelsize=10) 
+ax.tick_params(which='minor', length=5, labelsize=10, labelcolor='0.25') + +ax.grid(linestyle="--", linewidth=0.5, color='.25', zorder=-10) + +ax.plot(X, Y1, c=(0.25, 0.25, 1.00), lw=2, label="Blue signal", zorder=10) +ax.plot(X, Y2, c=(1.00, 0.25, 0.25), lw=2, label="Red signal") +ax.plot(X, Y3, linewidth=0, + marker='o', markerfacecolor='w', markeredgecolor='k') + +ax.set_title("Anatomy of a figure", fontsize=20, verticalalignment='bottom') +ax.set_xlabel("X axis label") +ax.set_ylabel("Y axis label") + +ax.legend() + + +def circle(x, y, radius=0.15): + from matplotlib.patches import Circle + from matplotlib.patheffects import withStroke + circle = Circle((x, y), radius, clip_on=False, zorder=10, linewidth=1, + edgecolor='black', facecolor=(0, 0, 0, .0125), + path_effects=[withStroke(linewidth=5, foreground='w')]) + ax.add_artist(circle) + + +def text(x, y, text): + ax.text(x, y, text, backgroundcolor="white", + ha='center', va='top', weight='bold', color='blue') + + +# Minor tick +circle(0.50, -0.10) +text(0.50, -0.32, "Minor tick label") + +# Major tick +circle(-0.03, 4.00) +text(0.03, 3.80, "Major tick") + +# Minor tick +circle(0.00, 3.50) +text(0.00, 3.30, "Minor tick") + +# Major tick label +circle(-0.15, 3.00) +text(-0.15, 2.80, "Major tick label") + +# X Label +circle(1.80, -0.27) +text(1.80, -0.45, "X axis label") + +# Y Label +circle(-0.27, 1.80) +text(-0.27, 1.6, "Y axis label") + +# Title +circle(1.60, 4.13) +text(1.60, 3.93, "Title") + +# Blue plot +circle(1.75, 2.80) +text(1.75, 2.60, "Line\n(line plot)") + +# Red plot +circle(1.20, 0.60) +text(1.20, 0.40, "Line\n(line plot)") + +# Scatter plot +circle(3.20, 1.75) +text(3.20, 1.55, "Markers\n(scatter plot)") + +# Grid +circle(3.00, 3.00) +text(3.00, 2.80, "Grid") + +# Legend +circle(3.70, 3.80) +text(3.70, 3.60, "Legend") + +# Axes +circle(0.5, 0.5) +text(0.5, 0.3, "Axes") + +# Figure +circle(-0.3, 0.65) +text(-0.3, 0.45, "Figure") + +color = 'blue' +ax.annotate('Spines', xy=(4.0, 0.35), 
xycoords='data', + xytext=(3.3, 0.5), textcoords='data', + weight='bold', color=color, + arrowprops=dict(arrowstyle='->', + connectionstyle="arc3", + color=color)) + +ax.annotate('', xy=(3.15, 0.0), xycoords='data', + xytext=(3.45, 0.45), textcoords='data', + weight='bold', color=color, + arrowprops=dict(arrowstyle='->', + connectionstyle="arc3", + color=color)) + +ax.text(4.0, -0.4, "Made with http://matplotlib.org", + fontsize=10, ha="right", color='.5') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: anatomy.py](https://matplotlib.org/_downloads/anatomy.py) +- [下载Jupyter notebook: anatomy.ipynb](https://matplotlib.org/_downloads/anatomy.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/showcase/bachelors_degrees_by_gender.md b/Python/matplotlab/gallery/showcase/bachelors_degrees_by_gender.md new file mode 100644 index 00000000..5763100d --- /dev/null +++ b/Python/matplotlab/gallery/showcase/bachelors_degrees_by_gender.md @@ -0,0 +1,118 @@ +# 按性别分列的学士学位 + +包含多个时间序列的图形,其中演示了打印框架、刻度线和标签以及线图特性的广泛自定义样式。 + +还演示了文本标签沿右边缘的自定义放置,作为传统图例的替代方法。 + +![按性别分列的学士学位示例](https://matplotlib.org/_images/sphx_glr_bachelors_degrees_by_gender_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.cbook import get_sample_data + + +fname = get_sample_data('percent_bachelors_degrees_women_usa.csv', + asfileobj=False) +gender_degree_data = np.genfromtxt(fname, delimiter=',', names=True) + +# These are the colors that will be used in the plot +color_sequence = ['#1f77b4', '#aec7e8', '#ff7f0e', '#ffbb78', '#2ca02c', + '#98df8a', '#d62728', '#ff9896', '#9467bd', '#c5b0d5', + '#8c564b', '#c49c94', '#e377c2', '#f7b6d2', '#7f7f7f', + '#c7c7c7', '#bcbd22', '#dbdb8d', '#17becf', '#9edae5'] + +# You typically want your plot to be ~1.33x wider than tall. This plot +# is a rare exception because of the number of lines being plotted on it. 
+# Common sizes: (10, 7.5) and (12, 9) +fig, ax = plt.subplots(1, 1, figsize=(12, 14)) + +# Remove the plot frame lines. They are unnecessary here. +ax.spines['top'].set_visible(False) +ax.spines['bottom'].set_visible(False) +ax.spines['right'].set_visible(False) +ax.spines['left'].set_visible(False) + +# Ensure that the axis ticks only show up on the bottom and left of the plot. +# Ticks on the right and top of the plot are generally unnecessary. +ax.get_xaxis().tick_bottom() +ax.get_yaxis().tick_left() + +fig.subplots_adjust(left=.06, right=.75, bottom=.02, top=.94) +# Limit the range of the plot to only where the data is. +# Avoid unnecessary whitespace. +ax.set_xlim(1969.5, 2011.1) +ax.set_ylim(-0.25, 90) + +# Make sure your axis ticks are large enough to be easily read. +# You don't want your viewers squinting to read your plot. +plt.xticks(range(1970, 2011, 10), fontsize=14) +plt.yticks(range(0, 91, 10), fontsize=14) +ax.xaxis.set_major_formatter(plt.FuncFormatter('{:.0f}'.format)) +ax.yaxis.set_major_formatter(plt.FuncFormatter('{:.0f}%'.format)) + +# Provide tick lines across the plot to help your viewers trace along +# the axis ticks. Make sure that the lines are light and small so they +# don't obscure the primary data lines. +plt.grid(True, 'major', 'y', ls='--', lw=.5, c='k', alpha=.3) + +# Remove the tick marks; they are unnecessary with the tick lines we just +# plotted. +plt.tick_params(axis='both', which='both', bottom=False, top=False, + labelbottom=True, left=False, right=False, labelleft=True) + +# Now that the plot is prepared, it's time to actually plot the data! +# Note that I plotted the majors in order of the highest % in the final year. 
+majors = ['Health Professions', 'Public Administration', 'Education', + 'Psychology', 'Foreign Languages', 'English', + 'Communications\nand Journalism', 'Art and Performance', 'Biology', + 'Agriculture', 'Social Sciences and History', 'Business', + 'Math and Statistics', 'Architecture', 'Physical Sciences', + 'Computer Science', 'Engineering'] + +y_offsets = {'Foreign Languages': 0.5, 'English': -0.5, + 'Communications\nand Journalism': 0.75, + 'Art and Performance': -0.25, 'Agriculture': 1.25, + 'Social Sciences and History': 0.25, 'Business': -0.75, + 'Math and Statistics': 0.75, 'Architecture': -0.75, + 'Computer Science': 0.75, 'Engineering': -0.25} + +for rank, column in enumerate(majors): + # Plot each line separately with its own color. + column_rec_name = column.replace('\n', '_').replace(' ', '_') + + line = plt.plot(gender_degree_data['Year'], + gender_degree_data[column_rec_name], + lw=2.5, + color=color_sequence[rank]) + + # Add a text label to the right end of every line. Most of the code below + # is adding specific offsets y position because some labels overlapped. + y_pos = gender_degree_data[column_rec_name][-1] - 0.5 + + if column in y_offsets: + y_pos += y_offsets[column] + + # Again, make sure that all labels are large enough to be easily read + # by the viewer. + plt.text(2011.5, y_pos, column, fontsize=14, color=color_sequence[rank]) + +# Make the title big enough so it spans the entire plot, but don't make it +# so big that it requires two lines to show. + +# Note that if the title is descriptive enough, it is unnecessary to include +# axis labels; they are self-evident, in this plot's case. +fig.suptitle('Percentage of Bachelor\'s degrees conferred to women in ' + 'the U.S.A. by major (1970-2011)\n', fontsize=18, ha='center') + +# Finally, save the figure as a PNG. +# You can also save it as a PDF, JPEG, etc. +# Just change the file extension in this call. 
+# plt.savefig('percent-bachelors-degrees-women-usa.png', bbox_inches='tight') +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: bachelors_degrees_by_gender.py](https://matplotlib.org/_downloads/bachelors_degrees_by_gender.py) +- [下载Jupyter notebook: bachelors_degrees_by_gender.ipynb](https://matplotlib.org/_downloads/bachelors_degrees_by_gender.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/showcase/firefox.md b/Python/matplotlab/gallery/showcase/firefox.md new file mode 100644 index 00000000..52913e69 --- /dev/null +++ b/Python/matplotlab/gallery/showcase/firefox.md @@ -0,0 +1,76 @@ +# 绘制火狐浏览器logo + +此示例显示如何使用路径和修补程序创建Firefox徽标。 + +![绘制火狐浏览器logo示例](https://matplotlib.org/_images/sphx_glr_firefox_001.png) + +```python +import re +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.path import Path +import matplotlib.patches as patches + +# From: http://raphaeljs.com/icons/#firefox +firefox = "M28.4,22.469c0.479-0.964,0.851-1.991,1.095-3.066c0.953-3.661,0.666-6.854,0.666-6.854l-0.327,2.104c0,0-0.469-3.896-1.044-5.353c-0.881-2.231-1.273-2.214-1.274-2.21c0.542,1.379,0.494,2.169,0.483,2.288c-0.01-0.016-0.019-0.032-0.027-0.047c-0.131-0.324-0.797-1.819-2.225-2.878c-2.502-2.481-5.943-4.014-9.745-4.015c-4.056,0-7.705,1.745-10.238,4.525C5.444,6.5,5.183,5.938,5.159,5.317c0,0-0.002,0.002-0.006,0.005c0-0.011-0.003-0.021-0.003-0.031c0,0-1.61,1.247-1.436,4.612c-0.299,0.574-0.56,1.172-0.777,1.791c-0.375,0.817-0.75,2.004-1.059,3.746c0,0,0.133-0.422,0.399-0.988c-0.064,0.482-0.103,0.971-0.116,1.467c-0.09,0.845-0.118,1.865-0.039,3.088c0,0,0.032-0.406,0.136-1.021c0.834,6.854,6.667,12.165,13.743,12.165l0,0c1.86,0,3.636-0.37,5.256-1.036C24.938,27.771,27.116,25.196,28.4,22.469zM16.002,3.356c2.446,0,4.73,0.68,6.68,1.86c-2.274-0.528-3.433-0.261-3.423-0.248c0.013,0.015,3.384,0.589,3.981,1.411c0,0-1.431,0-2.856,0.41c-0.065,0.019,5.242,0.663,6.327,5.966c0,0-0.582-1.213-1.301-1.42c0.473,1.439,0.351,4.17-0.1,5.528c-0.058,0.174-0.118-0.755-1.004-1.155c0.
284,2.037-0.018,5.268-1.432,6.158c-0.109,0.07,0.887-3.189,0.201-1.93c-4.093,6.276-8.959,2.539-10.934,1.208c1.585,0.388,3.267,0.108,4.242-0.559c0.982-0.672,1.564-1.162,2.087-1.047c0.522,0.117,0.87-0.407,0.464-0.872c-0.405-0.466-1.392-1.105-2.725-0.757c-0.94,0.247-2.107,1.287-3.886,0.233c-1.518-0.899-1.507-1.63-1.507-2.095c0-0.366,0.257-0.88,0.734-1.028c0.58,0.062,1.044,0.214,1.537,0.466c0.005-0.135,0.006-0.315-0.001-0.519c0.039-0.077,0.015-0.311-0.047-0.596c-0.036-0.287-0.097-0.582-0.19-0.851c0.01-0.002,0.017-0.007,0.021-0.021c0.076-0.344,2.147-1.544,2.299-1.659c0.153-0.114,0.55-0.378,0.506-1.183c-0.015-0.265-0.058-0.294-2.232-0.286c-0.917,0.003-1.425-0.894-1.589-1.245c0.222-1.231,0.863-2.11,1.919-2.704c0.02-0.011,0.015-0.021-0.008-0.027c0.219-0.127-2.524-0.006-3.76,1.604C9.674,8.045,9.219,7.95,8.71,7.95c-0.638,0-1.139,0.07-1.603,0.187c-0.05,0.013-0.122,0.011-0.208-0.001C6.769,8.04,6.575,7.88,6.365,7.672c0.161-0.18,0.324-0.356,0.495-0.526C9.201,4.804,12.43,3.357,16.002,3.356z" + + +def svg_parse(path): + commands = {'M': (Path.MOVETO,), + 'L': (Path.LINETO,), + 'Q': (Path.CURVE3,)*2, + 'C': (Path.CURVE4,)*3, + 'Z': (Path.CLOSEPOLY,)} + path_re = re.compile(r'([MLHVCSQTAZ])([^MLHVCSQTAZ]+)', re.IGNORECASE) + float_re = re.compile(r'(?:[\s,]*)([+-]?\d+(?:\.\d+)?)') + vertices = [] + codes = [] + last = (0, 0) + for cmd, values in path_re.findall(path): + points = [float(v) for v in float_re.findall(values)] + points = np.array(points).reshape((len(points)//2, 2)) + if cmd.islower(): + points += last + cmd = cmd.capitalize() + last = points[-1] + codes.extend(commands[cmd]) + vertices.extend(points.tolist()) + return codes, vertices + +# SVG to matplotlib +codes, verts = svg_parse(firefox) +verts = np.array(verts) +path = Path(verts, codes) + +# Make upside down +verts[:, 1] *= -1 +xmin, xmax = verts[:, 0].min()-1, verts[:, 0].max()+1 +ymin, ymax = verts[:, 1].min()-1, verts[:, 1].max()+1 + +fig = plt.figure(figsize=(5, 5)) +ax = fig.add_axes([0.0, 0.0, 1.0, 1.0], 
frameon=False, aspect=1) + +# White outline (width = 6) +patch = patches.PathPatch(path, facecolor='None', edgecolor='w', lw=6) +ax.add_patch(patch) + +# Actual shape with black outline +patch = patches.PathPatch(path, facecolor='orange', edgecolor='k', lw=2) +ax.add_patch(patch) + +# Centering +ax.set_xlim(xmin, xmax) +ax.set_ylim(ymin, ymax) + +# No ticks +ax.set_xticks([]) +ax.set_yticks([]) + +# Display +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: firefox.py](https://matplotlib.org/_downloads/firefox.py) +- [下载Jupyter notebook: firefox.ipynb](https://matplotlib.org/_downloads/firefox.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/showcase/integral.md b/Python/matplotlab/gallery/showcase/integral.md new file mode 100644 index 00000000..4e3cd837 --- /dev/null +++ b/Python/matplotlab/gallery/showcase/integral.md @@ -0,0 +1,59 @@ +# 积分作为曲线下面积 + +虽然这是一个简单的例子,但它展示了一些重要的调整: + +- 带有自定义颜色和线宽的简单线条图。 +- 使用Polygon补丁创建的阴影区域。 +- 带有mathtext渲染的文本标签。 +- figtext调用标记x轴和y轴。 +- 使用轴刺来隐藏顶部和右侧脊柱。 +- 自定义刻度线和标签。 + +![积分作为曲线下面积](https://matplotlib.org/_images/sphx_glr_integral_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.patches import Polygon + + +def func(x): + return (x - 3) * (x - 5) * (x - 7) + 85 + + +a, b = 2, 9 # integral limits +x = np.linspace(0, 10) +y = func(x) + +fig, ax = plt.subplots() +plt.plot(x, y, 'r', linewidth=2) +plt.ylim(ymin=0) + +# Make the shaded region +ix = np.linspace(a, b) +iy = func(ix) +verts = [(a, 0), *zip(ix, iy), (b, 0)] +poly = Polygon(verts, facecolor='0.9', edgecolor='0.5') +ax.add_patch(poly) + +plt.text(0.5 * (a + b), 30, r"$\int_a^b f(x)\mathrm{d}x$", + horizontalalignment='center', fontsize=20) + +plt.figtext(0.9, 0.05, '$x$') +plt.figtext(0.1, 0.9, '$y$') + +ax.spines['right'].set_visible(False) +ax.spines['top'].set_visible(False) +ax.xaxis.set_ticks_position('bottom') + +ax.set_xticks((a, b)) +ax.set_xticklabels(('$a$', '$b$')) +ax.set_yticks([]) + +plt.show() +``` + 
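上面的示例只是把曲线下的区域画了出来;作为补充,可以用纯 Python 核对阴影面积的数值(以下是一个不依赖 matplotlib 的简化草稿)。f(x) = (x−3)(x−5)(x−7)+85 展开后为 x³−15x²+71x−20,其原函数可以直接写出,再与复合梯形法则的结果对照:

```python
# Cross-check the shaded area without matplotlib:
# f(x) = (x-3)(x-5)(x-7) + 85 expands to x^3 - 15x^2 + 71x - 20,
# so an antiderivative is F(x) = x^4/4 - 5x^3 + (71/2)x^2 - 20x.

def func(x):
    return (x - 3) * (x - 5) * (x - 7) + 85

def F(x):
    return x**4 / 4 - 5 * x**3 + 71 * x**2 / 2 - 20 * x

def trapezoid(f, a, b, n=10000):
    # Composite trapezoidal rule with n equal subintervals.
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

a, b = 2, 9                     # same integral limits as the plot above
exact = F(b) - F(a)             # 624.75
approx = trapezoid(func, a, b)
print(exact, round(approx, 4))  # both ≈ 624.75
```

精确值 624.75 与 n = 10000 时的梯形近似相差约 1e-6 量级,可以作为图中阴影面积的数值核对。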
+## 下载这个示例 + +- [下载python源码: integral.py](https://matplotlib.org/_downloads/integral.py) +- [下载Jupyter notebook: integral.ipynb](https://matplotlib.org/_downloads/integral.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/showcase/mandelbrot.md b/Python/matplotlab/gallery/showcase/mandelbrot.md new file mode 100644 index 00000000..9243c2e2 --- /dev/null +++ b/Python/matplotlab/gallery/showcase/mandelbrot.md @@ -0,0 +1,80 @@ +# 阴影和增强标准化渲染 + +通过使用与幂归一化色图(gamma = 0.3)相关联的归一化重新计数,可以改善Mandelbrot集渲染。 由于阴影,渲染可以进一步增强。 + +maxiter给出了计算的精度。 在大多数现代笔记本电脑上,maxiter = 200应该需要几秒钟。 + +![阴影和增强标准化渲染示例](https://matplotlib.org/_images/sphx_glr_mandelbrot_001.png) + +```python +import numpy as np + + +def mandelbrot_set(xmin, xmax, ymin, ymax, xn, yn, maxiter, horizon=2.0): + X = np.linspace(xmin, xmax, xn).astype(np.float32) + Y = np.linspace(ymin, ymax, yn).astype(np.float32) + C = X + Y[:, None] * 1j + N = np.zeros_like(C, dtype=int) + Z = np.zeros_like(C) + for n in range(maxiter): + I = np.less(abs(Z), horizon) + N[I] = n + Z[I] = Z[I]**2 + C[I] + N[N == maxiter-1] = 0 + return Z, N + + +if __name__ == '__main__': + import time + import matplotlib + from matplotlib import colors + import matplotlib.pyplot as plt + + xmin, xmax, xn = -2.25, +0.75, 3000/2 + ymin, ymax, yn = -1.25, +1.25, 2500/2 + maxiter = 200 + horizon = 2.0 ** 40 + log_horizon = np.log(np.log(horizon))/np.log(2) + Z, N = mandelbrot_set(xmin, xmax, ymin, ymax, xn, yn, maxiter, horizon) + + # Normalized recount as explained in: + # https://linas.org/art-gallery/escape/smooth.html + # https://www.ibm.com/developerworks/community/blogs/jfp/entry/My_Christmas_Gift + + # This line will generate warnings for null values but it is faster to + # process them afterwards using the nan_to_num + with np.errstate(invalid='ignore'): + M = np.nan_to_num(N + 1 - + np.log(np.log(abs(Z)))/np.log(2) + + log_horizon) + + dpi = 72 + width = 10 + height = 10*yn/xn + fig = plt.figure(figsize=(width, height), 
dpi=dpi) + ax = fig.add_axes([0.0, 0.0, 1.0, 1.0], frameon=False, aspect=1) + + # Shaded rendering + light = colors.LightSource(azdeg=315, altdeg=10) + M = light.shade(M, cmap=plt.cm.hot, vert_exag=1.5, + norm=colors.PowerNorm(0.3), blend_mode='hsv') + plt.imshow(M, extent=[xmin, xmax, ymin, ymax], interpolation="bicubic") + ax.set_xticks([]) + ax.set_yticks([]) + + # Some advertisement for matplotlib + year = time.strftime("%Y") + text = ("The Mandelbrot fractal set\n" + "Rendered with matplotlib %s, %s - http://matplotlib.org" + % (matplotlib.__version__, year)) + ax.text(xmin+.025, ymin+.025, text, color="white", fontsize=12, alpha=0.5) + + plt.show() +``` + +Total running time of the script: ( 0 minutes 4.800 seconds) + +## 下载这个示例 + +- [下载python源码: mandelbrot.py](https://matplotlib.org/_downloads/mandelbrot.py) +- [下载Jupyter notebook: mandelbrot.ipynb](https://matplotlib.org/_downloads/mandelbrot.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/showcase/xkcd.md b/Python/matplotlab/gallery/showcase/xkcd.md new file mode 100644 index 00000000..0fa98410 --- /dev/null +++ b/Python/matplotlab/gallery/showcase/xkcd.md @@ -0,0 +1,74 @@ +# XKCD + +演示如何创建类似xkcd的绘图。 + +```python +import matplotlib.pyplot as plt +import numpy as np +``` + +```python +with plt.xkcd(): + # Based on "Stove Ownership" from XKCD by Randall Monroe + # http://xkcd.com/418/ + + fig = plt.figure() + ax = fig.add_axes((0.1, 0.2, 0.8, 0.7)) + ax.spines['right'].set_color('none') + ax.spines['top'].set_color('none') + plt.xticks([]) + plt.yticks([]) + ax.set_ylim([-30, 10]) + + data = np.ones(100) + data[70:] -= np.arange(30) + + plt.annotate( + 'THE DAY I REALIZED\nI COULD COOK BACON\nWHENEVER I WANTED', + xy=(70, 1), arrowprops=dict(arrowstyle='->'), xytext=(15, -10)) + + plt.plot(data) + + plt.xlabel('time') + plt.ylabel('my overall health') + fig.text( + 0.5, 0.05, + '"Stove Ownership" from xkcd by Randall Monroe', + ha='center') +``` + 
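值得注意的是,`plt.xkcd()` 会立即修改 rcParams,同时返回一个上下文管理器:退出 `with` 块后,原有设置会自动恢复。下面的小示例通过 `path.sketch` 这一 rcParam 验证该行为(假设进入前 rcParams 为默认值,示例中不绘制任何图形):

```python
import matplotlib
import matplotlib.pyplot as plt

before = matplotlib.rcParams['path.sketch']   # None by default
with plt.xkcd():
    # sketch-style path jitter is active inside the block
    inside = matplotlib.rcParams['path.sketch']
after = matplotlib.rcParams['path.sketch']

print(before, inside, after)
assert inside is not None   # xkcd style applied inside the block
assert after == before      # settings restored on exit
```

因此只需要把绘图语句放进 `with plt.xkcd():` 块内,就不会影响之后其它图的样式。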
+![XKCD示例1](https://matplotlib.org/_images/sphx_glr_xkcd_001.png) + +```python +with plt.xkcd(): + # Based on "The Data So Far" from XKCD by Randall Monroe + # http://xkcd.com/373/ + + fig = plt.figure() + ax = fig.add_axes((0.1, 0.2, 0.8, 0.7)) + ax.bar([0, 1], [0, 100], 0.25) + ax.spines['right'].set_color('none') + ax.spines['top'].set_color('none') + ax.xaxis.set_ticks_position('bottom') + ax.set_xticks([0, 1]) + ax.set_xlim([-0.5, 1.5]) + ax.set_ylim([0, 110]) + ax.set_xticklabels(['CONFIRMED BY\nEXPERIMENT', 'REFUTED BY\nEXPERIMENT']) + plt.yticks([]) + + plt.title("CLAIMS OF SUPERNATURAL POWERS") + + fig.text( + 0.5, 0.05, + '"The Data So Far" from xkcd by Randall Monroe', + ha='center') + +plt.show() +``` + +![XKCD示例2](https://matplotlib.org/_images/sphx_glr_xkcd_002.png) + +## 下载这个示例 + +- [下载python源码: xkcd.py](https://matplotlib.org/_downloads/xkcd.py) +- [下载Jupyter notebook: xkcd.ipynb](https://matplotlib.org/_downloads/xkcd.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/specialty_plots/advanced_hillshading.md b/Python/matplotlab/gallery/specialty_plots/advanced_hillshading.md new file mode 100644 index 00000000..ad71a341 --- /dev/null +++ b/Python/matplotlab/gallery/specialty_plots/advanced_hillshading.md @@ -0,0 +1,85 @@ +# 晕渲 + +用阴影图展示一些常见的技巧。 + +![晕渲示例](https://matplotlib.org/_images/sphx_glr_advanced_hillshading_001.png) + +![晕渲示例2](https://matplotlib.org/_images/sphx_glr_advanced_hillshading_002.png) + +![晕渲示例3](https://matplotlib.org/_images/sphx_glr_advanced_hillshading_003.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.colors import LightSource, Normalize + + +def display_colorbar(): + """Display a correct numeric colorbar for a shaded plot.""" + y, x = np.mgrid[-4:2:200j, -4:2:200j] + z = 10 * np.cos(x**2 + y**2) + + cmap = plt.cm.copper + ls = LightSource(315, 45) + rgb = ls.shade(z, cmap) + + fig, ax = plt.subplots() + ax.imshow(rgb, interpolation='bilinear') + + # Use a proxy 
artist for the colorbar... + im = ax.imshow(z, cmap=cmap) + im.remove() + fig.colorbar(im) + + ax.set_title('Using a colorbar with a shaded plot', size='x-large') + + +def avoid_outliers(): + """Use a custom norm to control the displayed z-range of a shaded plot.""" + y, x = np.mgrid[-4:2:200j, -4:2:200j] + z = 10 * np.cos(x**2 + y**2) + + # Add some outliers... + z[100, 105] = 2000 + z[120, 110] = -9000 + + ls = LightSource(315, 45) + fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(8, 4.5)) + + rgb = ls.shade(z, plt.cm.copper) + ax1.imshow(rgb, interpolation='bilinear') + ax1.set_title('Full range of data') + + rgb = ls.shade(z, plt.cm.copper, vmin=-10, vmax=10) + ax2.imshow(rgb, interpolation='bilinear') + ax2.set_title('Manually set range') + + fig.suptitle('Avoiding Outliers in Shaded Plots', size='x-large') + + +def shade_other_data(): + """Demonstrates displaying different variables through shade and color.""" + y, x = np.mgrid[-4:2:200j, -4:2:200j] + z1 = np.sin(x**2) # Data to hillshade + z2 = np.cos(x**2 + y**2) # Data to color + + norm = Normalize(z2.min(), z2.max()) + cmap = plt.cm.RdBu + + ls = LightSource(315, 45) + rgb = ls.shade_rgb(cmap(norm(z2)), z1) + + fig, ax = plt.subplots() + ax.imshow(rgb, interpolation='bilinear') + ax.set_title('Shade by one variable, color by another', size='x-large') + +display_colorbar() +avoid_outliers() +shade_other_data() +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: advanced_hillshading.py](https://matplotlib.org/_downloads/advanced_hillshading.py) +- [下载Jupyter notebook: advanced_hillshading.ipynb](https://matplotlib.org/_downloads/advanced_hillshading.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/specialty_plots/anscombe.md b/Python/matplotlab/gallery/specialty_plots/anscombe.md new file mode 100644 index 00000000..015ac176 --- /dev/null +++ b/Python/matplotlab/gallery/specialty_plots/anscombe.md @@ -0,0 +1,78 @@ +# Anscombe的四重奏 + 
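Anscombe 四重奏的四组数据具有几乎相同的均值、标准差和回归线(约 y = 3 + 0.5x)。作为示意,下面用纯 Python 的最小二乘公式验证第一组数据的回归系数(不依赖本页其余代码):

```python
# Least-squares fit for Anscombe dataset I (pure Python, no dependencies).
x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

n = len(x)
mx = sum(x) / n
my = sum(y1) / n
# slope = S_xy / S_xx, intercept from the point (mx, my)
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y1))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx

print(round(slope, 3), round(intercept, 2))   # ≈ 0.5, ≈ 3.0
```

四组数据的拟合结果都接近 slope ≈ 0.500、intercept ≈ 3.00,这正是下面代码中 `fit(x) = 3 + 0.5 * x` 这条红线的来源。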
+![Anscombe的四重奏示例](https://matplotlib.org/_images/sphx_glr_anscombe_001.png) + +输出: + +```python +mean=7.50, std=1.94, r=0.82 +mean=7.50, std=1.94, r=0.82 +mean=7.50, std=1.94, r=0.82 +mean=7.50, std=1.94, r=0.82 +``` + +```python +""" +Edward Tufte uses this example from Anscombe to show 4 datasets of x +and y that have the same mean, standard deviation, and regression +line, but which are qualitatively different. + +matplotlib fun for a rainy day +""" + +import matplotlib.pyplot as plt +import numpy as np + +x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]) +y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]) +y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]) +y3 = np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]) +x4 = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]) +y4 = np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]) + + +def fit(x): + return 3 + 0.5 * x + + +xfit = np.array([np.min(x), np.max(x)]) + +plt.subplot(221) +plt.plot(x, y1, 'ks', xfit, fit(xfit), 'r-', lw=2) +plt.axis([2, 20, 2, 14]) +plt.setp(plt.gca(), xticklabels=[], yticks=(4, 8, 12), xticks=(0, 10, 20)) +plt.text(3, 12, 'I', fontsize=20) + +plt.subplot(222) +plt.plot(x, y2, 'ks', xfit, fit(xfit), 'r-', lw=2) +plt.axis([2, 20, 2, 14]) +plt.setp(plt.gca(), xticks=(0, 10, 20), xticklabels=[], + yticks=(4, 8, 12), yticklabels=[], ) +plt.text(3, 12, 'II', fontsize=20) + +plt.subplot(223) +plt.plot(x, y3, 'ks', xfit, fit(xfit), 'r-', lw=2) +plt.axis([2, 20, 2, 14]) +plt.text(3, 12, 'III', fontsize=20) +plt.setp(plt.gca(), yticks=(4, 8, 12), xticks=(0, 10, 20)) + +plt.subplot(224) +xfit = np.array([np.min(x4), np.max(x4)]) +plt.plot(x4, y4, 'ks', xfit, fit(xfit), 'r-', lw=2) +plt.axis([2, 20, 2, 14]) +plt.setp(plt.gca(), yticklabels=[], yticks=(4, 8, 12), xticks=(0, 10, 20)) +plt.text(3, 12, 'IV', fontsize=20) + +# verify the stats +pairs = (x, y1), (x, y2), (x, y3), (x4, y4) +for 
x, y in pairs: + print('mean=%1.2f, std=%1.2f, r=%1.2f' % (np.mean(y), np.std(y), + np.corrcoef(x, y)[0][1])) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: anscombe.py](https://matplotlib.org/_downloads/anscombe.py) +- [下载Jupyter notebook: anscombe.ipynb](https://matplotlib.org/_downloads/anscombe.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/specialty_plots/hinton_demo.md b/Python/matplotlab/gallery/specialty_plots/hinton_demo.md new file mode 100644 index 00000000..c8076b96 --- /dev/null +++ b/Python/matplotlab/gallery/specialty_plots/hinton_demo.md @@ -0,0 +1,48 @@ +# Hinton图 + +Hinton图对于可视化2D阵列的值(例如,权重矩阵)是有用的:正值和负值分别由白色和黑色方块表示,并且每个方块的大小表示每个值的大小。 + +David Warde-Farley在SciPy Cookbook上的初步想法 + +![Hinton图示例](https://matplotlib.org/_images/sphx_glr_hinton_demo_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + + +def hinton(matrix, max_weight=None, ax=None): + """Draw Hinton diagram for visualizing a weight matrix.""" + ax = ax if ax is not None else plt.gca() + + if not max_weight: + max_weight = 2 ** np.ceil(np.log(np.abs(matrix).max()) / np.log(2)) + + ax.patch.set_facecolor('gray') + ax.set_aspect('equal', 'box') + ax.xaxis.set_major_locator(plt.NullLocator()) + ax.yaxis.set_major_locator(plt.NullLocator()) + + for (x, y), w in np.ndenumerate(matrix): + color = 'white' if w > 0 else 'black' + size = np.sqrt(np.abs(w) / max_weight) + rect = plt.Rectangle([x - size / 2, y - size / 2], size, size, + facecolor=color, edgecolor=color) + ax.add_patch(rect) + + ax.autoscale_view() + ax.invert_yaxis() + + +if __name__ == '__main__': + # Fixing random state for reproducibility + np.random.seed(19680801) + + hinton(np.random.rand(20, 20) - 0.5) + plt.show() +``` + +## 下载这个示例 + +- [下载python源码: hinton_demo.py](https://matplotlib.org/_downloads/hinton_demo.py) +- [下载Jupyter notebook: hinton_demo.ipynb](https://matplotlib.org/_downloads/hinton_demo.ipynb) \ No newline at end of file diff --git 
a/Python/matplotlab/gallery/specialty_plots/leftventricle_bulleye.md b/Python/matplotlab/gallery/specialty_plots/leftventricle_bulleye.md
new file mode 100644
index 00000000..2f61085b
--- /dev/null
+++ b/Python/matplotlab/gallery/specialty_plots/leftventricle_bulleye.md
@@ -0,0 +1,214 @@
+# 左心室靶心
+
+该示例演示了如何为美国心脏协会(AHA)推荐的左心室创建17段模型。
+
+![左心室靶心示例](https://matplotlib.org/_images/sphx_glr_leftventricle_bulleye_001.png)
+
+```python
+import numpy as np
+import matplotlib as mpl
+import matplotlib.pyplot as plt
+
+
+def bullseye_plot(ax, data, segBold=None, cmap=None, norm=None):
+    """
+    Bullseye representation for the left ventricle.
+
+    Parameters
+    ----------
+    ax : axes
+    data : list of int and float
+        The intensity values for each of the 17 segments
+    segBold : list of int, optional
+        A list with the segments to highlight
+    cmap : ColorMap or None, optional
+        Optional argument to set the desired colormap
+    norm : Normalize or None, optional
+        Optional argument to normalize data into the [0.0, 1.0] range
+
+
+    Notes
+    -----
+    This function creates the 17 segment model for the left ventricle
+    according to the American Heart Association (AHA) [1]_
+
+    References
+    ----------
+    .. [1] M. D. Cerqueira, N. J. Weissman, V. Dilsizian, A. K. Jacobs,
+        S. Kaul, W. K. Laskey, D. J. Pennell, J. A. Rumberger, T. Ryan,
+        and M. S. Verani, "Standardized myocardial segmentation and
+        nomenclature for tomographic imaging of the heart",
+        Circulation, vol. 105, no. 4, pp. 539-542, 2002.
+ """ + if segBold is None: + segBold = [] + + linewidth = 2 + data = np.array(data).ravel() + + if cmap is None: + cmap = plt.cm.viridis + + if norm is None: + norm = mpl.colors.Normalize(vmin=data.min(), vmax=data.max()) + + theta = np.linspace(0, 2 * np.pi, 768) + r = np.linspace(0.2, 1, 4) + + # Create the bound for the segment 17 + for i in range(r.shape[0]): + ax.plot(theta, np.repeat(r[i], theta.shape), '-k', lw=linewidth) + + # Create the bounds for the segments 1-12 + for i in range(6): + theta_i = np.deg2rad(i * 60) + ax.plot([theta_i, theta_i], [r[1], 1], '-k', lw=linewidth) + + # Create the bounds for the segments 13-16 + for i in range(4): + theta_i = np.deg2rad(i * 90 - 45) + ax.plot([theta_i, theta_i], [r[0], r[1]], '-k', lw=linewidth) + + # Fill the segments 1-6 + r0 = r[2:4] + r0 = np.repeat(r0[:, np.newaxis], 128, axis=1).T + for i in range(6): + # First segment start at 60 degrees + theta0 = theta[i * 128:i * 128 + 128] + np.deg2rad(60) + theta0 = np.repeat(theta0[:, np.newaxis], 2, axis=1) + z = np.ones((128, 2)) * data[i] + ax.pcolormesh(theta0, r0, z, cmap=cmap, norm=norm) + if i + 1 in segBold: + ax.plot(theta0, r0, '-k', lw=linewidth + 2) + ax.plot(theta0[0], [r[2], r[3]], '-k', lw=linewidth + 1) + ax.plot(theta0[-1], [r[2], r[3]], '-k', lw=linewidth + 1) + + # Fill the segments 7-12 + r0 = r[1:3] + r0 = np.repeat(r0[:, np.newaxis], 128, axis=1).T + for i in range(6): + # First segment start at 60 degrees + theta0 = theta[i * 128:i * 128 + 128] + np.deg2rad(60) + theta0 = np.repeat(theta0[:, np.newaxis], 2, axis=1) + z = np.ones((128, 2)) * data[i + 6] + ax.pcolormesh(theta0, r0, z, cmap=cmap, norm=norm) + if i + 7 in segBold: + ax.plot(theta0, r0, '-k', lw=linewidth + 2) + ax.plot(theta0[0], [r[1], r[2]], '-k', lw=linewidth + 1) + ax.plot(theta0[-1], [r[1], r[2]], '-k', lw=linewidth + 1) + + # Fill the segments 13-16 + r0 = r[0:2] + r0 = np.repeat(r0[:, np.newaxis], 192, axis=1).T + for i in range(4): + # First segment start at 45 degrees + 
theta0 = theta[i * 192:i * 192 + 192] + np.deg2rad(45) + theta0 = np.repeat(theta0[:, np.newaxis], 2, axis=1) + z = np.ones((192, 2)) * data[i + 12] + ax.pcolormesh(theta0, r0, z, cmap=cmap, norm=norm) + if i + 13 in segBold: + ax.plot(theta0, r0, '-k', lw=linewidth + 2) + ax.plot(theta0[0], [r[0], r[1]], '-k', lw=linewidth + 1) + ax.plot(theta0[-1], [r[0], r[1]], '-k', lw=linewidth + 1) + + # Fill the segments 17 + if data.size == 17: + r0 = np.array([0, r[0]]) + r0 = np.repeat(r0[:, np.newaxis], theta.size, axis=1).T + theta0 = np.repeat(theta[:, np.newaxis], 2, axis=1) + z = np.ones((theta.size, 2)) * data[16] + ax.pcolormesh(theta0, r0, z, cmap=cmap, norm=norm) + if 17 in segBold: + ax.plot(theta0, r0, '-k', lw=linewidth + 2) + + ax.set_ylim([0, 1]) + ax.set_yticklabels([]) + ax.set_xticklabels([]) + + +# Create the fake data +data = np.array(range(17)) + 1 + + +# Make a figure and axes with dimensions as desired. +fig, ax = plt.subplots(figsize=(12, 8), nrows=1, ncols=3, + subplot_kw=dict(projection='polar')) +fig.canvas.set_window_title('Left Ventricle Bulls Eyes (AHA)') + +# Create the axis for the colorbars +axl = fig.add_axes([0.14, 0.15, 0.2, 0.05]) +axl2 = fig.add_axes([0.41, 0.15, 0.2, 0.05]) +axl3 = fig.add_axes([0.69, 0.15, 0.2, 0.05]) + + +# Set the colormap and norm to correspond to the data for which +# the colorbar will be used. +cmap = mpl.cm.viridis +norm = mpl.colors.Normalize(vmin=1, vmax=17) + +# ColorbarBase derives from ScalarMappable and puts a colorbar +# in a specified axes, so it has everything needed for a +# standalone colorbar. There are many more kwargs, but the +# following gives a basic continuous colorbar with ticks +# and labels. +cb1 = mpl.colorbar.ColorbarBase(axl, cmap=cmap, norm=norm, + orientation='horizontal') +cb1.set_label('Some Units') + + +# Set the colormap and norm to correspond to the data for which +# the colorbar will be used. 
+cmap2 = mpl.cm.cool +norm2 = mpl.colors.Normalize(vmin=1, vmax=17) + +# ColorbarBase derives from ScalarMappable and puts a colorbar +# in a specified axes, so it has everything needed for a +# standalone colorbar. There are many more kwargs, but the +# following gives a basic continuous colorbar with ticks +# and labels. +cb2 = mpl.colorbar.ColorbarBase(axl2, cmap=cmap2, norm=norm2, + orientation='horizontal') +cb2.set_label('Some other units') + + +# The second example illustrates the use of a ListedColormap, a +# BoundaryNorm, and extended ends to show the "over" and "under" +# value colors. +cmap3 = mpl.colors.ListedColormap(['r', 'g', 'b', 'c']) +cmap3.set_over('0.35') +cmap3.set_under('0.75') + +# If a ListedColormap is used, the length of the bounds array must be +# one greater than the length of the color list. The bounds must be +# monotonically increasing. +bounds = [2, 3, 7, 9, 15] +norm3 = mpl.colors.BoundaryNorm(bounds, cmap3.N) +cb3 = mpl.colorbar.ColorbarBase(axl3, cmap=cmap3, norm=norm3, + # to use 'extend', you must + # specify two extra boundaries: + boundaries=[0] + bounds + [18], + extend='both', + ticks=bounds, # optional + spacing='proportional', + orientation='horizontal') +cb3.set_label('Discrete intervals, some other units') + + +# Create the 17 segment model +bullseye_plot(ax[0], data, cmap=cmap, norm=norm) +ax[0].set_title('Bulls Eye (AHA)') + +bullseye_plot(ax[1], data, cmap=cmap2, norm=norm2) +ax[1].set_title('Bulls Eye (AHA)') + +bullseye_plot(ax[2], data, segBold=[3, 5, 6, 11, 12, 16], + cmap=cmap3, norm=norm3) +ax[2].set_title('Segments [3,5,6,11,12,16] in bold') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: leftventricle_bulleye.py](https://matplotlib.org/_downloads/leftventricle_bulleye.py) +- [下载Jupyter notebook: leftventricle_bulleye.ipynb](https://matplotlib.org/_downloads/leftventricle_bulleye.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/specialty_plots/mri_demo.md 
b/Python/matplotlab/gallery/specialty_plots/mri_demo.md
new file mode 100644
index 00000000..2849db42
--- /dev/null
+++ b/Python/matplotlab/gallery/specialty_plots/mri_demo.md
@@ -0,0 +1,28 @@
+# MRI
+
+此示例说明如何将(MRI)图像读入NumPy数组,并使用imshow以灰度显示。
+
+![MRI示例](https://matplotlib.org/_images/sphx_glr_mri_demo_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import matplotlib.cbook as cbook
+import matplotlib.cm as cm
+import numpy as np
+
+
+# Data are 256x256 16 bit integers
+with cbook.get_sample_data('s1045.ima.gz') as dfile:
+    im = np.frombuffer(dfile.read(), np.uint16).reshape((256, 256))
+
+fig, ax = plt.subplots(num="MRI_demo")
+ax.imshow(im, cmap=cm.gray)
+ax.axis('off')
+
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: mri_demo.py](https://matplotlib.org/_downloads/mri_demo.py)
+- [下载Jupyter notebook: mri_demo.ipynb](https://matplotlib.org/_downloads/mri_demo.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/specialty_plots/mri_with_eeg.md b/Python/matplotlab/gallery/specialty_plots/mri_with_eeg.md
new file mode 100644
index 00000000..ce4e272e
--- /dev/null
+++ b/Python/matplotlab/gallery/specialty_plots/mri_with_eeg.md
@@ -0,0 +1,82 @@
+# MRI与脑电图
+
+显示一组子图,包括一幅MRI图像、其强度直方图以及若干EEG轨迹。
+
+![MRI与脑电图示例](https://matplotlib.org/_images/sphx_glr_mri_with_eeg_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+import matplotlib.cbook as cbook
+import matplotlib.cm as cm
+
+from matplotlib.collections import LineCollection
+from matplotlib.ticker import MultipleLocator
+
+fig = plt.figure("MRI_with_EEG")
+
+# Load the MRI data (256x256 16 bit integers)
+with cbook.get_sample_data('s1045.ima.gz') as dfile:
+    im = np.frombuffer(dfile.read(), np.uint16).reshape((256, 256))
+
+# Plot the MRI image
+ax0 = fig.add_subplot(2, 2, 1)
+ax0.imshow(im, cmap=cm.gray)
+ax0.axis('off')
+
+# Plot the histogram of MRI intensity
+ax1 = fig.add_subplot(2, 2, 2)
+im = np.ravel(im)
+im = im[np.nonzero(im)]  # Ignore the background
+im = im / 
(2**16 - 1) # Normalize +ax1.hist(im, bins=100) +ax1.xaxis.set_major_locator(MultipleLocator(0.4)) +ax1.minorticks_on() +ax1.set_yticks([]) +ax1.set_xlabel('Intensity (a.u.)') +ax1.set_ylabel('MRI density') + +# Load the EEG data +n_samples, n_rows = 800, 4 +with cbook.get_sample_data('eeg.dat') as eegfile: + data = np.fromfile(eegfile, dtype=float).reshape((n_samples, n_rows)) +t = 10 * np.arange(n_samples) / n_samples + +# Plot the EEG +ticklocs = [] +ax2 = fig.add_subplot(2, 1, 2) +ax2.set_xlim(0, 10) +ax2.set_xticks(np.arange(10)) +dmin = data.min() +dmax = data.max() +dr = (dmax - dmin) * 0.7 # Crowd them a bit. +y0 = dmin +y1 = (n_rows - 1) * dr + dmax +ax2.set_ylim(y0, y1) + +segs = [] +for i in range(n_rows): + segs.append(np.column_stack((t, data[:, i]))) + ticklocs.append(i * dr) + +offsets = np.zeros((n_rows, 2), dtype=float) +offsets[:, 1] = ticklocs + +lines = LineCollection(segs, offsets=offsets, transOffset=None) +ax2.add_collection(lines) + +# Set the yticks to use axes coordinates on the y axis +ax2.set_yticks(ticklocs) +ax2.set_yticklabels(['PG3', 'PG5', 'PG7', 'PG9']) + +ax2.set_xlabel('Time (s)') + + +plt.tight_layout() +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: mri_with_eeg.py](https://matplotlib.org/_downloads/mri_with_eeg.py) +- [下载Jupyter notebook: mri_with_eeg.ipynb](https://matplotlib.org/_downloads/mri_with_eeg.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/specialty_plots/radar_chart.md b/Python/matplotlab/gallery/specialty_plots/radar_chart.md new file mode 100644 index 00000000..7f4f1e02 --- /dev/null +++ b/Python/matplotlab/gallery/specialty_plots/radar_chart.md @@ -0,0 +1,222 @@ +# 雷达图(又名蜘蛛星图) + +此示例创建雷达图表,也称为蜘蛛星图。 + +虽然此示例允许“圆”或“多边形”的框架,但多边形框架没有合适的网格线(线条是圆形而不是多边形)。 通过将matplotlib.axis中的GRIDLINE_INTERPOLATION_STEPS设置为所需的顶点数,可以获得多边形网格,但多边形的方向不与径向轴对齐。 + +http://en.wikipedia.org/wiki/Radar_chart + +```python +import numpy as np + +import matplotlib.pyplot as plt +from matplotlib.path import Path +from 
matplotlib.spines import Spine +from matplotlib.projections.polar import PolarAxes +from matplotlib.projections import register_projection + + +def radar_factory(num_vars, frame='circle'): + """Create a radar chart with `num_vars` axes. + + This function creates a RadarAxes projection and registers it. + + Parameters + ---------- + num_vars : int + Number of variables for radar chart. + frame : {'circle' | 'polygon'} + Shape of frame surrounding axes. + + """ + # calculate evenly-spaced axis angles + theta = np.linspace(0, 2*np.pi, num_vars, endpoint=False) + + def draw_poly_patch(self): + # rotate theta such that the first axis is at the top + verts = unit_poly_verts(theta + np.pi / 2) + return plt.Polygon(verts, closed=True, edgecolor='k') + + def draw_circle_patch(self): + # unit circle centered on (0.5, 0.5) + return plt.Circle((0.5, 0.5), 0.5) + + patch_dict = {'polygon': draw_poly_patch, 'circle': draw_circle_patch} + if frame not in patch_dict: + raise ValueError('unknown value for `frame`: %s' % frame) + + class RadarAxes(PolarAxes): + + name = 'radar' + # use 1 line segment to connect specified points + RESOLUTION = 1 + # define draw_frame method + draw_patch = patch_dict[frame] + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + # rotate plot such that the first axis is at the top + self.set_theta_zero_location('N') + + def fill(self, *args, closed=True, **kwargs): + """Override fill so that line is closed by default""" + return super().fill(closed=closed, *args, **kwargs) + + def plot(self, *args, **kwargs): + """Override plot so that line is closed by default""" + lines = super().plot(*args, **kwargs) + for line in lines: + self._close_line(line) + + def _close_line(self, line): + x, y = line.get_data() + # FIXME: markers at x[0], y[0] get doubled-up + if x[0] != x[-1]: + x = np.concatenate((x, [x[0]])) + y = np.concatenate((y, [y[0]])) + line.set_data(x, y) + + def set_varlabels(self, labels): + 
self.set_thetagrids(np.degrees(theta), labels) + + def _gen_axes_patch(self): + return self.draw_patch() + + def _gen_axes_spines(self): + if frame == 'circle': + return super()._gen_axes_spines() + # The following is a hack to get the spines (i.e. the axes frame) + # to draw correctly for a polygon frame. + + # spine_type must be 'left', 'right', 'top', 'bottom', or `circle`. + spine_type = 'circle' + verts = unit_poly_verts(theta + np.pi / 2) + # close off polygon by repeating first vertex + verts.append(verts[0]) + path = Path(verts) + + spine = Spine(self, spine_type, path) + spine.set_transform(self.transAxes) + return {'polar': spine} + + register_projection(RadarAxes) + return theta + + +def unit_poly_verts(theta): + """Return vertices of polygon for subplot axes. + + This polygon is circumscribed by a unit circle centered at (0.5, 0.5) + """ + x0, y0, r = [0.5] * 3 + verts = [(r*np.cos(t) + x0, r*np.sin(t) + y0) for t in theta] + return verts + + +def example_data(): + # The following data is from the Denver Aerosol Sources and Health study. + # See doi:10.1016/j.atmosenv.2008.12.017 + # + # The data are pollution source profile estimates for five modeled + # pollution sources (e.g., cars, wood-burning, etc) that emit 7-9 chemical + # species. The radar charts are experimented with here to see if we can + # nicely visualize how the modeled source profiles change across four + # scenarios: + # 1) No gas-phase species present, just seven particulate counts on + # Sulfate + # Nitrate + # Elemental Carbon (EC) + # Organic Carbon fraction 1 (OC) + # Organic Carbon fraction 2 (OC2) + # Organic Carbon fraction 3 (OC3) + # Pyrolized Organic Carbon (OP) + # 2)Inclusion of gas-phase specie carbon monoxide (CO) + # 3)Inclusion of gas-phase specie ozone (O3). + # 4)Inclusion of both gas-phase species is present... 
+ data = [ + ['Sulfate', 'Nitrate', 'EC', 'OC1', 'OC2', 'OC3', 'OP', 'CO', 'O3'], + ('Basecase', [ + [0.88, 0.01, 0.03, 0.03, 0.00, 0.06, 0.01, 0.00, 0.00], + [0.07, 0.95, 0.04, 0.05, 0.00, 0.02, 0.01, 0.00, 0.00], + [0.01, 0.02, 0.85, 0.19, 0.05, 0.10, 0.00, 0.00, 0.00], + [0.02, 0.01, 0.07, 0.01, 0.21, 0.12, 0.98, 0.00, 0.00], + [0.01, 0.01, 0.02, 0.71, 0.74, 0.70, 0.00, 0.00, 0.00]]), + ('With CO', [ + [0.88, 0.02, 0.02, 0.02, 0.00, 0.05, 0.00, 0.05, 0.00], + [0.08, 0.94, 0.04, 0.02, 0.00, 0.01, 0.12, 0.04, 0.00], + [0.01, 0.01, 0.79, 0.10, 0.00, 0.05, 0.00, 0.31, 0.00], + [0.00, 0.02, 0.03, 0.38, 0.31, 0.31, 0.00, 0.59, 0.00], + [0.02, 0.02, 0.11, 0.47, 0.69, 0.58, 0.88, 0.00, 0.00]]), + ('With O3', [ + [0.89, 0.01, 0.07, 0.00, 0.00, 0.05, 0.00, 0.00, 0.03], + [0.07, 0.95, 0.05, 0.04, 0.00, 0.02, 0.12, 0.00, 0.00], + [0.01, 0.02, 0.86, 0.27, 0.16, 0.19, 0.00, 0.00, 0.00], + [0.01, 0.03, 0.00, 0.32, 0.29, 0.27, 0.00, 0.00, 0.95], + [0.02, 0.00, 0.03, 0.37, 0.56, 0.47, 0.87, 0.00, 0.00]]), + ('CO & O3', [ + [0.87, 0.01, 0.08, 0.00, 0.00, 0.04, 0.00, 0.00, 0.01], + [0.09, 0.95, 0.02, 0.03, 0.00, 0.01, 0.13, 0.06, 0.00], + [0.01, 0.02, 0.71, 0.24, 0.13, 0.16, 0.00, 0.50, 0.00], + [0.01, 0.03, 0.00, 0.28, 0.24, 0.23, 0.00, 0.44, 0.88], + [0.02, 0.00, 0.18, 0.45, 0.64, 0.55, 0.86, 0.00, 0.16]]) + ] + return data + + +if __name__ == '__main__': + N = 9 + theta = radar_factory(N, frame='polygon') + + data = example_data() + spoke_labels = data.pop(0) + + fig, axes = plt.subplots(figsize=(9, 9), nrows=2, ncols=2, + subplot_kw=dict(projection='radar')) + fig.subplots_adjust(wspace=0.25, hspace=0.20, top=0.85, bottom=0.05) + + colors = ['b', 'r', 'g', 'm', 'y'] + # Plot the four cases from the example data on separate axes + for ax, (title, case_data) in zip(axes.flatten(), data): + ax.set_rgrids([0.2, 0.4, 0.6, 0.8]) + ax.set_title(title, weight='bold', size='medium', position=(0.5, 1.1), + horizontalalignment='center', verticalalignment='center') + for d, color in 
zip(case_data, colors): + ax.plot(theta, d, color=color) + ax.fill(theta, d, facecolor=color, alpha=0.25) + ax.set_varlabels(spoke_labels) + + # add legend relative to top-left plot + ax = axes[0, 0] + labels = ('Factor 1', 'Factor 2', 'Factor 3', 'Factor 4', 'Factor 5') + legend = ax.legend(labels, loc=(0.9, .95), + labelspacing=0.1, fontsize='small') + + fig.text(0.5, 0.965, '5-Factor Solution Profiles Across Four Scenarios', + horizontalalignment='center', color='black', weight='bold', + size='large') + + plt.show() +``` + +![雷达图示例](https://matplotlib.org/_images/sphx_glr_radar_chart_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.path +matplotlib.path.Path +matplotlib.spines +matplotlib.spines.Spine +matplotlib.projections +matplotlib.projections.polar +matplotlib.projections.polar.PolarAxes +matplotlib.projections.register_projection +``` + +## 下载这个示例 + +- [下载python源码: radar_chart.py](https://matplotlib.org/_downloads/radar_chart.py) +- [下载Jupyter notebook: radar_chart.ipynb](https://matplotlib.org/_downloads/radar_chart.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/specialty_plots/sankey_basics.md b/Python/matplotlab/gallery/specialty_plots/sankey_basics.md new file mode 100644 index 00000000..3d24d0d1 --- /dev/null +++ b/Python/matplotlab/gallery/specialty_plots/sankey_basics.md @@ -0,0 +1,93 @@ +# 桑基图类 + +通过生成三个基本图表来演示Sankey类。 + +```python +import matplotlib.pyplot as plt + +from matplotlib.sankey import Sankey +``` + +示例1 - 主要是默认值 + +这演示了如何通过隐式调用Sankey.add()方法并将finish()附加到类的调用来创建一个简单的图。 + +```python +Sankey(flows=[0.25, 0.15, 0.60, -0.20, -0.15, -0.05, -0.50, -0.10], + labels=['', '', '', 'First', 'Second', 'Third', 'Fourth', 'Fifth'], + orientations=[-1, 1, 0, 1, 1, 1, 0, -1]).finish() +plt.title("The default settings produce a diagram like this.") +``` + +![桑基图类示例](https://matplotlib.org/_images/sphx_glr_sankey_basics_001.png) + +注意: + +示例2 + +这表明: + +```python +fig = plt.figure() +ax = 
fig.add_subplot(1, 1, 1, xticks=[], yticks=[], + title="Flow Diagram of a Widget") +sankey = Sankey(ax=ax, scale=0.01, offset=0.2, head_angle=180, + format='%.0f', unit='%') +sankey.add(flows=[25, 0, 60, -10, -20, -5, -15, -10, -40], + labels=['', '', '', 'First', 'Second', 'Third', 'Fourth', + 'Fifth', 'Hurray!'], + orientations=[-1, 1, 0, 1, 1, 1, -1, -1, 0], + pathlengths=[0.25, 0.25, 0.25, 0.25, 0.25, 0.6, 0.25, 0.25, + 0.25], + patchlabel="Widget\nA") # Arguments to matplotlib.patches.PathPatch() +diagrams = sankey.finish() +diagrams[0].texts[-1].set_color('r') +diagrams[0].text.set_fontweight('bold') +``` + +![桑基图类示例2](https://matplotlib.org/_images/sphx_glr_sankey_basics_002.png) + +注意: + +示例3 + +这表明: + +```python +fig = plt.figure() +ax = fig.add_subplot(1, 1, 1, xticks=[], yticks=[], title="Two Systems") +flows = [0.25, 0.15, 0.60, -0.10, -0.05, -0.25, -0.15, -0.10, -0.35] +sankey = Sankey(ax=ax, unit=None) +sankey.add(flows=flows, label='one', + orientations=[-1, 1, 0, 1, 1, 1, -1, -1, 0]) +sankey.add(flows=[-0.25, 0.15, 0.1], label='two', + orientations=[-1, -1, -1], prior=0, connect=(0, 0)) +diagrams = sankey.finish() +diagrams[-1].patch.set_hatch('/') +plt.legend() +``` + +![桑基图类示例3](https://matplotlib.org/_images/sphx_glr_sankey_basics_003.png) + +请注意,只指定了一个连接,但系统形成一个电路,因为:(1)路径的长度是合理的,(2)流的方向和顺序是镜像的。 + +```python +plt.show() +``` + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.sankey +matplotlib.sankey.Sankey +matplotlib.sankey.Sankey.add +matplotlib.sankey.Sankey.finish +``` + +## 下载这个示例 + +- [下载python源码: sankey_basics.py](https://matplotlib.org/_downloads/sankey_basics.py) +- [下载Jupyter notebook: sankey_basics.ipynb](https://matplotlib.org/_downloads/sankey_basics.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/specialty_plots/sankey_links.md b/Python/matplotlab/gallery/specialty_plots/sankey_links.md new file mode 100644 index 00000000..6b796335 --- /dev/null +++ 
b/Python/matplotlab/gallery/specialty_plots/sankey_links.md @@ -0,0 +1,73 @@ +# 使用Sankey的长链连接 + +通过建立长链连接来演示/测试Sankey类。 + +```python +import matplotlib.pyplot as plt +from matplotlib.sankey import Sankey + +links_per_side = 6 + + +def side(sankey, n=1): + """Generate a side chain.""" + prior = len(sankey.diagrams) + for i in range(0, 2*n, 2): + sankey.add(flows=[1, -1], orientations=[-1, -1], + patchlabel=str(prior + i), + prior=prior + i - 1, connect=(1, 0), alpha=0.5) + sankey.add(flows=[1, -1], orientations=[1, 1], + patchlabel=str(prior + i + 1), + prior=prior + i, connect=(1, 0), alpha=0.5) + + +def corner(sankey): + """Generate a corner link.""" + prior = len(sankey.diagrams) + sankey.add(flows=[1, -1], orientations=[0, 1], + patchlabel=str(prior), facecolor='k', + prior=prior - 1, connect=(1, 0), alpha=0.5) + + +fig = plt.figure() +ax = fig.add_subplot(1, 1, 1, xticks=[], yticks=[], + title="Why would you want to do this?\n(But you could.)") +sankey = Sankey(ax=ax, unit=None) +sankey.add(flows=[1, -1], orientations=[0, 1], + patchlabel="0", facecolor='k', + rotation=45) +side(sankey, n=links_per_side) +corner(sankey) +side(sankey, n=links_per_side) +corner(sankey) +side(sankey, n=links_per_side) +corner(sankey) +side(sankey, n=links_per_side) +sankey.finish() +# Notice: +# 1. The alignment doesn't drift significantly (if at all; with 16007 +# subdiagrams there is still closure). +# 2. The first diagram is rotated 45 deg, so all other diagrams are rotated +# accordingly. 
+ +plt.show() +``` + +![使用Sankey的长链连接示例](https://matplotlib.org/_images/sphx_glr_sankey_links_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.sankey +matplotlib.sankey.Sankey +matplotlib.sankey.Sankey.add +matplotlib.sankey.Sankey.finish +``` + +## 下载这个示例 + +- [下载python源码: sankey_links.py](https://matplotlib.org/_downloads/sankey_links.py) +- [下载Jupyter notebook: sankey_links.ipynb](https://matplotlib.org/_downloads/sankey_links.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/specialty_plots/sankey_rankine.md b/Python/matplotlab/gallery/specialty_plots/sankey_rankine.md new file mode 100644 index 00000000..aa83473e --- /dev/null +++ b/Python/matplotlab/gallery/specialty_plots/sankey_rankine.md @@ -0,0 +1,100 @@ +# 朗肯动力循环 + +通过朗肯动力循环的实际示例演示Sankey类。 + +```python +import matplotlib.pyplot as plt + +from matplotlib.sankey import Sankey + +fig = plt.figure(figsize=(8, 9)) +ax = fig.add_subplot(1, 1, 1, xticks=[], yticks=[], + title="Rankine Power Cycle: Example 8.6 from Moran and " + "Shapiro\n\x22Fundamentals of Engineering Thermodynamics " + "\x22, 6th ed., 2008") +Hdot = [260.431, 35.078, 180.794, 221.115, 22.700, + 142.361, 10.193, 10.210, 43.670, 44.312, + 68.631, 10.758, 10.758, 0.017, 0.642, + 232.121, 44.559, 100.613, 132.168] # MW +sankey = Sankey(ax=ax, format='%.3G', unit=' MW', gap=0.5, scale=1.0/Hdot[0]) +sankey.add(patchlabel='\n\nPump 1', rotation=90, facecolor='#37c959', + flows=[Hdot[13], Hdot[6], -Hdot[7]], + labels=['Shaft power', '', None], + pathlengths=[0.4, 0.883, 0.25], + orientations=[1, -1, 0]) +sankey.add(patchlabel='\n\nOpen\nheater', facecolor='#37c959', + flows=[Hdot[11], Hdot[7], Hdot[4], -Hdot[8]], + labels=[None, '', None, None], + pathlengths=[0.25, 0.25, 1.93, 0.25], + orientations=[1, 0, -1, 0], prior=0, connect=(2, 1)) +sankey.add(patchlabel='\n\nPump 2', facecolor='#37c959', + flows=[Hdot[14], Hdot[8], -Hdot[9]], + labels=['Shaft power', '', None], + 
pathlengths=[0.4, 0.25, 0.25], + orientations=[1, 0, 0], prior=1, connect=(3, 1)) +sankey.add(patchlabel='Closed\nheater', trunklength=2.914, fc='#37c959', + flows=[Hdot[9], Hdot[1], -Hdot[11], -Hdot[10]], + pathlengths=[0.25, 1.543, 0.25, 0.25], + labels=['', '', None, None], + orientations=[0, -1, 1, -1], prior=2, connect=(2, 0)) +sankey.add(patchlabel='Trap', facecolor='#37c959', trunklength=5.102, + flows=[Hdot[11], -Hdot[12]], + labels=['\n', None], + pathlengths=[1.0, 1.01], + orientations=[1, 1], prior=3, connect=(2, 0)) +sankey.add(patchlabel='Steam\ngenerator', facecolor='#ff5555', + flows=[Hdot[15], Hdot[10], Hdot[2], -Hdot[3], -Hdot[0]], + labels=['Heat rate', '', '', None, None], + pathlengths=0.25, + orientations=[1, 0, -1, -1, -1], prior=3, connect=(3, 1)) +sankey.add(patchlabel='\n\n\nTurbine 1', facecolor='#37c959', + flows=[Hdot[0], -Hdot[16], -Hdot[1], -Hdot[2]], + labels=['', None, None, None], + pathlengths=[0.25, 0.153, 1.543, 0.25], + orientations=[0, 1, -1, -1], prior=5, connect=(4, 0)) +sankey.add(patchlabel='\n\n\nReheat', facecolor='#37c959', + flows=[Hdot[2], -Hdot[2]], + labels=[None, None], + pathlengths=[0.725, 0.25], + orientations=[-1, 0], prior=6, connect=(3, 0)) +sankey.add(patchlabel='Turbine 2', trunklength=3.212, facecolor='#37c959', + flows=[Hdot[3], Hdot[16], -Hdot[5], -Hdot[4], -Hdot[17]], + labels=[None, 'Shaft power', None, '', 'Shaft power'], + pathlengths=[0.751, 0.15, 0.25, 1.93, 0.25], + orientations=[0, -1, 0, -1, 1], prior=6, connect=(1, 1)) +sankey.add(patchlabel='Condenser', facecolor='#58b1fa', trunklength=1.764, + flows=[Hdot[5], -Hdot[18], -Hdot[6]], + labels=['', 'Heat rate', None], + pathlengths=[0.45, 0.25, 0.883], + orientations=[-1, 1, 0], prior=8, connect=(2, 0)) +diagrams = sankey.finish() +for diagram in diagrams: + diagram.text.set_fontweight('bold') + diagram.text.set_fontsize('10') + for text in diagram.texts: + text.set_fontsize('10') +# Notice that the explicit connections are handled automatically, 
but the +# implicit ones currently are not. The lengths of the paths and the trunks +# must be adjusted manually, and that is a bit tricky. + +plt.show() +``` + +![朗肯动力循环示例](https://matplotlib.org/_images/sphx_glr_sankey_rankine_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.sankey +matplotlib.sankey.Sankey +matplotlib.sankey.Sankey.add +matplotlib.sankey.Sankey.finish +``` + +## 下载这个示例 + +- [下载python源码: sankey_rankine.py](https://matplotlib.org/_downloads/sankey_rankine.py) +- [下载Jupyter notebook: sankey_rankine.ipynb](https://matplotlib.org/_downloads/sankey_rankine.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/specialty_plots/skewt.md b/Python/matplotlab/gallery/specialty_plots/skewt.md new file mode 100644 index 00000000..820bc22c --- /dev/null +++ b/Python/matplotlab/gallery/specialty_plots/skewt.md @@ -0,0 +1,304 @@ +# SkewT-logP图:使用变换和自定义投影 + +这可以作为matplotlib变换和自定义投影API的强化练习。 这个例子产生了一个所谓的SkewT-logP图,它是气象学中用于显示温度垂直剖面的常见图。 就matplotlib而言,复杂性来自于X轴和Y轴不正交。 这是通过在基本Axes变换中包含一个偏斜分量来处理的。 处理上下X轴具有不同数据范围的事实带来了额外的复杂性,这需要一系列用于刻度,棘轮和轴的自定义类来处理这一问题。 + +```python +from matplotlib.axes import Axes +import matplotlib.transforms as transforms +import matplotlib.axis as maxis +import matplotlib.spines as mspines +from matplotlib.projections import register_projection + + +# The sole purpose of this class is to look at the upper, lower, or total +# interval as appropriate and see what parts of the tick to draw, if any. 
+class SkewXTick(maxis.XTick): + def update_position(self, loc): + # This ensures that the new value of the location is set before + # any other updates take place + self._loc = loc + super().update_position(loc) + + def _has_default_loc(self): + return self.get_loc() is None + + def _need_lower(self): + return (self._has_default_loc() or + transforms.interval_contains(self.axes.lower_xlim, + self.get_loc())) + + def _need_upper(self): + return (self._has_default_loc() or + transforms.interval_contains(self.axes.upper_xlim, + self.get_loc())) + + @property + def gridOn(self): + return (self._gridOn and (self._has_default_loc() or + transforms.interval_contains(self.get_view_interval(), + self.get_loc()))) + + @gridOn.setter + def gridOn(self, value): + self._gridOn = value + + @property + def tick1On(self): + return self._tick1On and self._need_lower() + + @tick1On.setter + def tick1On(self, value): + self._tick1On = value + + @property + def label1On(self): + return self._label1On and self._need_lower() + + @label1On.setter + def label1On(self, value): + self._label1On = value + + @property + def tick2On(self): + return self._tick2On and self._need_upper() + + @tick2On.setter + def tick2On(self, value): + self._tick2On = value + + @property + def label2On(self): + return self._label2On and self._need_upper() + + @label2On.setter + def label2On(self, value): + self._label2On = value + + def get_view_interval(self): + return self.axes.xaxis.get_view_interval() + + +# This class exists to provide two separate sets of intervals to the tick, +# as well as create instances of the custom tick +class SkewXAxis(maxis.XAxis): + def _get_tick(self, major): + return SkewXTick(self.axes, None, '', major=major) + + def get_view_interval(self): + return self.axes.upper_xlim[0], self.axes.lower_xlim[1] + + +# This class exists to calculate the separate data range of the +# upper X-axis and draw the spine there. 
It also provides this range +# to the X-axis artist for ticking and gridlines +class SkewSpine(mspines.Spine): + def _adjust_location(self): + pts = self._path.vertices + if self.spine_type == 'top': + pts[:, 0] = self.axes.upper_xlim + else: + pts[:, 0] = self.axes.lower_xlim + + +# This class handles registration of the skew-xaxes as a projection as well +# as setting up the appropriate transformations. It also overrides standard +# spines and axes instances as appropriate. +class SkewXAxes(Axes): + # The projection must specify a name. This will be used be the + # user to select the projection, i.e. ``subplot(111, + # projection='skewx')``. + name = 'skewx' + + def _init_axis(self): + # Taken from Axes and modified to use our modified X-axis + self.xaxis = SkewXAxis(self) + self.spines['top'].register_axis(self.xaxis) + self.spines['bottom'].register_axis(self.xaxis) + self.yaxis = maxis.YAxis(self) + self.spines['left'].register_axis(self.yaxis) + self.spines['right'].register_axis(self.yaxis) + + def _gen_axes_spines(self): + spines = {'top': SkewSpine.linear_spine(self, 'top'), + 'bottom': mspines.Spine.linear_spine(self, 'bottom'), + 'left': mspines.Spine.linear_spine(self, 'left'), + 'right': mspines.Spine.linear_spine(self, 'right')} + return spines + + def _set_lim_and_transforms(self): + """ + This is called once when the plot is created to set up all the + transforms for the data, text and grids. + """ + rot = 30 + + # Get the standard transform setup from the Axes base class + Axes._set_lim_and_transforms(self) + + # Need to put the skew in the middle, after the scale and limits, + # but before the transAxes. 
This way, the skew is done in Axes + # coordinates thus performing the transform around the proper origin + # We keep the pre-transAxes transform around for other users, like the + # spines for finding bounds + self.transDataToAxes = self.transScale + \ + self.transLimits + transforms.Affine2D().skew_deg(rot, 0) + + # Create the full transform from Data to Pixels + self.transData = self.transDataToAxes + self.transAxes + + # Blended transforms like this need to have the skewing applied using + # both axes, in axes coords like before. + self._xaxis_transform = (transforms.blended_transform_factory( + self.transScale + self.transLimits, + transforms.IdentityTransform()) + + transforms.Affine2D().skew_deg(rot, 0)) + self.transAxes + + @property + def lower_xlim(self): + return self.axes.viewLim.intervalx + + @property + def upper_xlim(self): + pts = [[0., 1.], [1., 1.]] + return self.transDataToAxes.inverted().transform(pts)[:, 0] + + +# Now register the projection with matplotlib so the user can select +# it. +register_projection(SkewXAxes) + +if __name__ == '__main__': + # Now make a simple example using the custom projection. 
+ from io import StringIO + from matplotlib.ticker import (MultipleLocator, NullFormatter, + ScalarFormatter) + import matplotlib.pyplot as plt + import numpy as np + + # Some examples data + data_txt = ''' + 978.0 345 7.8 0.8 61 4.16 325 14 282.7 294.6 283.4 + 971.0 404 7.2 0.2 61 4.01 327 17 282.7 294.2 283.4 + 946.7 610 5.2 -1.8 61 3.56 335 26 282.8 293.0 283.4 + 944.0 634 5.0 -2.0 61 3.51 336 27 282.8 292.9 283.4 + 925.0 798 3.4 -2.6 65 3.43 340 32 282.8 292.7 283.4 + 911.8 914 2.4 -2.7 69 3.46 345 37 282.9 292.9 283.5 + 906.0 966 2.0 -2.7 71 3.47 348 39 283.0 293.0 283.6 + 877.9 1219 0.4 -3.2 77 3.46 0 48 283.9 293.9 284.5 + 850.0 1478 -1.3 -3.7 84 3.44 0 47 284.8 294.8 285.4 + 841.0 1563 -1.9 -3.8 87 3.45 358 45 285.0 295.0 285.6 + 823.0 1736 1.4 -0.7 86 4.44 353 42 290.3 303.3 291.0 + 813.6 1829 4.5 1.2 80 5.17 350 40 294.5 309.8 295.4 + 809.0 1875 6.0 2.2 77 5.57 347 39 296.6 313.2 297.6 + 798.0 1988 7.4 -0.6 57 4.61 340 35 299.2 313.3 300.1 + 791.0 2061 7.6 -1.4 53 4.39 335 33 300.2 313.6 301.0 + 783.9 2134 7.0 -1.7 54 4.32 330 31 300.4 313.6 301.2 + 755.1 2438 4.8 -3.1 57 4.06 300 24 301.2 313.7 301.9 + 727.3 2743 2.5 -4.4 60 3.81 285 29 301.9 313.8 302.6 + 700.5 3048 0.2 -5.8 64 3.57 275 31 302.7 313.8 303.3 + 700.0 3054 0.2 -5.8 64 3.56 280 31 302.7 313.8 303.3 + 698.0 3077 0.0 -6.0 64 3.52 280 31 302.7 313.7 303.4 + 687.0 3204 -0.1 -7.1 59 3.28 281 31 304.0 314.3 304.6 + 648.9 3658 -3.2 -10.9 55 2.59 285 30 305.5 313.8 305.9 + 631.0 3881 -4.7 -12.7 54 2.29 289 33 306.2 313.6 306.6 + 600.7 4267 -6.4 -16.7 44 1.73 295 39 308.6 314.3 308.9 + 592.0 4381 -6.9 -17.9 41 1.59 297 41 309.3 314.6 309.6 + 577.6 4572 -8.1 -19.6 39 1.41 300 44 310.1 314.9 310.3 + 555.3 4877 -10.0 -22.3 36 1.16 295 39 311.3 315.3 311.5 + 536.0 5151 -11.7 -24.7 33 0.97 304 39 312.4 315.8 312.6 + 533.8 5182 -11.9 -25.0 33 0.95 305 39 312.5 315.8 312.7 + 500.0 5680 -15.9 -29.9 29 0.64 290 44 313.6 315.9 313.7 + 472.3 6096 -19.7 -33.4 28 0.49 285 46 314.1 315.8 314.1 + 453.0 6401 -22.4 
-36.0 28 0.39 300 50 314.4 315.8 314.4 + 400.0 7310 -30.7 -43.7 27 0.20 285 44 315.0 315.8 315.0 + 399.7 7315 -30.8 -43.8 27 0.20 285 44 315.0 315.8 315.0 + 387.0 7543 -33.1 -46.1 26 0.16 281 47 314.9 315.5 314.9 + 382.7 7620 -33.8 -46.8 26 0.15 280 48 315.0 315.6 315.0 + 342.0 8398 -40.5 -53.5 23 0.08 293 52 316.1 316.4 316.1 + 320.4 8839 -43.7 -56.7 22 0.06 300 54 317.6 317.8 317.6 + 318.0 8890 -44.1 -57.1 22 0.05 301 55 317.8 318.0 317.8 + 310.0 9060 -44.7 -58.7 19 0.04 304 61 319.2 319.4 319.2 + 306.1 9144 -43.9 -57.9 20 0.05 305 63 321.5 321.7 321.5 + 305.0 9169 -43.7 -57.7 20 0.05 303 63 322.1 322.4 322.1 + 300.0 9280 -43.5 -57.5 20 0.05 295 64 323.9 324.2 323.9 + 292.0 9462 -43.7 -58.7 17 0.05 293 67 326.2 326.4 326.2 + 276.0 9838 -47.1 -62.1 16 0.03 290 74 326.6 326.7 326.6 + 264.0 10132 -47.5 -62.5 16 0.03 288 79 330.1 330.3 330.1 + 251.0 10464 -49.7 -64.7 16 0.03 285 85 331.7 331.8 331.7 + 250.0 10490 -49.7 -64.7 16 0.03 285 85 332.1 332.2 332.1 + 247.0 10569 -48.7 -63.7 16 0.03 283 88 334.7 334.8 334.7 + 244.0 10649 -48.9 -63.9 16 0.03 280 91 335.6 335.7 335.6 + 243.3 10668 -48.9 -63.9 16 0.03 280 91 335.8 335.9 335.8 + 220.0 11327 -50.3 -65.3 15 0.03 280 85 343.5 343.6 343.5 + 212.0 11569 -50.5 -65.5 15 0.03 280 83 346.8 346.9 346.8 + 210.0 11631 -49.7 -64.7 16 0.03 280 83 349.0 349.1 349.0 + 200.0 11950 -49.9 -64.9 15 0.03 280 80 353.6 353.7 353.6 + 194.0 12149 -49.9 -64.9 15 0.03 279 78 356.7 356.8 356.7 + 183.0 12529 -51.3 -66.3 15 0.03 278 75 360.4 360.5 360.4 + 164.0 13233 -55.3 -68.3 18 0.02 277 69 365.2 365.3 365.2 + 152.0 13716 -56.5 -69.5 18 0.02 275 65 371.1 371.2 371.1 + 150.0 13800 -57.1 -70.1 18 0.02 275 64 371.5 371.6 371.5 + 136.0 14414 -60.5 -72.5 19 0.02 268 54 376.0 376.1 376.0 + 132.0 14600 -60.1 -72.1 19 0.02 265 51 380.0 380.1 380.0 + 131.4 14630 -60.2 -72.2 19 0.02 265 51 380.3 380.4 380.3 + 128.0 14792 -60.9 -72.9 19 0.02 266 50 381.9 382.0 381.9 + 125.0 14939 -60.1 -72.1 19 0.02 268 49 385.9 386.0 385.9 + 119.0 15240 -62.2 -73.8 
20 0.01 270 48 387.4 387.5 387.4
+    112.0 15616 -64.9 -75.9 21 0.01 265 53 389.3 389.3 389.3
+    108.0 15838 -64.1 -75.1 21 0.01 265 58 394.8 394.9 394.8
+    107.8 15850 -64.1 -75.1 21 0.01 265 58 395.0 395.1 395.0
+    105.0 16010 -64.7 -75.7 21 0.01 272 50 396.9 396.9 396.9
+    103.0 16128 -62.9 -73.9 21 0.02 277 45 402.5 402.6 402.5
+    100.0 16310 -62.5 -73.5 21 0.02 285 36 406.7 406.8 406.7
+    '''
+
+    # Parse the data
+    sound_data = StringIO(data_txt)
+    p, h, T, Td = np.loadtxt(sound_data, usecols=range(0, 4), unpack=True)
+
+    # Create a new figure. The dimensions here give a good aspect ratio
+    fig = plt.figure(figsize=(6.5875, 6.2125))
+    ax = fig.add_subplot(111, projection='skewx')
+
+    plt.grid(True)
+
+    # Plot the data using normal plotting functions, in this case using
+    # log scaling in Y, as dictated by the typical meteorological plot
+    ax.semilogy(T, p, color='C3')
+    ax.semilogy(Td, p, color='C2')
+
+    # An example of a slanted line at constant X
+    l = ax.axvline(0, color='C0')
+
+    # Disables the log-formatting that comes with semilogy
+    ax.yaxis.set_major_formatter(ScalarFormatter())
+    ax.yaxis.set_minor_formatter(NullFormatter())
+    ax.set_yticks(np.linspace(100, 1000, 10))
+    ax.set_ylim(1050, 100)
+
+    ax.xaxis.set_major_locator(MultipleLocator(10))
+    ax.set_xlim(-50, 50)
+
+    plt.show()
+```
+
+![SkewT-logP plot example](https://matplotlib.org/_images/sphx_glr_skewt_001.png)
+
+## References
+
+The use of the following functions, methods, classes and modules is shown in this example:
+
+```python
+import matplotlib
+matplotlib.transforms
+matplotlib.spines
+matplotlib.spines.Spine
+matplotlib.spines.Spine.register_axis
+matplotlib.projections
+matplotlib.projections.register_projection
+```
+
+## Download this example
+
+- [Download Python source code: skewt.py](https://matplotlib.org/_downloads/skewt.py)
+- [Download Jupyter notebook: skewt.ipynb](https://matplotlib.org/_downloads/skewt.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/specialty_plots/system_monitor.md b/Python/matplotlab/gallery/specialty_plots/system_monitor.md
new file mode 100644
index
00000000..015bf4c5
--- /dev/null
+++ b/Python/matplotlab/gallery/specialty_plots/system_monitor.md
@@ -0,0 +1,84 @@
+# System Monitor
+
+![System monitor example](https://matplotlib.org/_images/sphx_glr_system_monitor_001.png)
+
+Output:
+
+```python
+75.0 frames per second
+```
+
+```python
+import time
+import matplotlib.pyplot as plt
+import numpy as np
+
+
+def get_memory(t):
+    "Simulate a function that returns system memory"
+    return 100 * (0.5 + 0.5 * np.sin(0.5 * np.pi * t))
+
+
+def get_cpu(t):
+    "Simulate a function that returns cpu usage"
+    return 100 * (0.5 + 0.5 * np.sin(0.2 * np.pi * (t - 0.25)))
+
+
+def get_net(t):
+    "Simulate a function that returns network bandwidth"
+    return 100 * (0.5 + 0.5 * np.sin(0.7 * np.pi * (t - 0.1)))
+
+
+def get_stats(t):
+    return get_memory(t), get_cpu(t), get_net(t)
+
+fig, ax = plt.subplots()
+ind = np.arange(1, 4)
+
+# show the figure, but do not block
+plt.show(block=False)
+
+
+pm, pc, pn = plt.bar(ind, get_stats(0))
+pm.set_facecolor('r')
+pc.set_facecolor('g')
+pn.set_facecolor('b')
+ax.set_xticks(ind)
+ax.set_xticklabels(['Memory', 'CPU', 'Bandwidth'])
+ax.set_ylim([0, 100])
+ax.set_ylabel('Percent usage')
+ax.set_title('System Monitor')
+
+start = time.time()
+for i in range(200):  # run for a little while
+    m, c, n = get_stats(i / 10.0)
+
+    # update the animated artists
+    pm.set_height(m)
+    pc.set_height(c)
+    pn.set_height(n)
+
+    # ask the canvas to re-draw itself the next time it
+    # has a chance.
+    # For most of the GUI backends this adds an event to the queue
+    # of the GUI frameworks event loop.
+    fig.canvas.draw_idle()
+    try:
+        # make sure that the GUI framework has a chance to run its event loop
+        # and clear any GUI events.  This needs to be in a try/except block
+        # because the default implementation of this method is to raise
+        # NotImplementedError
+        fig.canvas.flush_events()
+    except NotImplementedError:
+        pass
+
+stop = time.time()
+print("{fps:.1f} frames per second".format(fps=200 / (stop - start)))
+```
+
+Total running time of the script: (0 minutes 2.678 seconds)
+
+## Download this example
+
+- [Download Python source code: system_monitor.py](https://matplotlib.org/_downloads/system_monitor.py)
+- [Download Jupyter notebook: system_monitor.ipynb](https://matplotlib.org/_downloads/system_monitor.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/specialty_plots/topographic_hillshading.md b/Python/matplotlab/gallery/specialty_plots/topographic_hillshading.md
new file mode 100644
index 00000000..5e55ece2
--- /dev/null
+++ b/Python/matplotlab/gallery/specialty_plots/topographic_hillshading.md
@@ -0,0 +1,75 @@
+# Topographic hillshading
+
+Demonstrates the visual effect of varying blend mode and vertical exaggeration on "hillshaded" plots.
+
+Note that the "overlay" and "soft" blend modes work well for complex surfaces such as this example, while the default "hsv" blend mode works best for smooth surfaces such as many mathematical functions.
+
+In most cases, hillshading is used purely for visual purposes, and *dx*/*dy* can safely be ignored. In that case, you can tweak *vert_exag* (vertical exaggeration) by trial and error to give the desired visual effect. However, this example demonstrates how to use the *dx* and *dy* keyword arguments to ensure that the *vert_exag* parameter is the true vertical exaggeration.
+
+![Topographic hillshading example](https://matplotlib.org/_images/sphx_glr_topographic_hillshading_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+from matplotlib.cbook import get_sample_data
+from matplotlib.colors import LightSource
+
+
+with np.load(get_sample_data('jacksboro_fault_dem.npz')) as dem:
+    z = dem['elevation']
+
+    #-- Optional dx and dy for accurate vertical exaggeration ----------------
+    # If you need topographically accurate vertical exaggeration, or you don't
+    # want to guess at what *vert_exag* should be, you'll need to specify the
+    # cellsize of the grid (i.e. the *dx* and *dy* parameters).  Otherwise, any
+    # *vert_exag* value you specify will be relative to the grid spacing of
+    # your input data (in other words, *dx* and *dy* default to 1.0, and
+    # *vert_exag* is calculated relative to those parameters).
Similarly, *dx* + # and *dy* are assumed to be in the same units as your input z-values. + # Therefore, we'll need to convert the given dx and dy from decimal degrees + # to meters. + dx, dy = dem['dx'], dem['dy'] + dy = 111200 * dy + dx = 111200 * dx * np.cos(np.radians(dem['ymin'])) + #------------------------------------------------------------------------- + +# Shade from the northwest, with the sun 45 degrees from horizontal +ls = LightSource(azdeg=315, altdeg=45) +cmap = plt.cm.gist_earth + +fig, axes = plt.subplots(nrows=4, ncols=3, figsize=(8, 9)) +plt.setp(axes.flat, xticks=[], yticks=[]) + +# Vary vertical exaggeration and blend mode and plot all combinations +for col, ve in zip(axes.T, [0.1, 1, 10]): + # Show the hillshade intensity image in the first row + col[0].imshow(ls.hillshade(z, vert_exag=ve, dx=dx, dy=dy), cmap='gray') + + # Place hillshaded plots with different blend modes in the rest of the rows + for ax, mode in zip(col[1:], ['hsv', 'overlay', 'soft']): + rgb = ls.shade(z, cmap=cmap, blend_mode=mode, + vert_exag=ve, dx=dx, dy=dy) + ax.imshow(rgb) + +# Label rows and columns +for ax, ve in zip(axes[0], [0.1, 1, 10]): + ax.set_title('{0}'.format(ve), size=18) +for ax, mode in zip(axes[:, 0], ['Hillshade', 'hsv', 'overlay', 'soft']): + ax.set_ylabel(mode, size=18) + +# Group labels... 
+axes[0, 1].annotate('Vertical Exaggeration', (0.5, 1), xytext=(0, 30),
+                    textcoords='offset points', xycoords='axes fraction',
+                    ha='center', va='bottom', size=20)
+axes[2, 0].annotate('Blend Mode', (0, 0.5), xytext=(-30, 0),
+                    textcoords='offset points', xycoords='axes fraction',
+                    ha='right', va='center', size=20, rotation=90)
+fig.subplots_adjust(bottom=0.05, right=0.95)
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: topographic_hillshading.py](https://matplotlib.org/_downloads/topographic_hillshading.py)
+- [Download Jupyter notebook: topographic_hillshading.ipynb](https://matplotlib.org/_downloads/topographic_hillshading.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/statistics/barchart_demo.md b/Python/matplotlab/gallery/statistics/barchart_demo.md
new file mode 100644
index 00000000..c7945b33
--- /dev/null
+++ b/Python/matplotlab/gallery/statistics/barchart_demo.md
@@ -0,0 +1,213 @@
+# Bar chart demo
+
+Bar charts of many shapes and sizes with Matplotlib.
+
+Bar charts are useful for visualizing counts, or summary statistics with error bars. These examples show a few ways to do this with Matplotlib.
+
+```python
+# Credit: Josh Hemann
+
+import numpy as np
+import matplotlib.pyplot as plt
+from matplotlib.ticker import MaxNLocator
+from collections import namedtuple
+
+
+n_groups = 5
+
+means_men = (20, 35, 30, 35, 27)
+std_men = (2, 3, 4, 1, 2)
+
+means_women = (25, 32, 34, 20, 25)
+std_women = (3, 5, 2, 3, 3)
+
+fig, ax = plt.subplots()
+
+index = np.arange(n_groups)
+bar_width = 0.35
+
+opacity = 0.4
+error_config = {'ecolor': '0.3'}
+
+rects1 = ax.bar(index, means_men, bar_width,
+                alpha=opacity, color='b',
+                yerr=std_men, error_kw=error_config,
+                label='Men')
+
+rects2 = ax.bar(index + bar_width, means_women, bar_width,
+                alpha=opacity, color='r',
+                yerr=std_women, error_kw=error_config,
+                label='Women')
+
+ax.set_xlabel('Group')
+ax.set_ylabel('Scores')
+ax.set_title('Scores by group and gender')
+ax.set_xticks(index + bar_width / 2)
+ax.set_xticklabels(('A', 'B', 'C', 'D', 'E'))
+ax.legend()
+
+fig.tight_layout()
+plt.show()
+```
+
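A frequent companion to grouped bar charts like the one above is a numeric label on each bar. The sketch below is not part of the original example — the `label_bars` helper is a name of our own — but it relies only on stable core Matplotlib calls (`Axes.bar` and `Axes.annotate`):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def label_bars(ax, rects):
    """Annotate each bar in *rects* with its height, just above the bar."""
    for rect in rects:
        height = rect.get_height()
        ax.annotate('{}'.format(height),
                    xy=(rect.get_x() + rect.get_width() / 2, height),
                    xytext=(0, 3),  # 3 points of vertical offset
                    textcoords="offset points",
                    ha='center', va='bottom')


index = np.arange(5)
means_men = (20, 35, 30, 35, 27)

fig, ax = plt.subplots()
rects = ax.bar(index, means_men, 0.35, color='b', label='Men')
label_bars(ax, rects)
ax.set_xticks(index)
ax.set_xticklabels(('A', 'B', 'C', 'D', 'E'))
```

The small offset in points (rather than data units) keeps the label clear of the bar edge regardless of the Y-axis scale.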
+![Bar chart demo](https://matplotlib.org/_images/sphx_glr_barchart_demo_001.png)
+
+This example comes from an application in which grade-school gym teachers wanted to be able to show parents how their child performed on a series of fitness tests, and importantly, how they performed relative to the other children. To extract the plotting code for demo purposes, we'll just make up some data for little Johnny Doe...
+
+```python
+Student = namedtuple('Student', ['name', 'grade', 'gender'])
+Score = namedtuple('Score', ['score', 'percentile'])
+
+# GLOBAL CONSTANTS
+testNames = ['Pacer Test', 'Flexed Arm\n Hang', 'Mile Run', 'Agility',
+             'Push Ups']
+testMeta = dict(zip(testNames, ['laps', 'sec', 'min:sec', 'sec', '']))
+
+
+def attach_ordinal(num):
+    """helper function to add ordinal string to integers
+
+    1 -> 1st
+    56 -> 56th
+    """
+    suffixes = {str(i): v
+                for i, v in enumerate(['th', 'st', 'nd', 'rd', 'th',
+                                       'th', 'th', 'th', 'th', 'th'])}
+
+    v = str(num)
+    # special case early teens
+    if v in {'11', '12', '13'}:
+        return v + 'th'
+    return v + suffixes[v[-1]]
+
+
+def format_score(scr, test):
+    """
+    Build up the score labels for the right Y-axis by first
+    appending a carriage return to each string and then tacking on
+    the appropriate meta information (i.e., 'laps' vs 'seconds').
We + want the labels centered on the ticks, so if there is no meta + info (like for pushups) then don't add the carriage return to + the string + """ + md = testMeta[test] + if md: + return '{0}\n{1}'.format(scr, md) + else: + return scr + + +def format_ycursor(y): + y = int(y) + if y < 0 or y >= len(testNames): + return '' + else: + return testNames[y] + + +def plot_student_results(student, scores, cohort_size): + # create the figure + fig, ax1 = plt.subplots(figsize=(9, 7)) + fig.subplots_adjust(left=0.115, right=0.88) + fig.canvas.set_window_title('Eldorado K-8 Fitness Chart') + + pos = np.arange(len(testNames)) + + rects = ax1.barh(pos, [scores[k].percentile for k in testNames], + align='center', + height=0.5, color='m', + tick_label=testNames) + + ax1.set_title(student.name) + + ax1.set_xlim([0, 100]) + ax1.xaxis.set_major_locator(MaxNLocator(11)) + ax1.xaxis.grid(True, linestyle='--', which='major', + color='grey', alpha=.25) + + # Plot a solid vertical gridline to highlight the median position + ax1.axvline(50, color='grey', alpha=0.25) + # set X-axis tick marks at the deciles + cohort_label = ax1.text(.5, -.07, 'Cohort Size: {0}'.format(cohort_size), + horizontalalignment='center', size='small', + transform=ax1.transAxes) + + # Set the right-hand Y-axis ticks and labels + ax2 = ax1.twinx() + + scoreLabels = [format_score(scores[k].score, k) for k in testNames] + + # set the tick locations + ax2.set_yticks(pos) + # make sure that the limits are set equally on both yaxis so the + # ticks line up + ax2.set_ylim(ax1.get_ylim()) + + # set the tick labels + ax2.set_yticklabels(scoreLabels) + + ax2.set_ylabel('Test Scores') + + ax2.set_xlabel(('Percentile Ranking Across ' + '{grade} Grade {gender}s').format( + grade=attach_ordinal(student.grade), + gender=student.gender.title())) + + rect_labels = [] + # Lastly, write in the ranking inside each bar to aid in interpretation + for rect in rects: + # Rectangle widths are already integer-valued but are floating + # 
type, so it helps to remove the trailing decimal point and 0 by
+        # converting width to int type
+        width = int(rect.get_width())
+
+        rankStr = attach_ordinal(width)
+        # The bars aren't wide enough to print the ranking inside
+        if width < 5:
+            # Shift the text to the right side of the right edge
+            xloc = width + 1
+            # Black against white background
+            clr = 'black'
+            align = 'left'
+        else:
+            # Shift the text to the left side of the right edge
+            xloc = 0.98*width
+            # White on magenta
+            clr = 'white'
+            align = 'right'
+
+        # Center the text vertically in the bar
+        yloc = rect.get_y() + rect.get_height()/2.0
+        label = ax1.text(xloc, yloc, rankStr, horizontalalignment=align,
+                         verticalalignment='center', color=clr, weight='bold',
+                         clip_on=True)
+        rect_labels.append(label)
+
+    # make the interactive mouse over give the bar title
+    ax2.fmt_ydata = format_ycursor
+    # return all of the artists created
+    return {'fig': fig,
+            'ax': ax1,
+            'ax_right': ax2,
+            'bars': rects,
+            'perc_labels': rect_labels,
+            'cohort_label': cohort_label}
+
+student = Student('Johnny Doe', 2, 'boy')
+scores = dict(zip(testNames,
+                  (Score(v, p) for v, p in
+                   zip(['7', '48', '12:52', '17', '14'],
+                       np.round(np.random.uniform(0, 1,
+                                                  len(testNames))*100, 0)))))
+cohort_size = 62  # The number of other 2nd grade boys
+
+arts = plot_student_results(student, scores, cohort_size)
+plt.show()
+```
+
+![Bar chart demo 2](https://matplotlib.org/_images/sphx_glr_barchart_demo_002.png)
+
+## Download this example
+
+- [Download Python source code: barchart_demo.py](https://matplotlib.org/_downloads/barchart_demo.py)
+- [Download Jupyter notebook: barchart_demo.ipynb](https://matplotlib.org/_downloads/barchart_demo.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/statistics/boxplot.md b/Python/matplotlab/gallery/statistics/boxplot.md
new file mode 100644
index 00000000..52c10f99
--- /dev/null
+++ b/Python/matplotlab/gallery/statistics/boxplot.md
@@ -0,0 +1,97 @@
+# Artist customization in box plots
+
+This example demonstrates how to use the various kwargs to fully customize box plots. The first figure demonstrates how to remove and add individual components (note that the mean is the only value not shown by default). The second figure demonstrates how the styles of the artists can be customized. It also demonstrates how to set the limits of the whiskers to specific percentiles (lower right axes).
+
+A good general reference on boxplots and their history can be found here: http://vita.had.co.nz/papers/boxplots.pdf
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+# fake data
+np.random.seed(19680801)
+data = np.random.lognormal(size=(37, 4), mean=1.5, sigma=1.75)
+labels = list('ABCD')
+fs = 10  # fontsize
+```
+
+Demonstrate how to toggle the display of different elements:
+
+```python
+fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(6, 6), sharey=True)
+axes[0, 0].boxplot(data, labels=labels)
+axes[0, 0].set_title('Default', fontsize=fs)
+
+axes[0, 1].boxplot(data, labels=labels, showmeans=True)
+axes[0, 1].set_title('showmeans=True', fontsize=fs)
+
+axes[0, 2].boxplot(data, labels=labels, showmeans=True, meanline=True)
+axes[0, 2].set_title('showmeans=True,\nmeanline=True', fontsize=fs)
+
+axes[1, 0].boxplot(data, labels=labels, showbox=False, showcaps=False)
+tufte_title = 'Tufte Style \n(showbox=False,\nshowcaps=False)'
+axes[1, 0].set_title(tufte_title, fontsize=fs)
+
+axes[1, 1].boxplot(data, labels=labels, notch=True, bootstrap=10000)
+axes[1, 1].set_title('notch=True,\nbootstrap=10000', fontsize=fs)
+
+axes[1, 2].boxplot(data, labels=labels, showfliers=False)
+axes[1, 2].set_title('showfliers=False', fontsize=fs)
+
+for ax in axes.flatten():
+    ax.set_yscale('log')
+    ax.set_yticklabels([])
+
+fig.subplots_adjust(hspace=0.4)
+plt.show()
+```
+
+![Box plot example](https://matplotlib.org/_images/sphx_glr_boxplot_001.png)
+
+Demonstrate how to customize the display of different elements:
+
+```python
+boxprops = dict(linestyle='--', linewidth=3, color='darkgoldenrod')
+flierprops = dict(marker='o', markerfacecolor='green', markersize=12,
+                  linestyle='none')
+medianprops = dict(linestyle='-.', linewidth=2.5, color='firebrick')
+meanpointprops = dict(marker='D', markeredgecolor='black',
+                      markerfacecolor='firebrick')
+meanlineprops = dict(linestyle='--', linewidth=2.5, color='purple')
+
+fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(6, 6), sharey=True)
+axes[0, 
0].boxplot(data, boxprops=boxprops)
+axes[0, 0].set_title('Custom boxprops', fontsize=fs)
+
+axes[0, 1].boxplot(data, flierprops=flierprops, medianprops=medianprops)
+axes[0, 1].set_title('Custom medianprops\nand flierprops', fontsize=fs)
+
+axes[0, 2].boxplot(data, whis='range')
+axes[0, 2].set_title('whis="range"', fontsize=fs)
+
+axes[1, 0].boxplot(data, meanprops=meanpointprops, meanline=False,
+                   showmeans=True)
+axes[1, 0].set_title('Custom mean\nas point', fontsize=fs)
+
+axes[1, 1].boxplot(data, meanprops=meanlineprops, meanline=True,
+                   showmeans=True)
+axes[1, 1].set_title('Custom mean\nas line', fontsize=fs)
+
+axes[1, 2].boxplot(data, whis=[15, 85])
+axes[1, 2].set_title('whis=[15, 85]\n#percentiles', fontsize=fs)
+
+for ax in axes.flatten():
+    ax.set_yscale('log')
+    ax.set_yticklabels([])
+
+fig.suptitle("I never said they'd be pretty")
+fig.subplots_adjust(hspace=0.4)
+plt.show()
+```
+
+![Box plot example 2](https://matplotlib.org/_images/sphx_glr_boxplot_002.png)
+
+## Download this example
+
+- [Download Python source code: boxplot.py](https://matplotlib.org/_downloads/boxplot.py)
+- [Download Jupyter notebook: boxplot.ipynb](https://matplotlib.org/_downloads/boxplot.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/statistics/boxplot_color.md b/Python/matplotlab/gallery/statistics/boxplot_color.md
new file mode 100644
index 00000000..f11c0d28
--- /dev/null
+++ b/Python/matplotlab/gallery/statistics/boxplot_color.md
@@ -0,0 +1,53 @@
+# Box plots with custom fill colors
+
+This plot illustrates how to create two types of box plots (rectangular and notched), and how to fill them with custom colors by accessing the properties of the artists of the box plots. Additionally, the *labels* parameter is used to provide x-tick labels for each sample.
+
+A good general reference on boxplots and their history can be found here: http://vita.had.co.nz/papers/boxplots.pdf
+
+![Box plots with custom fill colors example](https://matplotlib.org/_images/sphx_glr_boxplot_color_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+
+# Random test data
+np.random.seed(19680801)
+all_data = [np.random.normal(0, std, size=100) for std in range(1, 4)]
+labels = ['x1', 'x2', 'x3']
+
+fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(9, 4))
+
+# rectangular box plot
+bplot1 = axes[0].boxplot(all_data,
+                         vert=True,  # vertical box alignment
+                         patch_artist=True,  # fill with color
+                         labels=labels)  # will be used to label x-ticks
+axes[0].set_title('Rectangular box plot')
+
+# notch shape box plot
+bplot2 = axes[1].boxplot(all_data,
+                         notch=True,  # notch shape
+                         vert=True,  # vertical box alignment
+                         patch_artist=True,  # fill with color
+                         labels=labels)  # will be used to label x-ticks
+axes[1].set_title('Notched box plot')
+
+# fill with colors
+colors = ['pink', 'lightblue', 'lightgreen']
+for bplot in (bplot1, bplot2):
+    for patch, color in zip(bplot['boxes'], colors):
+        patch.set_facecolor(color)
+
+# adding horizontal grid lines
+for ax in axes:
+    ax.yaxis.grid(True)
+    ax.set_xlabel('Three separate samples')
+    ax.set_ylabel('Observed values')
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: boxplot_color.py](https://matplotlib.org/_downloads/boxplot_color.py)
- [Download Jupyter notebook: boxplot_color.ipynb](https://matplotlib.org/_downloads/boxplot_color.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/statistics/boxplot_demo.md b/Python/matplotlab/gallery/statistics/boxplot_demo.md
new file mode 100644
index 00000000..43b0dbc9
--- /dev/null
+++ b/Python/matplotlab/gallery/statistics/boxplot_demo.md
@@ -0,0 +1,238 @@
+# Box plots
+
+Visualizing box plots with Matplotlib.
+
+The following examples show off how to visualize box plots with Matplotlib. There are many options to control their appearance, and the statistics that they use to summarize the data.
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+from matplotlib.patches import Polygon
+
+
+# Fixing random state for reproducibility
+np.random.seed(19680801)
+
+# fake up some data
+spread = np.random.rand(50) * 100
+center = np.ones(25) * 50
+flier_high = np.random.rand(10) * 100 + 100
+flier_low = np.random.rand(10) * -100
+data = np.concatenate((spread, center, flier_high, flier_low))
+
+fig, axs = plt.subplots(2, 3)
+
+# basic plot
+axs[0, 0].boxplot(data)
+axs[0, 0].set_title('basic plot')
+
+# notched plot
+axs[0, 1].boxplot(data, 1)
+axs[0, 1].set_title('notched plot')
+
+# change
outlier point symbols
+axs[0, 2].boxplot(data, 0, 'gD')
+axs[0, 2].set_title('change outlier\npoint symbols')
+
+# don't show outlier points
+axs[1, 0].boxplot(data, 0, '')
+axs[1, 0].set_title("don't show\noutlier points")
+
+# horizontal boxes
+axs[1, 1].boxplot(data, 0, 'rs', 0)
+axs[1, 1].set_title('horizontal boxes')
+
+# change whisker length
+axs[1, 2].boxplot(data, 0, 'rs', 0, 0.75)
+axs[1, 2].set_title('change whisker length')
+
+fig.subplots_adjust(left=0.08, right=0.98, bottom=0.05, top=0.9,
+                    hspace=0.4, wspace=0.3)
+
+# fake up some more data
+spread = np.random.rand(50) * 100
+center = np.ones(25) * 40
+flier_high = np.random.rand(10) * 100 + 100
+flier_low = np.random.rand(10) * -100
+d2 = np.concatenate((spread, center, flier_high, flier_low))
+data.shape = (-1, 1)
+d2.shape = (-1, 1)
+# Making a 2-D array only works if all the columns are the
+# same length.  If they are not, then use a list instead.
+# This is actually more efficient because boxplot converts
+# a 2-D array into a list of vectors internally anyway.
+data = [data, d2, d2[::2, 0]]
+
+# Multiple box plots on one Axes
+fig, ax = plt.subplots()
+ax.boxplot(data)
+
+plt.show()
+```
+
+Below we'll generate data from five different probability distributions, each with different characteristics. We want to see how an IID bootstrap resample of the data preserves the distributional properties of the original sample, and a box plot is one visual tool to make this assessment.
+
+```python
+numDists = 5
+randomDists = ['Normal(1,1)', ' Lognormal(1,1)', 'Exp(1)', 'Gumbel(6,4)',
+               'Triangular(2,9,11)']
+N = 500
+
+norm = np.random.normal(1, 1, N)
+logn = np.random.lognormal(1, 1, N)
+expo = np.random.exponential(1, N)
+gumb = np.random.gumbel(6, 4, N)
+tria = np.random.triangular(2, 9, 11, N)
+
+# Generate some random indices that we'll use to resample the original data
+# arrays.
For code brevity, just use the same random indices for each array
+bootstrapIndices = np.random.randint(0, N, N)  # random_integers is deprecated
+normBoot = norm[bootstrapIndices]
+expoBoot = expo[bootstrapIndices]
+gumbBoot = gumb[bootstrapIndices]
+lognBoot = logn[bootstrapIndices]
+triaBoot = tria[bootstrapIndices]
+
+data = [norm, normBoot, logn, lognBoot, expo, expoBoot, gumb, gumbBoot,
+        tria, triaBoot]
+
+fig, ax1 = plt.subplots(figsize=(10, 6))
+fig.canvas.set_window_title('A Boxplot Example')
+fig.subplots_adjust(left=0.075, right=0.95, top=0.9, bottom=0.25)
+
+bp = ax1.boxplot(data, notch=0, sym='+', vert=1, whis=1.5)
+plt.setp(bp['boxes'], color='black')
+plt.setp(bp['whiskers'], color='black')
+plt.setp(bp['fliers'], color='red', marker='+')
+
+# Add a horizontal grid to the plot, but make it very light in color
+# so we can use it for reading data values but not be distracting
+ax1.yaxis.grid(True, linestyle='-', which='major', color='lightgrey',
+               alpha=0.5)
+
+# Hide these grid behind plot objects
+ax1.set_axisbelow(True)
+ax1.set_title('Comparison of IID Bootstrap Resampling Across Five Distributions')
+ax1.set_xlabel('Distribution')
+ax1.set_ylabel('Value')
+
+# Now fill the boxes with desired colors
+boxColors = ['darkkhaki', 'royalblue']
+numBoxes = numDists*2
+medians = list(range(numBoxes))
+for i in range(numBoxes):
+    box = bp['boxes'][i]
+    boxX = []
+    boxY = []
+    for j in range(5):
+        boxX.append(box.get_xdata()[j])
+        boxY.append(box.get_ydata()[j])
+    boxCoords = np.column_stack([boxX, boxY])
+    # Alternate between Dark Khaki and Royal Blue
+    k = i % 2
+    boxPolygon = Polygon(boxCoords, facecolor=boxColors[k])
+    ax1.add_patch(boxPolygon)
+    # Now draw the median lines back over what we just filled in
+    med = bp['medians'][i]
+    medianX = []
+    medianY = []
+    for j in range(2):
+        medianX.append(med.get_xdata()[j])
+        medianY.append(med.get_ydata()[j])
+    ax1.plot(medianX, medianY, 'k')
+    medians[i] = medianY[0]
+    # Finally, overplot the sample averages, with horizontal
alignment
+    # in the center of each box
+    ax1.plot([np.average(med.get_xdata())], [np.average(data[i])],
+             color='w', marker='*', markeredgecolor='k')
+
+# Set the axes ranges and axes labels
+ax1.set_xlim(0.5, numBoxes + 0.5)
+top = 40
+bottom = -5
+ax1.set_ylim(bottom, top)
+ax1.set_xticklabels(np.repeat(randomDists, 2),
+                    rotation=45, fontsize=8)
+
+# Due to the Y-axis scale being different across samples, it can be
+# hard to compare differences in medians across the samples.  Add upper
+# X-axis tick labels with the sample medians to aid in comparison
+# (just use two decimal places of precision)
+pos = np.arange(numBoxes) + 1
+upperLabels = [str(np.round(s, 2)) for s in medians]
+weights = ['bold', 'semibold']
+for tick, label in zip(range(numBoxes), ax1.get_xticklabels()):
+    k = tick % 2
+    ax1.text(pos[tick], top - (top*0.05), upperLabels[tick],
+             horizontalalignment='center', size='x-small', weight=weights[k],
+             color=boxColors[k])
+
+# Finally, add a basic legend
+fig.text(0.80, 0.08, str(N) + ' Random Numbers',
+         backgroundcolor=boxColors[0], color='black', weight='roman',
+         size='x-small')
+fig.text(0.80, 0.045, 'IID Bootstrap Resample',
+         backgroundcolor=boxColors[1],
+         color='white', weight='roman', size='x-small')
+fig.text(0.80, 0.015, '*', color='white', backgroundcolor='silver',
+         weight='roman', size='medium')
+fig.text(0.815, 0.013, ' Average Value', color='black', weight='roman',
+         size='x-small')
+
+plt.show()
+```
+
+![Box plot](https://matplotlib.org/_images/sphx_glr_boxplot_demo_003.png)
+
+Here we write a custom function to bootstrap confidence intervals. We can then use the box plot along with this function to show these intervals.
+
+```python
+def fakeBootStrapper(n):
+    '''
+    This is just a placeholder for the user's method of
+    bootstrapping the median and its confidence intervals.
+ + Returns an arbitrary median and confidence intervals + packed into a tuple + ''' + if n == 1: + med = 0.1 + CI = (-0.25, 0.25) + else: + med = 0.2 + CI = (-0.35, 0.50) + + return med, CI + +inc = 0.1 +e1 = np.random.normal(0, 1, size=(500,)) +e2 = np.random.normal(0, 1, size=(500,)) +e3 = np.random.normal(0, 1 + inc, size=(500,)) +e4 = np.random.normal(0, 1 + 2*inc, size=(500,)) + +treatments = [e1, e2, e3, e4] +med1, CI1 = fakeBootStrapper(1) +med2, CI2 = fakeBootStrapper(2) +medians = [None, None, med1, med2] +conf_intervals = [None, None, CI1, CI2] + +fig, ax = plt.subplots() +pos = np.array(range(len(treatments))) + 1 +bp = ax.boxplot(treatments, sym='k+', positions=pos, + notch=1, bootstrap=5000, + usermedians=medians, + conf_intervals=conf_intervals) + +ax.set_xlabel('treatment') +ax.set_ylabel('response') +plt.setp(bp['whiskers'], color='k', linestyle='-') +plt.setp(bp['fliers'], markersize=3.0) +plt.show() +``` + +![箱形图2](https://matplotlib.org/_images/sphx_glr_boxplot_demo_004.png) + +## 下载这个示例 + +- [下载python源码: boxplot_demo.py](https://matplotlib.org/_downloads/boxplot_demo.py) +- [下载Jupyter notebook: boxplot_demo.ipynb](https://matplotlib.org/_downloads/boxplot_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/statistics/boxplot_vs_violin.md b/Python/matplotlab/gallery/statistics/boxplot_vs_violin.md new file mode 100644 index 00000000..12809a6c --- /dev/null +++ b/Python/matplotlab/gallery/statistics/boxplot_vs_violin.md @@ -0,0 +1,54 @@ +# 箱形图与小提琴图对比 + +请注意,尽管小提琴图与Tukey(1977)的箱形图密切相关,但它们还添加了有用的信息,例如样本数据的分布(密度轨迹)。 + +默认情况下,箱形图显示1.5 *四分位数范围之外的数据点作为晶须上方或下方的异常值,而小提琴图则显示数据的整个范围。 + +关于箱形图及其历史的一般参考可以在这里找到:http://vita.had.co.nz/papers/boxplots.pdf + +小提琴图需要 matplotlib >= 1.4。 + +有关小提琴绘制的更多信息,scikit-learn文档有一个很棒的部分:http://scikit-learn.org/stable/modules/density.html + +![箱形图与小提琴图对比示例](https://matplotlib.org/_images/sphx_glr_boxplot_vs_violin_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +fig, axes 
= plt.subplots(nrows=1, ncols=2, figsize=(9, 4)) + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +# generate some random test data +all_data = [np.random.normal(0, std, 100) for std in range(6, 10)] + +# plot violin plot +axes[0].violinplot(all_data, + showmeans=False, + showmedians=True) +axes[0].set_title('Violin plot') + +# plot box plot +axes[1].boxplot(all_data) +axes[1].set_title('Box plot') + +# adding horizontal grid lines +for ax in axes: + ax.yaxis.grid(True) + ax.set_xticks([y + 1 for y in range(len(all_data))]) + ax.set_xlabel('Four separate samples') + ax.set_ylabel('Observed values') + +# add x-tick labels +plt.setp(axes, xticks=[y + 1 for y in range(len(all_data))], + xticklabels=['x1', 'x2', 'x3', 'x4']) +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: boxplot_vs_violin.py](https://matplotlib.org/_downloads/boxplot_vs_violin.py) +- [下载Jupyter notebook: boxplot_vs_violin.ipynb](https://matplotlib.org/_downloads/boxplot_vs_violin.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/statistics/bxp.md b/Python/matplotlab/gallery/statistics/bxp.md new file mode 100644 index 00000000..c538cfca --- /dev/null +++ b/Python/matplotlab/gallery/statistics/bxp.md @@ -0,0 +1,112 @@ +# 箱形图抽屉功能 + +此示例演示如何将预先计算的箱形图统计信息传递到框图抽屉。第一个图演示了如何删除和添加单个组件(请注意,平均值是默认情况下未显示的唯一值)。第二个图展示了如何定制艺术风格。 + +关于箱形图及其历史的一个很好的一般参考可以在这里找到:http://vita.had.co.nz/papers/boxplots.pdf + +```python +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.cbook as cbook + +# fake data +np.random.seed(19680801) +data = np.random.lognormal(size=(37, 4), mean=1.5, sigma=1.75) +labels = list('ABCD') + +# compute the boxplot stats +stats = cbook.boxplot_stats(data, labels=labels, bootstrap=10000) +``` + +在我们计算了统计数据之后,我们可以通过并改变任何事情。 为了证明这一点,我将每组的中位数设置为所有数据的中位数,并将均值加倍 + +```python +for n in range(len(stats)): + stats[n]['med'] = np.median(data) + stats[n]['mean'] *= 2 + +print(list(stats[0])) + +fs = 10 # fontsize +``` + +输出: + +```python 
+['label', 'mean', 'iqr', 'cilo', 'cihi', 'whishi', 'whislo', 'fliers', 'q1', 'med', 'q3'] +``` + +演示如何切换不同元素的显示: + +```python +fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(6, 6), sharey=True) +axes[0, 0].bxp(stats) +axes[0, 0].set_title('Default', fontsize=fs) + +axes[0, 1].bxp(stats, showmeans=True) +axes[0, 1].set_title('showmeans=True', fontsize=fs) + +axes[0, 2].bxp(stats, showmeans=True, meanline=True) +axes[0, 2].set_title('showmeans=True,\nmeanline=True', fontsize=fs) + +axes[1, 0].bxp(stats, showbox=False, showcaps=False) +tufte_title = 'Tufte Style\n(showbox=False,\nshowcaps=False)' +axes[1, 0].set_title(tufte_title, fontsize=fs) + +axes[1, 1].bxp(stats, shownotches=True) +axes[1, 1].set_title('notch=True', fontsize=fs) + +axes[1, 2].bxp(stats, showfliers=False) +axes[1, 2].set_title('showfliers=False', fontsize=fs) + +for ax in axes.flatten(): + ax.set_yscale('log') + ax.set_yticklabels([]) + +fig.subplots_adjust(hspace=0.4) +plt.show() +``` + +![箱形图抽屉功能示例](https://matplotlib.org/_images/sphx_glr_bxp_001.png) + +演示如何自定义显示不同的元素: + +```python +boxprops = dict(linestyle='--', linewidth=3, color='darkgoldenrod') +flierprops = dict(marker='o', markerfacecolor='green', markersize=12, + linestyle='none') +medianprops = dict(linestyle='-.', linewidth=2.5, color='firebrick') +meanpointprops = dict(marker='D', markeredgecolor='black', + markerfacecolor='firebrick') +meanlineprops = dict(linestyle='--', linewidth=2.5, color='purple') + +fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(6, 6), sharey=True) +axes[0, 0].bxp(stats, boxprops=boxprops) +axes[0, 0].set_title('Custom boxprops', fontsize=fs) + +axes[0, 1].bxp(stats, flierprops=flierprops, medianprops=medianprops) +axes[0, 1].set_title('Custom medianprops\nand flierprops', fontsize=fs) + +axes[1, 0].bxp(stats, meanprops=meanpointprops, meanline=False, + showmeans=True) +axes[1, 0].set_title('Custom mean\nas point', fontsize=fs) + +axes[1, 1].bxp(stats, meanprops=meanlineprops, meanline=True, + 
showmeans=True) +axes[1, 1].set_title('Custom mean\nas line', fontsize=fs) + +for ax in axes.flatten(): + ax.set_yscale('log') + ax.set_yticklabels([]) + +fig.suptitle("I never said they'd be pretty") +fig.subplots_adjust(hspace=0.4) +plt.show() +``` + +![箱形图抽屉功能](https://matplotlib.org/_images/sphx_glr_bxp_002.png) + +## 下载这个示例 + +- [下载python源码: bxp.py](https://matplotlib.org/_downloads/bxp.py) +- [下载Jupyter notebook: bxp.ipynb](https://matplotlib.org/_downloads/bxp.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/statistics/customized_violin.md b/Python/matplotlab/gallery/statistics/customized_violin.md new file mode 100644 index 00000000..314f695b --- /dev/null +++ b/Python/matplotlab/gallery/statistics/customized_violin.md @@ -0,0 +1,75 @@ +# 自定义小提琴图 + +此示例演示如何完全自定义小提琴图。 第一个图通过仅提供数据来显示默认样式。第二个图首先限制了matplotlib用额外的kwargs绘制的内容。然后在顶部绘制箱形图的简化表示。 最后,修改了小提琴图的风格。 + +有关小提琴图的更多信息,scikit-learn文档有一个很棒的部分:http://scikit-learn.org/stable/modules/density.html + +![自定义小提琴图示例](https://matplotlib.org/_images/sphx_glr_customized_violin_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + + +def adjacent_values(vals, q1, q3): + upper_adjacent_value = q3 + (q3 - q1) * 1.5 + upper_adjacent_value = np.clip(upper_adjacent_value, q3, vals[-1]) + + lower_adjacent_value = q1 - (q3 - q1) * 1.5 + lower_adjacent_value = np.clip(lower_adjacent_value, vals[0], q1) + return lower_adjacent_value, upper_adjacent_value + + +def set_axis_style(ax, labels): + ax.get_xaxis().set_tick_params(direction='out') + ax.xaxis.set_ticks_position('bottom') + ax.set_xticks(np.arange(1, len(labels) + 1)) + ax.set_xticklabels(labels) + ax.set_xlim(0.25, len(labels) + 0.75) + ax.set_xlabel('Sample name') + + +# create test data +np.random.seed(19680801) +data = [sorted(np.random.normal(0, std, 100)) for std in range(1, 5)] + +fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(9, 4), sharey=True) + +ax1.set_title('Default violin plot') 
+ax1.set_ylabel('Observed values') +ax1.violinplot(data) + +ax2.set_title('Customized violin plot') +parts = ax2.violinplot( + data, showmeans=False, showmedians=False, + showextrema=False) + +for pc in parts['bodies']: + pc.set_facecolor('#D43F3A') + pc.set_edgecolor('black') + pc.set_alpha(1) + +quartile1, medians, quartile3 = np.percentile(data, [25, 50, 75], axis=1) +whiskers = np.array([ + adjacent_values(sorted_array, q1, q3) + for sorted_array, q1, q3 in zip(data, quartile1, quartile3)]) +whiskersMin, whiskersMax = whiskers[:, 0], whiskers[:, 1] + +inds = np.arange(1, len(medians) + 1) +ax2.scatter(inds, medians, marker='o', color='white', s=30, zorder=3) +ax2.vlines(inds, quartile1, quartile3, color='k', linestyle='-', lw=5) +ax2.vlines(inds, whiskersMin, whiskersMax, color='k', linestyle='-', lw=1) + +# set style for the axes +labels = ['A', 'B', 'C', 'D'] +for ax in [ax1, ax2]: + set_axis_style(ax, labels) + +plt.subplots_adjust(bottom=0.15, wspace=0.05) +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: customized_violin.py](https://matplotlib.org/_downloads/customized_violin.py) +- [下载Jupyter notebook: customized_violin.ipynb](https://matplotlib.org/_downloads/customized_violin.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/statistics/errorbar.md b/Python/matplotlab/gallery/statistics/errorbar.md new file mode 100644 index 00000000..29716ee7 --- /dev/null +++ b/Python/matplotlab/gallery/statistics/errorbar.md @@ -0,0 +1,23 @@ +# 误差条形图功能 + +这展示了误差条形图功能的最基本用法。在这种情况下,为x方向和y方向的误差提供常数值。 + +![误差条形图示例](https://matplotlib.org/_images/sphx_glr_errorbar_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + +# example data +x = np.arange(0.1, 4, 0.5) +y = np.exp(-x) + +fig, ax = plt.subplots() +ax.errorbar(x, y, xerr=0.2, yerr=0.4) +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: errorbar.py](https://matplotlib.org/_downloads/errorbar.py) +- [下载Jupyter notebook: 
errorbar.ipynb](https://matplotlib.org/_downloads/errorbar.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/statistics/errorbar_features.md b/Python/matplotlab/gallery/statistics/errorbar_features.md new file mode 100644 index 00000000..22198521 --- /dev/null +++ b/Python/matplotlab/gallery/statistics/errorbar_features.md @@ -0,0 +1,47 @@ +# 误差条形图的不同方法 + +可以将错误指定为常数值(如errorbar_demo.py中所示)。但是,此示例通过指定错误值数组来演示它们的不同之处。 + +如果原始x和y数据的长度为N,则有两个选项: + +1. 数组形状为(N,): + 每个点的误差都不同,但误差值是对称的(即,上下两个值相等)。 +1. 数组形状为(2, N): + 每个点的误差不同,并且下限和上限(按该顺序)不同(非对称情况)。 + +此外,此示例演示如何使用带有误差线的对数刻度。 + +![](https://matplotlib.org/_images/sphx_glr_errorbar_features_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + +# example data +x = np.arange(0.1, 4, 0.5) +y = np.exp(-x) + +# example error bar values that vary with x-position +error = 0.1 + 0.2 * x + +fig, (ax0, ax1) = plt.subplots(nrows=2, sharex=True) +ax0.errorbar(x, y, yerr=error, fmt='-o') +ax0.set_title('variable, symmetric error') + +# error bar values w/ different -/+ errors that +# also vary with the x-position +lower_error = 0.4 * error +upper_error = error +asymmetric_error = [lower_error, upper_error] + +ax1.errorbar(x, y, xerr=asymmetric_error, fmt='o') +ax1.set_title('variable, asymmetric error') +ax1.set_yscale('log') +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: errorbar_features.py](https://matplotlib.org/_downloads/errorbar_features.py) +- [下载Jupyter notebook: errorbar_features.ipynb](https://matplotlib.org/_downloads/errorbar_features.ipynb) + diff --git a/Python/matplotlab/gallery/statistics/errorbar_limits.md b/Python/matplotlab/gallery/statistics/errorbar_limits.md new file mode 100644 index 00000000..99c0307d --- /dev/null +++ b/Python/matplotlab/gallery/statistics/errorbar_limits.md @@ -0,0 +1,73 @@ +# 误差条形图中的上限和下限 + +在matplotlib中,误差条可以有“限制”。对误差线应用限制实质上使误差单向。因此,可以分别通过``uplims``,``lolims``,``xuplims``和``xlolims``参数在y方向和x方向上应用上限和下限。 这些参数可以是标量或布尔数组。 + 
For example, if ``xlolims`` is ``True``, the x-error bars will only extend from the data towards increasing values. If ``uplims`` is an array filled with ``False`` except for the 4th and 7th values, all of the y-error bars will be bidirectional except the 4th and 7th, which will extend from the data towards decreasing y-values.

![Errorbar limits example](https://matplotlib.org/_images/sphx_glr_errorbar_limits_001.png)

```python
import numpy as np
import matplotlib.pyplot as plt

# example data
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
y = np.exp(-x)
xerr = 0.1
yerr = 0.2

# lower & upper limits of the error
lolims = np.array([0, 0, 1, 0, 1, 0, 0, 0, 1, 0], dtype=bool)
uplims = np.array([0, 1, 0, 0, 0, 1, 0, 0, 0, 1], dtype=bool)
ls = 'dotted'

fig, ax = plt.subplots(figsize=(7, 4))

# standard error bars
ax.errorbar(x, y, xerr=xerr, yerr=yerr, linestyle=ls)

# including upper limits
ax.errorbar(x, y + 0.5, xerr=xerr, yerr=yerr, uplims=uplims,
            linestyle=ls)

# including lower limits
ax.errorbar(x, y + 1.0, xerr=xerr, yerr=yerr, lolims=lolims,
            linestyle=ls)

# including upper and lower limits
ax.errorbar(x, y + 1.5, xerr=xerr, yerr=yerr,
            lolims=lolims, uplims=uplims,
            marker='o', markersize=8,
            linestyle=ls)

# Plot a series with lower and upper limits in both x & y
# constant x-error with varying y-error
xerr = 0.2
yerr = np.zeros_like(x) + 0.2
yerr[[3, 6]] = 0.3

# mock up some limits by modifying previous data
xlolims = lolims
xuplims = uplims
lolims = np.zeros(x.shape)
uplims = np.zeros(x.shape)
lolims[[6]] = True  # only limited at this index
uplims[[3]] = True  # only limited at this index

# do the plotting
ax.errorbar(x, y + 2.1, xerr=xerr, yerr=yerr,
            xlolims=xlolims, xuplims=xuplims,
            uplims=uplims, lolims=lolims,
            marker='o', markersize=8,
            linestyle='none')

# tidy up the figure
ax.set_xlim((0, 5.5))
ax.set_title('Errorbar upper and lower limits')
plt.show()
```

## 下载这个示例

- [下载python源码: errorbar_limits.py](https://matplotlib.org/_downloads/errorbar_limits.py)
- [下载Jupyter notebook: errorbar_limits.ipynb](https://matplotlib.org/_downloads/errorbar_limits.ipynb) \ No
newline at end of file diff --git a/Python/matplotlab/gallery/statistics/errorbars_and_boxes.md b/Python/matplotlab/gallery/statistics/errorbars_and_boxes.md new file mode 100644 index 00000000..77c79b81 --- /dev/null +++ b/Python/matplotlab/gallery/statistics/errorbars_and_boxes.md @@ -0,0 +1,68 @@ +# 使用PatchCollection在误差图中创建箱型图 + +在这个例子中,我们通过在x方向和y方向上添加由条形极限定义的矩形块来拼写一个非常标准的误差条形图。为此,我们必须编写自己的自定义函数 ``make_error_boxes``。仔细检查此函数将揭示matplotlib编写函数的首选模式: + +1. an Axes object is passed directly to the function +1. the function operates on the Axes methods directly, not through the ``pyplot`` interface +1. plotting kwargs that could be abbreviated are spelled out for better code readability in the future (for example we use ``facecolor`` instead of fc) +1. the artists returned by the Axes plotting methods are then returned by the function so that, if desired, their styles can be modified later outside of the function (they are not modified in this example). + +![创建箱型图示例](https://matplotlib.org/_images/sphx_glr_errorbars_and_boxes_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.collections import PatchCollection +from matplotlib.patches import Rectangle + +# Number of data points +n = 5 + +# Dummy data +np.random.seed(19680801) +x = np.arange(0, n, 1) +y = np.random.rand(n) * 5. 
+ +# Dummy errors (above and below) +xerr = np.random.rand(2, n) + 0.1 +yerr = np.random.rand(2, n) + 0.2 + + +def make_error_boxes(ax, xdata, ydata, xerror, yerror, facecolor='r', + edgecolor='None', alpha=0.5): + + # Create list for all the error patches + errorboxes = [] + + # Loop over data points; create box from errors at each point + for x, y, xe, ye in zip(xdata, ydata, xerror.T, yerror.T): + rect = Rectangle((x - xe[0], y - ye[0]), xe.sum(), ye.sum()) + errorboxes.append(rect) + + # Create patch collection with specified colour/alpha + pc = PatchCollection(errorboxes, facecolor=facecolor, alpha=alpha, + edgecolor=edgecolor) + + # Add collection to axes + ax.add_collection(pc) + + # Plot errorbars + artists = ax.errorbar(xdata, ydata, xerr=xerror, yerr=yerror, + fmt='None', ecolor='k') + + return artists + + +# Create figure and axes +fig, ax = plt.subplots(1) + +# Call function to create error boxes +_ = make_error_boxes(ax, x, y, xerr, yerr) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: errorbars_and_boxes.py](https://matplotlib.org/_downloads/errorbars_and_boxes.py) +- [下载Jupyter notebook: errorbars_and_boxes.ipynb](https://matplotlib.org/_downloads/errorbars_and_boxes.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/statistics/hexbin_demo.md b/Python/matplotlab/gallery/statistics/hexbin_demo.md new file mode 100644 index 00000000..e138d7b0 --- /dev/null +++ b/Python/matplotlab/gallery/statistics/hexbin_demo.md @@ -0,0 +1,46 @@ +# Hexbin 演示 + +使用Matplotlib绘制hexbins。 + +Hexbin是一种轴方法或pyplot函数,它基本上是具有六边形单元的二维直方图的pcolor。 它可以比散点图更具信息性。 在下面的第一个图中,尝试用'scatter'代替'hexbin'。 + +![Hexbin演示](https://matplotlib.org/_images/sphx_glr_hexbin_demo_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + +# Fixing random state for reproducibility +np.random.seed(19680801) + +n = 100000 +x = np.random.standard_normal(n) +y = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n) +xmin = x.min() +xmax = x.max() +ymin = y.min() 
+ymax = y.max() + +fig, axs = plt.subplots(ncols=2, sharey=True, figsize=(7, 4)) +fig.subplots_adjust(hspace=0.5, left=0.07, right=0.93) +ax = axs[0] +hb = ax.hexbin(x, y, gridsize=50, cmap='inferno') +ax.axis([xmin, xmax, ymin, ymax]) +ax.set_title("Hexagon binning") +cb = fig.colorbar(hb, ax=ax) +cb.set_label('counts') + +ax = axs[1] +hb = ax.hexbin(x, y, gridsize=50, bins='log', cmap='inferno') +ax.axis([xmin, xmax, ymin, ymax]) +ax.set_title("With a log color scale") +cb = fig.colorbar(hb, ax=ax) +cb.set_label('log10(N)') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: hexbin_demo.py](https://matplotlib.org/_downloads/hexbin_demo.py) +- [下载Jupyter notebook: hexbin_demo.ipynb](https://matplotlib.org/_downloads/hexbin_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/statistics/hist.md b/Python/matplotlab/gallery/statistics/hist.md new file mode 100644 index 00000000..1ec9e953 --- /dev/null +++ b/Python/matplotlab/gallery/statistics/hist.md @@ -0,0 +1,102 @@ +# 直方图 + +演示如何使用matplotlib绘制直方图。 + +```python +import matplotlib.pyplot as plt +import numpy as np +from matplotlib import colors +from matplotlib.ticker import PercentFormatter + +# Fixing random state for reproducibility +np.random.seed(19680801) +``` + +## 生成数据并绘制简单的直方图 + +要生成一维直方图,我们只需要一个数字矢量。对于二维直方图,我们需要第二个矢量。我们将在下面生成两者,并显示每个向量的直方图。 + +```python +N_points = 100000 +n_bins = 20 + +# Generate a normal distribution, center at x=0 and y=5 +x = np.random.randn(N_points) +y = .4 * x + np.random.randn(100000) + 5 + +fig, axs = plt.subplots(1, 2, sharey=True, tight_layout=True) + +# We can set the number of bins with the `bins` kwarg +axs[0].hist(x, bins=n_bins) +axs[1].hist(y, bins=n_bins) +``` + +![直方图示例](https://matplotlib.org/_images/sphx_glr_hist_001.png) + +## 更新直方图颜色 + +直方图方法(除其他外)返回一个修补程序对象。这使我们可以访问所绘制对象的特性。使用这个,我们可以根据自己的喜好编辑直方图。让我们根据每个条的y值更改其颜色。 + +```python +fig, axs = plt.subplots(1, 2, tight_layout=True) + +# N is the count in each bin, bins is the lower-limit of the 
bin +N, bins, patches = axs[0].hist(x, bins=n_bins) + +# We'll color code by height, but you could use any scalar +fracs = N / N.max() + +# we need to normalize the data to 0..1 for the full range of the colormap +norm = colors.Normalize(fracs.min(), fracs.max()) + +# Now, we'll loop through our objects and set the color of each accordingly +for thisfrac, thispatch in zip(fracs, patches): + color = plt.cm.viridis(norm(thisfrac)) + thispatch.set_facecolor(color) + +# We can also normalize our inputs by the total number of counts +axs[1].hist(x, bins=n_bins, density=True) + +# Now we format the y-axis to display percentage +axs[1].yaxis.set_major_formatter(PercentFormatter(xmax=1)) +``` + +![直方图示例2](https://matplotlib.org/_images/sphx_glr_hist_002.png) + +## 绘制二维直方图 + +要绘制二维直方图,只需两个长度相同的向量,对应于直方图的每个轴。 + +```python +fig, ax = plt.subplots(tight_layout=True) +hist = ax.hist2d(x, y) +``` + +![直方图示例3](https://matplotlib.org/_images/sphx_glr_hist_003.png) + +## 自定义直方图 + +自定义2D直方图类似于1D情况,您可以控制可视组件,如存储箱大小或颜色规格化。 + +```python +fig, axs = plt.subplots(3, 1, figsize=(5, 15), sharex=True, sharey=True, + tight_layout=True) + +# We can increase the number of bins on each axis +axs[0].hist2d(x, y, bins=40) + +# As well as define normalization of the colors +axs[1].hist2d(x, y, bins=40, norm=colors.LogNorm()) + +# We can also define custom numbers of bins for each axis +axs[2].hist2d(x, y, bins=(80, 10), norm=colors.LogNorm()) + +plt.show() +``` + +![直方图示例4](https://matplotlib.org/_images/sphx_glr_hist_004.png) + +## 下载这个示例 + +- [下载python源码: hist.py](https://matplotlib.org/_downloads/hist.py) +- [下载Jupyter notebook: hist.ipynb](https://matplotlib.org/_downloads/hist.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/statistics/histogram_cumulative.md b/Python/matplotlab/gallery/statistics/histogram_cumulative.md new file mode 100644 index 00000000..1abdfeca --- /dev/null +++ b/Python/matplotlab/gallery/statistics/histogram_cumulative.md @@ -0,0 +1,55 @@ +# 
Plotting a cumulative distribution with a histogram

This shows how to plot a cumulative, normalized histogram as a step function, in order to visualize the empirical cumulative distribution function (CDF) of a sample. The theoretical CDF is also shown.

Several other options of the ``hist`` function are demonstrated. Namely, we use the ``density`` parameter to normalize the histogram, together with a couple of different options for the ``cumulative`` parameter. ``density`` takes a boolean value: if ``True``, the bin heights are scaled so that the total area of the histogram is 1, making it an approximation of the probability density function. The ``cumulative`` kwarg is slightly more nuanced: like ``density``, you can pass it ``True`` or ``False``, but you can also pass ``-1`` to reverse the distribution.

Since we are showing a normalized and cumulative histogram, these curves are effectively the cumulative distribution functions (CDFs) of the samples. In engineering, empirical CDFs are sometimes called "non-exceedance" curves: for a given x-value, the y-value gives the probability that an observation from the sample does not exceed that x-value. For example, the value 225 on the x-axis corresponds to about 0.85 on the y-axis, so there is an 85% chance that an observation in the sample does not exceed 225. Conversely, setting ``cumulative`` to ``-1``, as is done in the last series of this example, creates an "exceedance" curve.

Selecting different bin counts and sizes can significantly affect the shape of a histogram. The Astropy docs have a great section on how to select these parameters: http://docs.astropy.org/en/stable/visualization/histogram.html

![Cumulative step histograms](https://matplotlib.org/_images/sphx_glr_histogram_cumulative_001.png)

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(19680801)

mu = 200
sigma = 25
n_bins = 50
x = np.random.normal(mu, sigma, size=100)

fig, ax = plt.subplots(figsize=(8, 4))

# plot the cumulative histogram
n, bins, patches = ax.hist(x, n_bins, density=True, histtype='step',
                           cumulative=True, label='Empirical')

# Add a line showing the expected distribution.
y = ((1 / (np.sqrt(2 * np.pi) * sigma)) *
     np.exp(-0.5 * (1 / sigma * (bins - mu))**2))
y = y.cumsum()
y /= y[-1]

ax.plot(bins, y, 'k--', linewidth=1.5, label='Theoretical')

# Overlay a reversed cumulative histogram.
+ax.hist(x, bins=bins, density=True, histtype='step', cumulative=-1, + label='Reversed emp.') + +# tidy up the figure +ax.grid(True) +ax.legend(loc='right') +ax.set_title('Cumulative step histograms') +ax.set_xlabel('Annual rainfall (mm)') +ax.set_ylabel('Likelihood of occurrence') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: histogram_cumulative.py](https://matplotlib.org/_downloads/histogram_cumulative.py) +- [下载Jupyter notebook: histogram_cumulative.ipynb](https://matplotlib.org/_downloads/histogram_cumulative.ipynb) diff --git a/Python/matplotlab/gallery/statistics/histogram_features.md b/Python/matplotlab/gallery/statistics/histogram_features.md new file mode 100644 index 00000000..c7b0a1da --- /dev/null +++ b/Python/matplotlab/gallery/statistics/histogram_features.md @@ -0,0 +1,60 @@ +# 直方图(hist)函数的几个特性演示 + +除基本直方图外,此演示还显示了一些可选功能: + +- 设置数据箱的数量。 +- ``标准化``标志,用于标准化箱高度,使直方图的积分为1.得到的直方图是概率密度函数的近似值。 +- 设置条形的面部颜色。 +- 设置不透明度(alpha值)。 + +选择不同的存储量和大小会显著影响直方图的形状。Astropy文档有很多关于如何选择这些参数的部分。 + +```python +import matplotlib +import numpy as np +import matplotlib.pyplot as plt + +np.random.seed(19680801) + +# example data +mu = 100 # mean of distribution +sigma = 15 # standard deviation of distribution +x = mu + sigma * np.random.randn(437) + +num_bins = 50 + +fig, ax = plt.subplots() + +# the histogram of the data +n, bins, patches = ax.hist(x, num_bins, density=1) + +# add a 'best fit' line +y = ((1 / (np.sqrt(2 * np.pi) * sigma)) * + np.exp(-0.5 * (1 / sigma * (bins - mu))**2)) +ax.plot(bins, y, '--') +ax.set_xlabel('Smarts') +ax.set_ylabel('Probability density') +ax.set_title(r'Histogram of IQ: $\mu=100$, $\sigma=15$') + +# Tweak spacing to prevent clipping of ylabel +fig.tight_layout() +plt.show() +``` + +![直方图特性演示](https://matplotlib.org/_images/sphx_glr_histogram_features_001.png) + +## 参考 + +此示例显示了以下函数和方法的使用: + +```python +matplotlib.axes.Axes.hist +matplotlib.axes.Axes.set_title +matplotlib.axes.Axes.set_xlabel +matplotlib.axes.Axes.set_ylabel +``` + +## 
下载这个示例 + +- [下载python源码: histogram_features.py](https://matplotlib.org/_downloads/histogram_features.py) +- [下载Jupyter notebook: histogram_features.ipynb](https://matplotlib.org/_downloads/histogram_features.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/statistics/histogram_histtypes.md b/Python/matplotlab/gallery/statistics/histogram_histtypes.md new file mode 100644 index 00000000..028118e4 --- /dev/null +++ b/Python/matplotlab/gallery/statistics/histogram_histtypes.md @@ -0,0 +1,37 @@ +# 演示直方图函数的不同histtype设置 + +- 具有颜色填充的步进曲线的直方图。 +- 具有自定义和不相等的箱宽度的直方图。 + +选择不同的存储量和大小会显著影响直方图的形状。Astropy文档有很多关于如何选择这些参数的部分: http://docs.astropy.org/en/stable/visualization/histogram.html + +![不同histtype设置示例](https://matplotlib.org/_images/sphx_glr_histogram_histtypes_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + +np.random.seed(19680801) + +mu = 200 +sigma = 25 +x = np.random.normal(mu, sigma, size=100) + +fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(8, 4)) + +ax0.hist(x, 20, density=True, histtype='stepfilled', facecolor='g', alpha=0.75) +ax0.set_title('stepfilled') + +# Create a histogram by providing the bin edges (unequally spaced). 
+bins = [100, 150, 180, 195, 205, 220, 250, 300] +ax1.hist(x, bins, density=True, histtype='bar', rwidth=0.8) +ax1.set_title('unequal bins') + +fig.tight_layout() +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: histogram_histtypes.py](https://matplotlib.org/_downloads/histogram_histtypes.py) +- [下载Jupyter notebook: histogram_histtypes.ipynb](https://matplotlib.org/_downloads/histogram_histtypes.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/statistics/histogram_multihist.md b/Python/matplotlab/gallery/statistics/histogram_multihist.md new file mode 100644 index 00000000..abd06afe --- /dev/null +++ b/Python/matplotlab/gallery/statistics/histogram_multihist.md @@ -0,0 +1,50 @@ +# 使用多个数据集演示直方图(hist)函数 + +绘制具有多个样本集的直方图并演示: + +- 使用带有多个样本集的图例 +- 堆积图 +- 没有填充的步进曲线 +- 不同样本量的数据集 + +选择不同的存储量和大小会显著影响直方图的形状。Astropy文档有很多关于如何选择这些参数的部分: http://docs.astropy.org/en/stable/visualization/histogram.html + + +![多个数据集演示直方图](https://matplotlib.org/_images/sphx_glr_histogram_multihist_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + +np.random.seed(19680801) + +n_bins = 10 +x = np.random.randn(1000, 3) + +fig, axes = plt.subplots(nrows=2, ncols=2) +ax0, ax1, ax2, ax3 = axes.flatten() + +colors = ['red', 'tan', 'lime'] +ax0.hist(x, n_bins, density=True, histtype='bar', color=colors, label=colors) +ax0.legend(prop={'size': 10}) +ax0.set_title('bars with legend') + +ax1.hist(x, n_bins, density=True, histtype='bar', stacked=True) +ax1.set_title('stacked bar') + +ax2.hist(x, n_bins, histtype='step', stacked=True, fill=False) +ax2.set_title('stack step (unfilled)') + +# Make a multiple-histogram of data-sets with different length. 
+x_multi = [np.random.randn(n) for n in [10000, 5000, 2000]] +ax3.hist(x_multi, n_bins, histtype='bar') +ax3.set_title('different sample sizes') + +fig.tight_layout() +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: histogram_multihist.py](https://matplotlib.org/_downloads/histogram_multihist.py) +- [下载Jupyter notebook: histogram_multihist.ipynb](https://matplotlib.org/_downloads/histogram_multihist.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/statistics/multiple_histograms_side_by_side.md b/Python/matplotlab/gallery/statistics/multiple_histograms_side_by_side.md new file mode 100644 index 00000000..a5a0e336 --- /dev/null +++ b/Python/matplotlab/gallery/statistics/multiple_histograms_side_by_side.md @@ -0,0 +1,57 @@ +# 并排生成多个直方图 + +此示例沿范畴x轴绘制不同样本的水平直方图。此外,直方图被绘制成与它们的x位置对称,从而使它们与小提琴图非常相似。 + +为了使这个高度专门化的绘制,我们不能使用标准的 ``hist`` 方法。相反,我们使用 ``barh`` 直接绘制水平线。通过 ``np.histogram`` 函数计算棒材的垂直位置和长度。使用相同的范围(最小和最大值)和存储箱数量计算所有采样的直方图,以便每个采样的存储箱位于相同的垂直位置。 + +选择不同的存储量和大小会显著影响直方图的形状。Astropy文档有很多关于如何选择这些参数的部分: http://docs.astropy.org/en/stable/visualization/histogram.html + +![并排生成多个直方图示例](https://matplotlib.org/_images/sphx_glr_multiple_histograms_side_by_side_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + +np.random.seed(19680801) +number_of_bins = 20 + +# An example of three data sets to compare +number_of_data_points = 387 +labels = ["A", "B", "C"] +data_sets = [np.random.normal(0, 1, number_of_data_points), + np.random.normal(6, 1, number_of_data_points), + np.random.normal(-3, 1, number_of_data_points)] + +# Computed quantities to aid plotting +hist_range = (np.min(data_sets), np.max(data_sets)) +binned_data_sets = [ + np.histogram(d, range=hist_range, bins=number_of_bins)[0] + for d in data_sets +] +binned_maximums = np.max(binned_data_sets, axis=1) +x_locations = np.arange(0, sum(binned_maximums), np.max(binned_maximums)) + +# The bin_edges are the same for all of the histograms +bin_edges = np.linspace(hist_range[0], 
hist_range[1], number_of_bins + 1) +centers = 0.5 * (bin_edges + np.roll(bin_edges, 1))[:-1] +heights = np.diff(bin_edges) + +# Cycle through and plot each histogram +fig, ax = plt.subplots() +for x_loc, binned_data in zip(x_locations, binned_data_sets): + lefts = x_loc - 0.5 * binned_data + ax.barh(centers, binned_data, height=heights, left=lefts) + +ax.set_xticks(x_locations) +ax.set_xticklabels(labels) + +ax.set_ylabel("Data values") +ax.set_xlabel("Data sets") + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: multiple_histograms_side_by_side.py](https://matplotlib.org/_downloads/multiple_histograms_side_by_side.py) +- [下载Jupyter notebook: multiple_histograms_side_by_side.ipynb](https://matplotlib.org/_downloads/multiple_histograms_side_by_side.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/statistics/violinplot.md b/Python/matplotlab/gallery/statistics/violinplot.md new file mode 100644 index 00000000..757cd3da --- /dev/null +++ b/Python/matplotlab/gallery/statistics/violinplot.md @@ -0,0 +1,63 @@ +# 小提琴图基础 + +小提琴图类似于直方图和箱形图,因为它们显示了样本概率分布的抽象表示。小提琴图使用核密度估计(KDE)来计算样本的经验分布,而不是显示属于分类或顺序统计的数据点的计数。该计算由几个参数控制。此示例演示如何修改评估KDE的点数 ``(points)`` 以及如何修改KDE ``(bw_method)`` 的带宽。 + +有关小提琴图和KDE的更多信息,请参阅scikit-learn文档 +有一个很棒的部分:http://scikit-learn.org/stable/modules/density.html + +![小提琴图基础示例](https://matplotlib.org/_images/sphx_glr_violinplot_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +# fake data +fs = 10 # fontsize +pos = [1, 2, 4, 5, 7, 8] +data = [np.random.normal(0, std, size=100) for std in pos] + +fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(6, 6)) + +axes[0, 0].violinplot(data, pos, points=20, widths=0.3, + showmeans=True, showextrema=True, showmedians=True) +axes[0, 0].set_title('Custom violinplot 1', fontsize=fs) + +axes[0, 1].violinplot(data, pos, points=40, widths=0.5, + showmeans=True, showextrema=True, showmedians=True, + 
bw_method='silverman') +axes[0, 1].set_title('Custom violinplot 2', fontsize=fs) + +axes[0, 2].violinplot(data, pos, points=60, widths=0.7, showmeans=True, + showextrema=True, showmedians=True, bw_method=0.5) +axes[0, 2].set_title('Custom violinplot 3', fontsize=fs) + +axes[1, 0].violinplot(data, pos, points=80, vert=False, widths=0.7, + showmeans=True, showextrema=True, showmedians=True) +axes[1, 0].set_title('Custom violinplot 4', fontsize=fs) + +axes[1, 1].violinplot(data, pos, points=100, vert=False, widths=0.9, + showmeans=True, showextrema=True, showmedians=True, + bw_method='silverman') +axes[1, 1].set_title('Custom violinplot 5', fontsize=fs) + +axes[1, 2].violinplot(data, pos, points=200, vert=False, widths=1.1, + showmeans=True, showextrema=True, showmedians=True, + bw_method=0.5) +axes[1, 2].set_title('Custom violinplot 6', fontsize=fs) + +for ax in axes.flatten(): + ax.set_yticklabels([]) + +fig.suptitle("Violin Plotting Examples") +fig.subplots_adjust(hspace=0.4) +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: violinplot.py](https://matplotlib.org/_downloads/violinplot.py) +- [下载Jupyter notebook: violinplot.ipynb](https://matplotlib.org/_downloads/violinplot.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/style_sheets/bmh.md b/Python/matplotlab/gallery/style_sheets/bmh.md new file mode 100644 index 00000000..c3aab921 --- /dev/null +++ b/Python/matplotlab/gallery/style_sheets/bmh.md @@ -0,0 +1,35 @@ +# 黑客贝叶斯方法样式表 + +这个例子演示了贝叶斯黑客方法 [[1]](https://matplotlib.org/gallery/style_sheets/bmh.html#id2) 在线书籍中使用的风格。 + +[[1]](https://matplotlib.org/gallery/style_sheets/bmh.html#id1) http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/ + +![黑客贝叶斯方法样式表示例](https://matplotlib.org/_images/sphx_glr_bmh_001.png) + +```python +from numpy.random import beta +import matplotlib.pyplot as plt + + +plt.style.use('bmh') + + +def plot_beta_hist(ax, a, b): + ax.hist(beta(a, b, size=10000), histtype="stepfilled", + 
bins=25, alpha=0.8, density=True) + + +fig, ax = plt.subplots() +plot_beta_hist(ax, 10, 10) +plot_beta_hist(ax, 4, 12) +plot_beta_hist(ax, 50, 12) +plot_beta_hist(ax, 6, 55) +ax.set_title("'bmh' style sheet") + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: bmh.py](https://matplotlib.org/_downloads/bmh.py) +- [下载Jupyter notebook: bmh.ipynb](https://matplotlib.org/_downloads/bmh.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/style_sheets/dark_background.md b/Python/matplotlab/gallery/style_sheets/dark_background.md new file mode 100644 index 00000000..1b00c0b1 --- /dev/null +++ b/Python/matplotlab/gallery/style_sheets/dark_background.md @@ -0,0 +1,32 @@ +# 黑色的背景样式表 + +此示例演示了 “dark_background” 样式,该样式使用白色表示通常为黑色的元素(文本,边框等)。请注意,并非所有绘图元素都默认为由rc参数定义的颜色。 + +![黑色的背景样式表示例](https://matplotlib.org/_images/sphx_glr_dark_background_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + + +plt.style.use('dark_background') + +fig, ax = plt.subplots() + +L = 6 +x = np.linspace(0, L) +ncolors = len(plt.rcParams['axes.prop_cycle']) +shift = np.linspace(0, L, ncolors, endpoint=False) +for s in shift: + ax.plot(x, np.sin(x + s), 'o-') +ax.set_xlabel('x-axis') +ax.set_ylabel('y-axis') +ax.set_title("'dark_background' style sheet") + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: dark_background.py](https://matplotlib.org/_downloads/dark_background.py) +- [下载Jupyter notebook: dark_background.ipynb](https://matplotlib.org/_downloads/dark_background.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/style_sheets/fivethirtyeight.md b/Python/matplotlab/gallery/style_sheets/fivethirtyeight.md new file mode 100644 index 00000000..67a44788 --- /dev/null +++ b/Python/matplotlab/gallery/style_sheets/fivethirtyeight.md @@ -0,0 +1,35 @@ +# FiveThirtyEight样式表 + +这显示了“fivethirtyeight”样式的一个示例,它试图从FiveThirtyEight.com复制样式。 + +![FiveThirtyEight样式表示例](https://matplotlib.org/_images/sphx_glr_fivethirtyeight_001.png) + +```python 
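+# plt.style.use() mutates the global rcParams (color cycle, grid, line
+# widths, ...) for every figure created afterwards. To apply a style only
+# temporarily, use the context-manager form instead:
+#     with plt.style.context('fivethirtyeight'):
+#         ...  # plots made here use the style; rcParams revert on exit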
+import matplotlib.pyplot as plt +import numpy as np + + +plt.style.use('fivethirtyeight') + +x = np.linspace(0, 10) + +# Fixing random state for reproducibility +np.random.seed(19680801) + +fig, ax = plt.subplots() + +ax.plot(x, np.sin(x) + x + np.random.randn(50)) +ax.plot(x, np.sin(x) + 0.5 * x + np.random.randn(50)) +ax.plot(x, np.sin(x) + 2 * x + np.random.randn(50)) +ax.plot(x, np.sin(x) - 0.5 * x + np.random.randn(50)) +ax.plot(x, np.sin(x) - 2 * x + np.random.randn(50)) +ax.plot(x, np.sin(x) + np.random.randn(50)) +ax.set_title("'fivethirtyeight' style sheet") + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: fivethirtyeight.py](https://matplotlib.org/_downloads/fivethirtyeight.py) +- [下载Jupyter notebook: fivethirtyeight.ipynb](https://matplotlib.org/_downloads/fivethirtyeight.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/style_sheets/ggplot.md b/Python/matplotlab/gallery/style_sheets/ggplot.md new file mode 100644 index 00000000..b3f68216 --- /dev/null +++ b/Python/matplotlab/gallery/style_sheets/ggplot.md @@ -0,0 +1,59 @@ +# ggplot样式表 + +此示例演示了“ggplot”样式,该样式调整样式以模拟[ggplot](http://ggplot2.org/)([R](https://www.r-project.org/)的流行绘图包)。 + +这些设置被无耻地从[[1]](https://matplotlib.org/gallery/style_sheets/ggplot.html#id2)窃取(经允许)。 + +[[1]](https://matplotlib.org/gallery/style_sheets/ggplot.html#id1) https://web.archive.org/web/20111215111010/http://www.huyng.com/archives/sane-color-scheme-for-matplotlib/691/ + +![ggplot样式表示例](https://matplotlib.org/_images/sphx_glr_ggplot_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + +plt.style.use('ggplot') + +# Fixing random state for reproducibility +np.random.seed(19680801) + +fig, axes = plt.subplots(ncols=2, nrows=2) +ax1, ax2, ax3, ax4 = axes.ravel() + +# scatter plot (Note: `plt.scatter` doesn't use default colors) +x, y = np.random.normal(size=(2, 200)) +ax1.plot(x, y, 'o') + +# sinusoidal lines with colors from default color cycle +L = 2*np.pi +x = np.linspace(0, L) 
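+# plt.rcParams['axes.prop_cycle'] is the Cycler holding the style's default
+# property cycle; its len() gives the number of distinct colors the 'ggplot'
+# style defines, so the loop below draws exactly one full color cycle.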
+ncolors = len(plt.rcParams['axes.prop_cycle']) +shift = np.linspace(0, L, ncolors, endpoint=False) +for s in shift: + ax2.plot(x, np.sin(x + s), '-') +ax2.margins(0) + +# bar graphs +x = np.arange(5) +y1, y2 = np.random.randint(1, 25, size=(2, 5)) +width = 0.25 +ax3.bar(x, y1, width) +ax3.bar(x + width, y2, width, + color=list(plt.rcParams['axes.prop_cycle'])[2]['color']) +ax3.set_xticks(x + width) +ax3.set_xticklabels(['a', 'b', 'c', 'd', 'e']) + +# circles with colors from default color cycle +for i, color in enumerate(plt.rcParams['axes.prop_cycle']): + xy = np.random.normal(size=2) + ax4.add_patch(plt.Circle(xy, radius=0.3, color=color['color'])) +ax4.axis('equal') +ax4.margins(0) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: ggplot.py](https://matplotlib.org/_downloads/ggplot.py) +- [下载Jupyter notebook: ggplot.ipynb](https://matplotlib.org/_downloads/ggplot.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/style_sheets/grayscale.md b/Python/matplotlab/gallery/style_sheets/grayscale.md new file mode 100644 index 00000000..72e33515 --- /dev/null +++ b/Python/matplotlab/gallery/style_sheets/grayscale.md @@ -0,0 +1,44 @@ +# 灰度样式表 + +此示例演示“灰度”样式表,该样式表将定义为rc参数的所有颜色更改为灰度。 但请注意,并非所有绘图元素都默认为rc参数定义的颜色。 + +![灰度样式表示例](https://matplotlib.org/_images/sphx_glr_grayscale_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +def color_cycle_example(ax): + L = 6 + x = np.linspace(0, L) + ncolors = len(plt.rcParams['axes.prop_cycle']) + shift = np.linspace(0, L, ncolors, endpoint=False) + for s in shift: + ax.plot(x, np.sin(x + s), 'o-') + + +def image_and_patch_example(ax): + ax.imshow(np.random.random(size=(20, 20)), interpolation='none') + c = plt.Circle((5, 5), radius=5, label='patch') + ax.add_patch(c) + + +plt.style.use('grayscale') + +fig, (ax1, ax2) = plt.subplots(ncols=2) +fig.suptitle("'grayscale' style sheet") + +color_cycle_example(ax1) 
+image_and_patch_example(ax2)
+
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: grayscale.py](https://matplotlib.org/_downloads/grayscale.py)
+- [下载Jupyter notebook: grayscale.ipynb](https://matplotlib.org/_downloads/grayscale.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/style_sheets/plot_solarizedlight2.md b/Python/matplotlab/gallery/style_sheets/plot_solarizedlight2.md
new file mode 100644
index 00000000..f3d87d99
--- /dev/null
+++ b/Python/matplotlab/gallery/style_sheets/plot_solarizedlight2.md
@@ -0,0 +1,44 @@
+# Solarized Light样式表
+
+这里展示了 "Solarize_Light2" 样式的一个示例,它试图复制以下网站的样式:
+
+- http://ethanschoonover.com/solarized
+- https://github.com/jrnold/ggthemes
+- http://pygal.org/en/stable/documentation/builtin_styles.html#light-solarized
+
+并且:
+
+使用调色板的全部8种强调色,从蓝色开始。
+
+待办:
+
+- 为条形图和堆积图设置Alpha值(0.33或0.5)
+- 应用布局规则
+
+![Solarized Light样式表示例](https://matplotlib.org/_images/sphx_glr_plot_solarizedlight2_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+x = np.linspace(0, 10)
+with plt.style.context('Solarize_Light2'):
+    plt.plot(x, np.sin(x) + x + np.random.randn(50))
+    plt.plot(x, np.sin(x) + 2 * x + np.random.randn(50))
+    plt.plot(x, np.sin(x) + 3 * x + np.random.randn(50))
+    plt.plot(x, np.sin(x) + 4 + np.random.randn(50))
+    plt.plot(x, np.sin(x) + 5 * x + np.random.randn(50))
+    plt.plot(x, np.sin(x) + 6 * x + np.random.randn(50))
+    plt.plot(x, np.sin(x) + 7 * x + np.random.randn(50))
+    plt.plot(x, np.sin(x) + 8 * x + np.random.randn(50))
+    # Number of accent colors in the color scheme
+    plt.title('8 Random Lines - Line')
+    plt.xlabel('x label', fontsize=14)
+    plt.ylabel('y label', fontsize=14)
+
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: plot_solarizedlight2.py](https://matplotlib.org/_downloads/plot_solarizedlight2.py)
+- [下载Jupyter notebook: plot_solarizedlight2.ipynb](https://matplotlib.org/_downloads/plot_solarizedlight2.ipynb)
\ No newline at end of file
diff --git 
a/Python/matplotlab/gallery/style_sheets/style_sheets_reference.md b/Python/matplotlab/gallery/style_sheets/style_sheets_reference.md new file mode 100644 index 00000000..b3146e41 --- /dev/null +++ b/Python/matplotlab/gallery/style_sheets/style_sheets_reference.md @@ -0,0 +1,203 @@ +# 样式表参考 + +此脚本演示了一组常见示例图上的不同可用样式表:散点图,图像,条形图,面片,线图和直方图, + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_001.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_002.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_003.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_004.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_005.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_006.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_007.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_008.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_009.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_010.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_011.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_012.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_013.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_014.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_015.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_016.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_017.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_018.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_019.png) + 
+![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_020.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_021.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_022.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_023.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_024.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_025.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_026.png) + +![样式表参考示例](https://matplotlib.org/_images/sphx_glr_style_sheets_reference_027.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +def plot_scatter(ax, prng, nb_samples=100): + """Scatter plot. + """ + for mu, sigma, marker in [(-.5, 0.75, 'o'), (0.75, 1., 's')]: + x, y = prng.normal(loc=mu, scale=sigma, size=(2, nb_samples)) + ax.plot(x, y, ls='none', marker=marker) + ax.set_xlabel('X-label') + return ax + + +def plot_colored_sinusoidal_lines(ax): + """Plot sinusoidal lines with colors following the style color cycle. + """ + L = 2 * np.pi + x = np.linspace(0, L) + nb_colors = len(plt.rcParams['axes.prop_cycle']) + shift = np.linspace(0, L, nb_colors, endpoint=False) + for s in shift: + ax.plot(x, np.sin(x + s), '-') + ax.set_xlim([x[0], x[-1]]) + return ax + + +def plot_bar_graphs(ax, prng, min_value=5, max_value=25, nb_samples=5): + """Plot two bar graphs side by side, with letters as x-tick labels. + """ + x = np.arange(nb_samples) + ya, yb = prng.randint(min_value, max_value, size=(2, nb_samples)) + width = 0.25 + ax.bar(x, ya, width) + ax.bar(x + width, yb, width, color='C2') + ax.set_xticks(x + width) + ax.set_xticklabels(['a', 'b', 'c', 'd', 'e']) + return ax + + +def plot_colored_circles(ax, prng, nb_samples=15): + """Plot circle patches. 
+ + NB: draws a fixed amount of samples, rather than using the length of + the color cycle, because different styles may have different numbers + of colors. + """ + for sty_dict, j in zip(plt.rcParams['axes.prop_cycle'], range(nb_samples)): + ax.add_patch(plt.Circle(prng.normal(scale=3, size=2), + radius=1.0, color=sty_dict['color'])) + # Force the limits to be the same across the styles (because different + # styles may have different numbers of available colors). + ax.set_xlim([-4, 8]) + ax.set_ylim([-5, 6]) + ax.set_aspect('equal', adjustable='box') # to plot circles as circles + return ax + + +def plot_image_and_patch(ax, prng, size=(20, 20)): + """Plot an image with random values and superimpose a circular patch. + """ + values = prng.random_sample(size=size) + ax.imshow(values, interpolation='none') + c = plt.Circle((5, 5), radius=5, label='patch') + ax.add_patch(c) + # Remove ticks + ax.set_xticks([]) + ax.set_yticks([]) + + +def plot_histograms(ax, prng, nb_samples=10000): + """Plot 4 histograms and a text annotation. + """ + params = ((10, 10), (4, 12), (50, 12), (6, 55)) + for a, b in params: + values = prng.beta(a, b, size=nb_samples) + ax.hist(values, histtype="stepfilled", bins=30, + alpha=0.8, density=True) + # Add a small annotation. + ax.annotate('Annotation', xy=(0.25, 4.25), xycoords='data', + xytext=(0.9, 0.9), textcoords='axes fraction', + va="top", ha="right", + bbox=dict(boxstyle="round", alpha=0.2), + arrowprops=dict( + arrowstyle="->", + connectionstyle="angle,angleA=-95,angleB=35,rad=10"), + ) + return ax + + +def plot_figure(style_label=""): + """Setup and plot the demonstration figure with a given style. + """ + # Use a dedicated RandomState instance to draw the same "random" values + # across the different figures. + prng = np.random.RandomState(96917002) + + # Tweak the figure size to be better suited for a row of numerous plots: + # double the width and halve the height. 
NB: use relative changes because
+    # some styles may have a figure size different from the default one.
+    (fig_width, fig_height) = plt.rcParams['figure.figsize']
+    fig_size = [fig_width * 2, fig_height / 2]
+
+    fig, axes = plt.subplots(ncols=6, nrows=1, num=style_label,
+                             figsize=fig_size, squeeze=True)
+    axes[0].set_ylabel(style_label)
+
+    plot_scatter(axes[0], prng)
+    plot_image_and_patch(axes[1], prng)
+    plot_bar_graphs(axes[2], prng)
+    plot_colored_circles(axes[3], prng)
+    plot_colored_sinusoidal_lines(axes[4])
+    plot_histograms(axes[5], prng)
+
+    fig.tight_layout()
+
+    return fig
+
+
+if __name__ == "__main__":
+
+    # Setup a list of all available styles, in alphabetical order but
+    # the `default` and `classic` ones, which will be forced resp. in
+    # first and second position.
+    style_list = ['default', 'classic'] + sorted(
+        style for style in plt.style.available if style != 'classic')
+
+    # Plot a demonstration figure for every available style sheet.
+    for style_label in style_list:
+        with plt.style.context(style_label):
+            fig = plot_figure(style_label=style_label)
+
+    plt.show()
+```
+
+脚本的总运行时间:(0分5.216秒)
+
+## 下载这个示例
+
+- [下载python源码: style_sheets_reference.py](https://matplotlib.org/_downloads/style_sheets_reference.py)
+- [下载Jupyter notebook: style_sheets_reference.ipynb](https://matplotlib.org/_downloads/style_sheets_reference.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/align_labels_demo.md b/Python/matplotlab/gallery/subplots_axes_and_figures/align_labels_demo.md
new file mode 100644
index 00000000..d3d830be
--- /dev/null
+++ b/Python/matplotlab/gallery/subplots_axes_and_figures/align_labels_demo.md
@@ -0,0 +1,40 @@
+# 对齐标签
+
+使用 ``Figure.align_xlabels`` 和 ``Figure.align_ylabels`` 对齐xlabel和ylabel。
+
+``Figure.align_labels`` 包装了这两个函数。
+
+注意,xlabel "XLabel1 1" 通常会更接近x轴,"YLabel1 0" 会更接近其各自轴的y轴。
+
+![对齐标签图示](https://matplotlib.org/_images/sphx_glr_align_labels_demo_001.png)
+
+```python
+import 
matplotlib.pyplot as plt +import numpy as np +import matplotlib.gridspec as gridspec + +fig = plt.figure(tight_layout=True) +gs = gridspec.GridSpec(2, 2) + +ax = fig.add_subplot(gs[0, :]) +ax.plot(np.arange(0, 1e6, 1000)) +ax.set_ylabel('YLabel0') +ax.set_xlabel('XLabel0') + +for i in range(2): + ax = fig.add_subplot(gs[1, i]) + ax.plot(np.arange(1., 0., -0.1) * 2000., np.arange(1., 0., -0.1)) + ax.set_ylabel('YLabel1 %d' % i) + ax.set_xlabel('XLabel1 %d' % i) + if i == 0: + for tick in ax.get_xticklabels(): + tick.set_rotation(55) +fig.align_labels() # same as fig.align_xlabels(); fig.align_ylabels() + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: align_labels_demo.py](https://matplotlib.org/_downloads/align_labels_demo.py) +- [下载Jupyter notebook: align_labels_demo.ipynb](https://matplotlib.org/_downloads/align_labels_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/axes_demo.md b/Python/matplotlab/gallery/subplots_axes_and_figures/axes_demo.md new file mode 100644 index 00000000..5b0ead95 --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/axes_demo.md @@ -0,0 +1,49 @@ +# 轴线演示 + +例如,使用plt.axes在主绘图轴中创建嵌入轴。 + +![轴线演示图示](https://matplotlib.org/_images/sphx_glr_axes_demo_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +# create some data to use for the plot +dt = 0.001 +t = np.arange(0.0, 10.0, dt) +r = np.exp(-t[:1000] / 0.05) # impulse response +x = np.random.randn(len(t)) +s = np.convolve(x, r)[:len(x)] * dt # colored noise + +# the main axes is subplot(111) by default +plt.plot(t, s) +plt.axis([0, 1, 1.1 * np.min(s), 2 * np.max(s)]) +plt.xlabel('time (s)') +plt.ylabel('current (nA)') +plt.title('Gaussian colored noise') + +# this is an inset axes over the main axes +a = plt.axes([.65, .6, .2, .2], facecolor='k') +n, bins, patches = plt.hist(s, 400, density=True) 
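+# Note: the rectangle passed to plt.axes above is [left, bottom, width,
+# height] in figure-fraction coordinates (0-1), so the inset's position is
+# independent of the data limits of the main axes.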
+plt.title('Probability')
+plt.xticks([])
+plt.yticks([])
+
+# this is another inset axes over the main axes
+a = plt.axes([0.2, 0.6, .2, .2], facecolor='k')
+plt.plot(t[:len(r)], r)
+plt.title('Impulse response')
+plt.xlim(0, 0.2)
+plt.xticks([])
+plt.yticks([])
+
+plt.show()
+```
+## 下载这个示例
+
+- [下载python源码: axes_demo.py](https://matplotlib.org/_downloads/axes_demo.py)
+- [下载Jupyter notebook: axes_demo.ipynb](https://matplotlib.org/_downloads/axes_demo.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/axes_margins.md b/Python/matplotlab/gallery/subplots_axes_and_figures/axes_margins.md
new file mode 100644
index 00000000..57ccc08a
--- /dev/null
+++ b/Python/matplotlab/gallery/subplots_axes_and_figures/axes_margins.md
@@ -0,0 +1,85 @@
+# Axes.margins缩放粘性
+
+此示例中的第一个图显示了如何使用[边距](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.margins.html#matplotlib.axes.Axes.margins)而不是[set_xlim](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_xlim.html#matplotlib.axes.Axes.set_xlim)和[set_ylim](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_ylim.html#matplotlib.axes.Axes.set_ylim)放大和缩小绘图。第二个图展示了某些方法和艺术家引入的边缘“粘性”的概念,以及如何有效地解决这个问题。
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+
+def f(t):
+    return np.exp(-t) * np.cos(2*np.pi*t)
+
+
+t1 = np.arange(0.0, 3.0, 0.01)
+
+ax1 = plt.subplot(212)
+ax1.margins(0.05)  # Default margin is 0.05, value 0 means fit
+ax1.plot(t1, f(t1), 'k')
+
+ax2 = plt.subplot(221)
+ax2.margins(2, 2)  # Values >0.0 zoom out
+ax2.plot(t1, f(t1), 'r')
+ax2.set_title('Zoomed out')
+
+ax3 = plt.subplot(222)
+ax3.margins(x=0, y=-0.25)  # Values in (-0.5, 0.0) zooms in to center
+ax3.plot(t1, f(t1), 'g')
+ax3.set_title('Zoomed in')
+
+plt.show()
+```
+
+![Axes.margins示例](https://matplotlib.org/_images/sphx_glr_axes_margins_001.png)
+
+## 论某些绘图方法的“粘性”
+
+一些绘图功能使轴限制为 “粘性(stickiness)” 
或不受[边距](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.margins.html#matplotlib.axes.Axes.margins)方法的影响。例如,[imshow](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.imshow.html#matplotlib.axes.Axes.imshow) 和 pcolor 期望用户希望坐标范围紧贴图中所示的像素。如果不需要此行为,则需要将 [use_sticky_edges](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.use_sticky_edges.html#matplotlib.axes.Axes.use_sticky_edges) 设置为 [False](https://docs.python.org/3/library/constants.html#False)。请考虑以下示例:
+
+```python
+y, x = np.mgrid[:5, 1:6]
+poly_coords = [
+    (0.25, 2.75), (3.25, 2.75),
+    (2.25, 0.75), (0.25, 0.75)
+]
+fig, (ax1, ax2) = plt.subplots(ncols=2)
+
+# Here we set the stickiness of the axes object...
+# ax1 we'll leave as the default, which uses sticky edges
+# and we'll turn off stickiness for ax2
+ax2.use_sticky_edges = False
+
+for ax, status in zip((ax1, ax2), ('Is', 'Is Not')):
+    cells = ax.pcolor(x, y, x+y, cmap='inferno')  # sticky
+    ax.add_patch(
+        plt.Polygon(poly_coords, color='forestgreen', alpha=0.5)
+    )  # not sticky
+    ax.margins(x=0.1, y=0.05)
+    ax.set_aspect('equal')
+    ax.set_title('{} Sticky'.format(status))
+
+plt.show()
+```
+
+![Axes.margins示例2](https://matplotlib.org/_images/sphx_glr_axes_margins_002.png)
+
+## 参考
+
+本例中显示了以下函数、方法的使用:
+
+```python
+import matplotlib
+matplotlib.axes.Axes.margins
+matplotlib.pyplot.margins
+matplotlib.axes.Axes.use_sticky_edges
+matplotlib.axes.Axes.pcolor
+matplotlib.pyplot.pcolor
+matplotlib.pyplot.Polygon
+```
+
+## 下载这个示例
+
+- [下载python源码: axes_margins.py](https://matplotlib.org/_downloads/axes_margins.py)
+- [下载Jupyter notebook: axes_margins.ipynb](https://matplotlib.org/_downloads/axes_margins.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/axes_props.md b/Python/matplotlab/gallery/subplots_axes_and_figures/axes_props.md
new file mode 100644
index 
00000000..2a215ad2 --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/axes_props.md @@ -0,0 +1,26 @@ +# 轴线属性 + +您可以调整轴的刻度和网格属性。 + +![轴线属性示例](https://matplotlib.org/_images/sphx_glr_axes_props_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +t = np.arange(0.0, 2.0, 0.01) +s = np.sin(2 * np.pi * t) + +fig, ax = plt.subplots() +ax.plot(t, s) + +ax.grid(True, linestyle='-.') +ax.tick_params(labelcolor='r', labelsize='medium', width=3) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: axes_props.py](https://matplotlib.org/_downloads/axes_props.py) +- [下载Jupyter notebook: axes_props.ipynb](https://matplotlib.org/_downloads/axes_props.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/axes_zoom_effect.md b/Python/matplotlab/gallery/subplots_axes_and_figures/axes_zoom_effect.md new file mode 100644 index 00000000..10c42c3f --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/axes_zoom_effect.md @@ -0,0 +1,127 @@ +# 轴缩放效果 + +![轴缩放效果示例](https://matplotlib.org/_images/sphx_glr_axes_zoom_effect_001.png) + +```python +from matplotlib.transforms import ( + Bbox, TransformedBbox, blended_transform_factory) + +from mpl_toolkits.axes_grid1.inset_locator import ( + BboxPatch, BboxConnector, BboxConnectorPatch) + + +def connect_bbox(bbox1, bbox2, + loc1a, loc2a, loc1b, loc2b, + prop_lines, prop_patches=None): + if prop_patches is None: + prop_patches = { + **prop_lines, + "alpha": prop_lines.get("alpha", 1) * 0.2, + } + + c1 = BboxConnector(bbox1, bbox2, loc1=loc1a, loc2=loc2a, **prop_lines) + c1.set_clip_on(False) + c2 = BboxConnector(bbox1, bbox2, loc1=loc1b, loc2=loc2b, **prop_lines) + c2.set_clip_on(False) + + bbox_patch1 = BboxPatch(bbox1, **prop_patches) + bbox_patch2 = BboxPatch(bbox2, **prop_patches) + + p = BboxConnectorPatch(bbox1, bbox2, + # loc1a=3, loc2a=2, loc1b=4, loc2b=1, + loc1a=loc1a, loc2a=loc2a, loc1b=loc1b, loc2b=loc2b, + **prop_patches) + 
p.set_clip_on(False)
+
+    return c1, c2, bbox_patch1, bbox_patch2, p
+
+
+def zoom_effect01(ax1, ax2, xmin, xmax, **kwargs):
+    """
+    ax1 : the main axes
+    ax2 : the zoomed axes
+    (xmin,xmax) : the limits of the colored area in both plot axes.
+
+    connect ax1 & ax2. The x-range of (xmin, xmax) in both axes will
+    be marked.  The keyword parameters will be used to create
+    patches.
+
+    """
+
+    trans1 = blended_transform_factory(ax1.transData, ax1.transAxes)
+    trans2 = blended_transform_factory(ax2.transData, ax2.transAxes)
+
+    bbox = Bbox.from_extents(xmin, 0, xmax, 1)
+
+    mybbox1 = TransformedBbox(bbox, trans1)
+    mybbox2 = TransformedBbox(bbox, trans2)
+
+    prop_patches = {**kwargs, "ec": "none", "alpha": 0.2}
+
+    c1, c2, bbox_patch1, bbox_patch2, p = connect_bbox(
+        mybbox1, mybbox2,
+        loc1a=3, loc2a=2, loc1b=4, loc2b=1,
+        prop_lines=kwargs, prop_patches=prop_patches)
+
+    ax1.add_patch(bbox_patch1)
+    ax2.add_patch(bbox_patch2)
+    ax2.add_patch(c1)
+    ax2.add_patch(c2)
+    ax2.add_patch(p)
+
+    return c1, c2, bbox_patch1, bbox_patch2, p
+
+
+def zoom_effect02(ax1, ax2, **kwargs):
+    """
+    ax1 : the main axes
+    ax2 : the zoomed axes
+
+    Similar to zoom_effect01.  The xmin & xmax will be taken from the
+    ax1.viewLim.
+    """
+
+    tt = ax1.transScale + (ax1.transLimits + ax2.transAxes)
+    trans = blended_transform_factory(ax2.transData, tt)
+
+    mybbox1 = ax1.bbox
+    mybbox2 = TransformedBbox(ax1.viewLim, trans)
+
+    prop_patches = {**kwargs, "ec": "none", "alpha": 0.2}
+
+    c1, c2, bbox_patch1, bbox_patch2, p = connect_bbox(
+        mybbox1, mybbox2,
+        loc1a=3, loc2a=2, loc1b=4, loc2b=1,
+        prop_lines=kwargs, prop_patches=prop_patches)
+
+    ax1.add_patch(bbox_patch1)
+    ax2.add_patch(bbox_patch2)
+    ax2.add_patch(c1)
+    ax2.add_patch(c2)
+    ax2.add_patch(p)
+
+    return c1, c2, bbox_patch1, bbox_patch2, p
+
+
+import matplotlib.pyplot as plt
+
+plt.figure(1, figsize=(5, 5))
+ax1 = plt.subplot(221)
+ax2 = plt.subplot(212)
+ax1.set_xlim(0, 1)
+ax2.set_xlim(0, 5)
+zoom_effect01(ax1, ax2, 0.2, 0.8)
+
+
+ax1 = plt.subplot(222)
+ax1.set_xlim(2, 3)
+ax2.set_xlim(0, 5)
+zoom_effect02(ax1, ax2)
+
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: axes_zoom_effect.py](https://matplotlib.org/_downloads/axes_zoom_effect.py)
+- [下载Jupyter notebook: axes_zoom_effect.ipynb](https://matplotlib.org/_downloads/axes_zoom_effect.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/axhspan_demo.md b/Python/matplotlab/gallery/subplots_axes_and_figures/axhspan_demo.md
new file mode 100644
index 00000000..cef8cbc3
--- /dev/null
+++ b/Python/matplotlab/gallery/subplots_axes_and_figures/axhspan_demo.md
@@ -0,0 +1,42 @@
+# axhspan 演示
+
+创建在水平方向或垂直方向跨越轴的直线或矩形。
+
+![axhspan 演示示例](https://matplotlib.org/_images/sphx_glr_axhspan_demo_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+t = np.arange(-1, 2, .01)
+s = np.sin(2 * np.pi * t)
+
+plt.plot(t, s)
+# Draw a thick red hline at y=0 that spans the xrange
+plt.axhline(linewidth=8, color='#d62728')
+
+# Draw a default hline at y=1 that spans the xrange
+plt.axhline(y=1)
+
+# Draw a default vline at x=1 that spans the yrange
+plt.axvline(x=1)
+
+# Draw a thick blue vline at x=0 that spans the upper quadrant of 
the yrange +plt.axvline(x=0, ymin=0.75, linewidth=8, color='#1f77b4') + +# Draw a default hline at y=.5 that spans the middle half of the axes +plt.axhline(y=.5, xmin=0.25, xmax=0.75) + +plt.axhspan(0.25, 0.75, facecolor='0.5', alpha=0.5) + +plt.axvspan(1.25, 1.55, facecolor='#2ca02c', alpha=0.5) + +plt.axis([-1, 2, -1, 2]) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: axhspan_demo.py](https://matplotlib.org/_downloads/axhspan_demo.py) +- [下载Jupyter notebook: axhspan_demo.ipynb](https://matplotlib.org/_downloads/axhspan_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/axis_equal_demo.md b/Python/matplotlab/gallery/subplots_axes_and_figures/axis_equal_demo.md new file mode 100644 index 00000000..298dafbe --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/axis_equal_demo.md @@ -0,0 +1,40 @@ +# 等轴比演示 + +如何设置和调整具有等轴比的图像。 + +![等轴比示例图](https://matplotlib.org/_images/sphx_glr_axis_equal_demo_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +# Plot circle of radius 3. 
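+# The circle is traced parametrically: for angles an in [0, 2*pi], the
+# points (3*cos(an), 3*sin(an)) all lie at distance 3 from the origin, so
+# any apparent ellipse below is purely an aspect-ratio artifact.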
+ +an = np.linspace(0, 2 * np.pi, 100) +fig, axs = plt.subplots(2, 2) + +axs[0, 0].plot(3 * np.cos(an), 3 * np.sin(an)) +axs[0, 0].set_title('not equal, looks like ellipse', fontsize=10) + +axs[0, 1].plot(3 * np.cos(an), 3 * np.sin(an)) +axs[0, 1].axis('equal') +axs[0, 1].set_title('equal, looks like circle', fontsize=10) + +axs[1, 0].plot(3 * np.cos(an), 3 * np.sin(an)) +axs[1, 0].axis('equal') +axs[1, 0].axis([-3, 3, -3, 3]) +axs[1, 0].set_title('still a circle, even after changing limits', fontsize=10) + +axs[1, 1].plot(3 * np.cos(an), 3 * np.sin(an)) +axs[1, 1].set_aspect('equal', 'box') +axs[1, 1].set_title('still a circle, auto-adjusted data limits', fontsize=10) + +fig.tight_layout() + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: axis_equal_demo.py](https://matplotlib.org/_downloads/axis_equal_demo.py) +- [下载Jupyter notebook: axis_equal_demo.ipynb](https://matplotlib.org/_downloads/axis_equal_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/broken_axis.md b/Python/matplotlab/gallery/subplots_axes_and_figures/broken_axis.md new file mode 100644 index 00000000..8ae805dd --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/broken_axis.md @@ -0,0 +1,71 @@ +# 断轴 + +折断轴的示例,其中y轴将切割出一部分。 + +![折断轴示例](https://matplotlib.org/_images/sphx_glr_broken_axis_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + + +# 30 points between [0, 0.2) originally made using np.random.rand(30)*.2 +pts = np.array([ + 0.015, 0.166, 0.133, 0.159, 0.041, 0.024, 0.195, 0.039, 0.161, 0.018, + 0.143, 0.056, 0.125, 0.096, 0.094, 0.051, 0.043, 0.021, 0.138, 0.075, + 0.109, 0.195, 0.050, 0.074, 0.079, 0.155, 0.020, 0.010, 0.061, 0.008]) + +# Now let's make two outlier points which are far away from everything. +pts[[3, 14]] += .8 + +# If we were to simply plot pts, we'd lose most of the interesting +# details due to the outliers. 
So let's 'break' or 'cut-out' the y-axis +# into two portions - use the top (ax) for the outliers, and the bottom +# (ax2) for the details of the majority of our data +f, (ax, ax2) = plt.subplots(2, 1, sharex=True) + +# plot the same data on both axes +ax.plot(pts) +ax2.plot(pts) + +# zoom-in / limit the view to different portions of the data +ax.set_ylim(.78, 1.) # outliers only +ax2.set_ylim(0, .22) # most of the data + +# hide the spines between ax and ax2 +ax.spines['bottom'].set_visible(False) +ax2.spines['top'].set_visible(False) +ax.xaxis.tick_top() +ax.tick_params(labeltop=False) # don't put tick labels at the top +ax2.xaxis.tick_bottom() + +# This looks pretty good, and was fairly painless, but you can get that +# cut-out diagonal lines look with just a bit more work. The important +# thing to know here is that in axes coordinates, which are always +# between 0-1, spine endpoints are at these locations (0,0), (0,1), +# (1,0), and (1,1). Thus, we just need to put the diagonals in the +# appropriate corners of each of our axes, and so long as we use the +# right transform and disable clipping. + +d = .015 # how big to make the diagonal lines in axes coordinates +# arguments to pass to plot, just so we don't keep repeating them +kwargs = dict(transform=ax.transAxes, color='k', clip_on=False) +ax.plot((-d, +d), (-d, +d), **kwargs) # top-left diagonal +ax.plot((1 - d, 1 + d), (-d, +d), **kwargs) # top-right diagonal + +kwargs.update(transform=ax2.transAxes) # switch to the bottom axes +ax2.plot((-d, +d), (1 - d, 1 + d), **kwargs) # bottom-left diagonal +ax2.plot((1 - d, 1 + d), (1 - d, 1 + d), **kwargs) # bottom-right diagonal + +# What's cool about this is that now if we vary the distance between +# ax and ax2 via f.subplots_adjust(hspace=...) 
or plt.subplot_tool(),
+# the diagonal lines will move accordingly, and stay right at the tips
+# of the spines they are 'breaking'
+
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: broken_axis.py](https://matplotlib.org/_downloads/broken_axis.py)
+- [下载Jupyter notebook: broken_axis.ipynb](https://matplotlib.org/_downloads/broken_axis.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/colorbar_placement.md b/Python/matplotlab/gallery/subplots_axes_and_figures/colorbar_placement.md
new file mode 100644
index 00000000..892355e5
--- /dev/null
+++ b/Python/matplotlab/gallery/subplots_axes_and_figures/colorbar_placement.md
@@ -0,0 +1,61 @@
+# 放置颜色条
+
+颜色条(colorbar)表示图像数据的定量范围。在图中放置颜色条并非易事,因为需要为它们腾出空间。
+
+最简单的情况是将颜色条附加到每个轴:
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+
+fig, axs = plt.subplots(2, 2)
+cm = ['RdBu_r', 'viridis']
+for col in range(2):
+    for row in range(2):
+        ax = axs[row, col]
+        pcm = ax.pcolormesh(np.random.random((20, 20)) * (col + 1),
+                            cmap=cm[col])
+        fig.colorbar(pcm, ax=ax)
+plt.show()
+```
+
+![放置色块示例](https://matplotlib.org/_images/sphx_glr_colorbar_placement_001.png)
+
+第一列的两行具有相同类型的数据,因此可以把两个颜色条合并为一个:调用 ``Figure.colorbar`` 时传入轴列表而不是单个轴即可。
+
+```python
+fig, axs = plt.subplots(2, 2)
+cm = ['RdBu_r', 'viridis']
+for col in range(2):
+    for row in range(2):
+        ax = axs[row, col]
+        pcm = ax.pcolormesh(np.random.random((20, 20)) * (col + 1),
+                            cmap=cm[col])
+    fig.colorbar(pcm, ax=axs[:, col], shrink=0.6)
+plt.show()
+```
+
+![放置色块示例2](https://matplotlib.org/_images/sphx_glr_colorbar_placement_002.png)
+
+使用此范例可以实现相对复杂的颜色条布局。请注意,此示例在 ``constrained_layout=True`` 下效果更好。
+
+```python
+fig, axs = plt.subplots(3, 3, constrained_layout=True)
+for ax in axs.flat:
+    pcm = ax.pcolormesh(np.random.random((20, 20)))
+
+fig.colorbar(pcm, ax=axs[0, :2], shrink=0.6, location='bottom')
+fig.colorbar(pcm, ax=[axs[0, 2]], location='bottom')
+fig.colorbar(pcm, ax=axs[1:, :], location='right', shrink=0.6)
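+# One colorbar can serve several axes: passing a list or array slice of
+# axes (as in the surrounding calls) makes constrained_layout steal space
+# from all of them jointly when placing the bar.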
+fig.colorbar(pcm, ax=[axs[2, 1]], location='left') + + +plt.show() +``` + +![放置颜色条示例3](https://matplotlib.org/_images/sphx_glr_colorbar_placement_003.png) + +## 下载这个示例 + +- [下载python源码: colorbar_placement.py](https://matplotlib.org/_downloads/colorbar_placement.py) +- [下载Jupyter notebook: colorbar_placement.ipynb](https://matplotlib.org/_downloads/colorbar_placement.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/custom_figure_class.md b/Python/matplotlab/gallery/subplots_axes_and_figures/custom_figure_class.md new file mode 100644 index 00000000..67c0ddac --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/custom_figure_class.md @@ -0,0 +1,31 @@ +# 自定义图类 + +如果想从默认的 Figure 派生,可以向 figure 传递自定义的图形类。这个简单的例子创建了一个带有图标题的图形。 + +![自定义图类示例](https://matplotlib.org/_images/sphx_glr_custom_figure_class_001.png) + +```python +import matplotlib.pyplot as plt +from matplotlib.figure import Figure + + +class MyFigure(Figure): + def __init__(self, *args, figtitle='hi mom', **kwargs): + """ + custom kwarg figtitle is a figure title + """ + super().__init__(*args, **kwargs) + self.text(0.5, 0.95, figtitle, ha='center') + + +fig = plt.figure(FigureClass=MyFigure, figtitle='my title') +ax = fig.subplots() +ax.plot([1, 2, 3]) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: custom_figure_class.py](https://matplotlib.org/_downloads/custom_figure_class.py) +- [下载Jupyter notebook: custom_figure_class.ipynb](https://matplotlib.org/_downloads/custom_figure_class.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/demo_constrained_layout.md b/Python/matplotlab/gallery/subplots_axes_and_figures/demo_constrained_layout.md new file mode 100644 index 00000000..7c03d887 --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/demo_constrained_layout.md @@ -0,0 +1,80 @@ +# 使用约束布局调整轴的大小 + +约束布局尝试调整图中子图的大小,以使轴对象和轴上的标签之间不会重叠。 + +有关详细信息,请参阅
[“约束布局指南”](https://matplotlib.org/tutorials/intermediate/constrainedlayout_guide.html);有关替代方法,请参阅 [“严格布局”](https://matplotlib.org/tutorials/intermediate/tight_layout_guide.html) 指南。 + +```python +import matplotlib.pyplot as plt +import itertools +import warnings + + +def example_plot(ax): + ax.plot([1, 2]) + ax.set_xlabel('x-label', fontsize=12) + ax.set_ylabel('y-label', fontsize=12) + ax.set_title('Title', fontsize=14) +``` + +如果我们不使用constrained_layout,则标签会重叠轴 + +```python +fig, axs = plt.subplots(nrows=2, ncols=2, constrained_layout=False) + +for ax in axs.flatten(): + example_plot(ax) +``` + +![约束布局示例](https://matplotlib.org/_images/sphx_glr_demo_constrained_layout_001.png) + +添加 ``constrained_layout = True`` 会自动调整。 + +```python +fig, axs = plt.subplots(nrows=2, ncols=2, constrained_layout=True) + +for ax in axs.flatten(): + example_plot(ax) +``` + +![约束布局示例2](https://matplotlib.org/_images/sphx_glr_demo_constrained_layout_002.png) + +下面是使用嵌套gridspecs的更复杂的示例。 + +```python +fig = plt.figure(constrained_layout=True) + +import matplotlib.gridspec as gridspec + +gs0 = gridspec.GridSpec(1, 2, figure=fig) + +gs1 = gridspec.GridSpecFromSubplotSpec(3, 1, subplot_spec=gs0[0]) +for n in range(3): + ax = fig.add_subplot(gs1[n]) + example_plot(ax) + + +gs2 = gridspec.GridSpecFromSubplotSpec(2, 1, subplot_spec=gs0[1]) +for n in range(2): + ax = fig.add_subplot(gs2[n]) + example_plot(ax) + +plt.show() +``` + +![约束布局示例3](https://matplotlib.org/_images/sphx_glr_demo_constrained_layout_003.png) + +## 参考 + +此示例中显示了以下函数和方法的用法: + +```python +import matplotlib +matplotlib.gridspec.GridSpec +matplotlib.gridspec.GridSpecFromSubplotSpec +``` + +## 下载这个示例 + +- [下载python源码: demo_constrained_layout.py](https://matplotlib.org/_downloads/demo_constrained_layout.py) +- [下载Jupyter notebook: demo_constrained_layout.ipynb](https://matplotlib.org/_downloads/demo_constrained_layout.ipynb) \ No newline at end of file diff --git 
a/Python/matplotlab/gallery/subplots_axes_and_figures/demo_tight_layout.md b/Python/matplotlab/gallery/subplots_axes_and_figures/demo_tight_layout.md new file mode 100644 index 00000000..b8682dc0 --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/demo_tight_layout.md @@ -0,0 +1,171 @@ +# 使用紧凑布局调整轴的大小 + +[tight_layout](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.tight_layout) 尝试调整图中子图的大小,以便轴对象和轴上的标签之间不会重叠。 + +有关详细信息,请参阅 [“紧凑布局指南”](https://matplotlib.org/tutorials/intermediate/tight_layout_guide.html);有关替代方法,请参阅 [“约束布局指南”](https://matplotlib.org/tutorials/intermediate/constrainedlayout_guide.html)。 + +```python +import matplotlib.pyplot as plt +import itertools +import warnings + + +fontsizes = itertools.cycle([8, 16, 24, 32]) + + +def example_plot(ax): + ax.plot([1, 2]) + ax.set_xlabel('x-label', fontsize=next(fontsizes)) + ax.set_ylabel('y-label', fontsize=next(fontsizes)) + ax.set_title('Title', fontsize=next(fontsizes)) +``` + +```python +fig, ax = plt.subplots() +example_plot(ax) +plt.tight_layout() +``` + +![紧凑布局示例](https://matplotlib.org/_images/sphx_glr_demo_tight_layout_001.png) + +```python +fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2) +example_plot(ax1) +example_plot(ax2) +example_plot(ax3) +example_plot(ax4) +plt.tight_layout() +``` + +![紧凑布局示例2](https://matplotlib.org/_images/sphx_glr_demo_tight_layout_002.png) + +```python +fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1) +example_plot(ax1) +example_plot(ax2) +plt.tight_layout() +``` + +![紧凑布局示例3](https://matplotlib.org/_images/sphx_glr_demo_tight_layout_003.png) + +```python +fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2) +example_plot(ax1) +example_plot(ax2) +plt.tight_layout() +``` + +![紧凑布局示例4](https://matplotlib.org/_images/sphx_glr_demo_tight_layout_004.png) + +```python +fig, axes = plt.subplots(nrows=3, ncols=3) +for row in axes: + for ax in row: + example_plot(ax) +plt.tight_layout() +``` + 
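作为补充(以下为笔者的最小示意,并非原教程内容):``tight_layout`` 还接受 ``pad``、``w_pad`` 和 ``h_pad`` 参数,以字号的倍数为单位控制图形边缘与子图之间、以及子图相互之间的留白:

```python
# 最小示意(非原教程内容):tight_layout 的 pad / w_pad / h_pad
# 以字号(fontsize)的倍数为单位控制留白。
import matplotlib
matplotlib.use("Agg")  # 离屏渲染,便于在无显示环境下运行
import matplotlib.pyplot as plt

fig, axs = plt.subplots(nrows=2, ncols=2)
for ax in axs.flat:
    ax.set_title('Title')
    ax.set_xlabel('x-label')
    ax.set_ylabel('y-label')

# pad:图形边缘与子图整体之间的留白;
# w_pad / h_pad:相邻子图之间的水平 / 垂直留白。
fig.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)
```

调整后的边距可以通过 ``fig.subplotpars`` 读出,它们始终落在 (0, 1) 的图形坐标范围内。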
+![紧凑布局示例5](https://matplotlib.org/_images/sphx_glr_demo_tight_layout_005.png) + +```python +fig = plt.figure() + +ax1 = plt.subplot(221) +ax2 = plt.subplot(223) +ax3 = plt.subplot(122) + +example_plot(ax1) +example_plot(ax2) +example_plot(ax3) + +plt.tight_layout() +``` + +![紧凑布局示例6](https://matplotlib.org/_images/sphx_glr_demo_tight_layout_006.png) + +```python +fig = plt.figure() + +ax1 = plt.subplot2grid((3, 3), (0, 0)) +ax2 = plt.subplot2grid((3, 3), (0, 1), colspan=2) +ax3 = plt.subplot2grid((3, 3), (1, 0), colspan=2, rowspan=2) +ax4 = plt.subplot2grid((3, 3), (1, 2), rowspan=2) + +example_plot(ax1) +example_plot(ax2) +example_plot(ax3) +example_plot(ax4) + +plt.tight_layout() + +plt.show() +``` + +![紧凑布局示例7](https://matplotlib.org/_images/sphx_glr_demo_tight_layout_007.png) + +```python +fig = plt.figure() + +import matplotlib.gridspec as gridspec + +gs1 = gridspec.GridSpec(3, 1) +ax1 = fig.add_subplot(gs1[0]) +ax2 = fig.add_subplot(gs1[1]) +ax3 = fig.add_subplot(gs1[2]) + +example_plot(ax1) +example_plot(ax2) +example_plot(ax3) + +with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + # This raises warnings since tight layout cannot + # handle gridspec automatically. We are going to + # do that manually so we can filter the warning. + gs1.tight_layout(fig, rect=[None, None, 0.45, None]) + +gs2 = gridspec.GridSpec(2, 1) +ax4 = fig.add_subplot(gs2[0]) +ax5 = fig.add_subplot(gs2[1]) + +example_plot(ax4) +example_plot(ax5) + +with warnings.catch_warnings(): + # This raises warnings since tight layout cannot + # handle gridspec automatically. We are going to + # do that manually so we can filter the warning. + warnings.simplefilter("ignore", UserWarning) + gs2.tight_layout(fig, rect=[0.45, None, None, None]) + +# now match the top and bottom of two gridspecs. 
+top = min(gs1.top, gs2.top) +bottom = max(gs1.bottom, gs2.bottom) + +gs1.update(top=top, bottom=bottom) +gs2.update(top=top, bottom=bottom) + +plt.show() +``` + +![紧凑布局示例8](https://matplotlib.org/_images/sphx_glr_demo_tight_layout_008.png) + +## 参考 + +此示例中显示了以下函数和方法的用法: + +```python +import matplotlib +matplotlib.pyplot.tight_layout +matplotlib.figure.Figure.tight_layout +matplotlib.figure.Figure.add_subplot +matplotlib.pyplot.subplot2grid +matplotlib.gridspec.GridSpec +``` + +脚本的总运行时间:(0分1.072秒) + +## 下载这个示例 + +- [下载python源码: demo_tight_layout.py](https://matplotlib.org/_downloads/demo_tight_layout.py) +- [下载Jupyter notebook: demo_tight_layout.ipynb](https://matplotlib.org/_downloads/demo_tight_layout.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/fahrenheit_celsius_scales.md b/Python/matplotlab/gallery/subplots_axes_and_figures/fahrenheit_celsius_scales.md new file mode 100644 index 00000000..33109529 --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/fahrenheit_celsius_scales.md @@ -0,0 +1,47 @@ +# 同一轴上的不同比例 + +演示如何在左右y轴上显示两个刻度。 + +此示例使用华氏度和摄氏度量表。 + +![同一轴上的不同比例示例](https://matplotlib.org/_images/sphx_glr_fahrenheit_celsius_scales_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + + +def fahrenheit2celsius(temp): + """ + Returns temperature in Celsius. + """ + return (5. / 9.) * (temp - 32) + + +def convert_ax_c_to_celsius(ax_f): + """ + Update second axis according with first axis. + """ + y1, y2 = ax_f.get_ylim() + ax_c.set_ylim(fahrenheit2celsius(y1), fahrenheit2celsius(y2)) + ax_c.figure.canvas.draw() + +fig, ax_f = plt.subplots() +ax_c = ax_f.twinx() + +# automatically update ylim of ax2 when ylim of ax1 changes. 
+ax_f.callbacks.connect("ylim_changed", convert_ax_c_to_celsius) +ax_f.plot(np.linspace(-40, 120, 100)) +ax_f.set_xlim(0, 100) + +ax_f.set_title('Two scales: Fahrenheit and Celsius') +ax_f.set_ylabel('Fahrenheit') +ax_c.set_ylabel('Celsius') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: fahrenheit_celsius_scales.py](https://matplotlib.org/_downloads/fahrenheit_celsius_scales.py) +- [下载Jupyter notebook: fahrenheit_celsius_scales.ipynb](https://matplotlib.org/_downloads/fahrenheit_celsius_scales.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/figure_title.md b/Python/matplotlab/gallery/subplots_axes_and_figures/figure_title.md new file mode 100644 index 00000000..30709274 --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/figure_title.md @@ -0,0 +1,40 @@ +# 为图像设置标题 + +创建一个具有单独的子图标题和居中的图标题的图形。 + +![为图像设置标题示例](https://matplotlib.org/_images/sphx_glr_figure_title_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + + +def f(t): + s1 = np.cos(2*np.pi*t) + e1 = np.exp(-t) + return s1 * e1 + +t1 = np.arange(0.0, 5.0, 0.1) +t2 = np.arange(0.0, 5.0, 0.02) +t3 = np.arange(0.0, 2.0, 0.01) + + +fig, axs = plt.subplots(2, 1, constrained_layout=True) +axs[0].plot(t1, f(t1), 'o', t2, f(t2), '-') +axs[0].set_title('subplot 1') +axs[0].set_xlabel('distance (m)') +axs[0].set_ylabel('Damped oscillation') +fig.suptitle('This is a somewhat long figure title', fontsize=16) + +axs[1].plot(t3, np.cos(2*np.pi*t3), '--') +axs[1].set_xlabel('time (s)') +axs[1].set_title('subplot 2') +axs[1].set_ylabel('Undamped') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: figure_title.py](https://matplotlib.org/_downloads/figure_title.py) +- [下载Jupyter notebook: figure_title.ipynb](https://matplotlib.org/_downloads/figure_title.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/ganged_plots.md 
b/Python/matplotlab/gallery/subplots_axes_and_figures/ganged_plots.md new file mode 100644 index 00000000..83585ccb --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/ganged_plots.md @@ -0,0 +1,42 @@ +# 创建相邻的子图 + +要创建(在视觉上)共享公共轴的图,可以将子图之间的hspace设置为零。在创建子图时传递sharex=True将自动关闭除底部子图之外所有子图的x刻度和标签。 + +在此示例中,绘图共享一个公共x轴,但您可以遵循相同的逻辑来提供公共y轴。 + +![](https://matplotlib.org/_images/sphx_glr_ganged_plots_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +t = np.arange(0.0, 2.0, 0.01) + +s1 = np.sin(2 * np.pi * t) +s2 = np.exp(-t) +s3 = s1 * s2 + +fig, axs = plt.subplots(3, 1, sharex=True) +# Remove horizontal space between axes +fig.subplots_adjust(hspace=0) + +# Plot each graph, and manually set the y tick values +axs[0].plot(t, s1) +axs[0].set_yticks(np.arange(-0.9, 1.0, 0.4)) +axs[0].set_ylim(-1, 1) + +axs[1].plot(t, s2) +axs[1].set_yticks(np.arange(0.1, 1.0, 0.2)) +axs[1].set_ylim(0, 1) + +axs[2].plot(t, s3) +axs[2].set_yticks(np.arange(-0.9, 1.0, 0.4)) +axs[2].set_ylim(-1, 1) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: ganged_plots.py](https://matplotlib.org/_downloads/ganged_plots.py) +- [下载Jupyter notebook: ganged_plots.ipynb](https://matplotlib.org/_downloads/ganged_plots.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/geo_demo.md b/Python/matplotlab/gallery/subplots_axes_and_figures/geo_demo.md new file mode 100644 index 00000000..d248adaa --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/geo_demo.md @@ -0,0 +1,50 @@ +# 地理投影 + +此示例演示了可通过子图使用的4种投影。Matplotlib还支持 [Basemap Toolkit](https://matplotlib.org/basemap) 和 [Cartopy](http://scitools.org.uk/cartopy) 用于地理投影。 + +```python +import matplotlib.pyplot as plt +``` + +```python +plt.figure() +plt.subplot(111, projection="aitoff") +plt.title("Aitoff") +plt.grid(True) +``` + +![地理投影示例](https://matplotlib.org/_images/sphx_glr_geo_demo_001.png) + +```python +plt.figure() +plt.subplot(111, projection="hammer") +plt.title("Hammer") +plt.grid(True) +``` + +![地理投影示例2](https://matplotlib.org/_images/sphx_glr_geo_demo_002.png) + +```python +plt.figure() +plt.subplot(111, projection="lambert") +plt.title("Lambert") +plt.grid(True) +``` + +![地理投影示例3](https://matplotlib.org/_images/sphx_glr_geo_demo_003.png) + +```python +plt.figure() +plt.subplot(111, projection="mollweide") +plt.title("Mollweide") +plt.grid(True) + +plt.show() +``` + +![地理投影示例4](https://matplotlib.org/_images/sphx_glr_geo_demo_004.png) + +## 下载这个示例 + +- [下载python源码: geo_demo.py](https://matplotlib.org/_downloads/geo_demo.py) +- [下载Jupyter notebook: geo_demo.ipynb](https://matplotlib.org/_downloads/geo_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/gridspec_and_subplots.md b/Python/matplotlab/gallery/subplots_axes_and_figures/gridspec_and_subplots.md new file mode 100644 index 00000000..d0704819 --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/gridspec_and_subplots.md @@ -0,0 +1,27 @@ +# 使用子图和GridSpec组合两个子图 + +有时我们想要在用[子图](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.subplots)创建的轴布局中组合两个子图。我们可以从轴上获取[GridSpec](https://matplotlib.org/api/_as_gen/matplotlib.gridspec.GridSpec.html#matplotlib.gridspec.GridSpec),然后移除覆盖的轴并用新的更大的轴填充间隙。 在这里,我们创建一个布局,最后一列中的底部两个轴组合在一起。 + +请参阅:[使用GridSpec和其他功能自定义图布局](https://matplotlib.org/tutorials/intermediate/gridspec.html)。 + +![使用子图和GridSpec组合两个子图示例](https://matplotlib.org/_images/sphx_glr_gridspec_and_subplots_001.png) + +```python +import matplotlib.pyplot as plt + +fig, axs = plt.subplots(ncols=3, nrows=3) +gs = axs[1, 2].get_gridspec() +# remove the underlying axes +for ax in axs[1:, -1]: + ax.remove() +axbig = fig.add_subplot(gs[1:, -1]) +axbig.annotate('Big Axes \nGridSpec[1:, -1]', (0.1, 0.5), + xycoords='axes fraction', va='center') + +fig.tight_layout() +``` + +## 下载这个示例 + +- [下载python源码:
gridspec_and_subplots.py](https://matplotlib.org/_downloads/gridspec_and_subplots.py) +- [下载Jupyter notebook: gridspec_and_subplots.ipynb](https://matplotlib.org/_downloads/gridspec_and_subplots.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/gridspec_multicolumn.md b/Python/matplotlab/gallery/subplots_axes_and_figures/gridspec_multicolumn.md new file mode 100644 index 00000000..c961f749 --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/gridspec_multicolumn.md @@ -0,0 +1,36 @@ +# 使用GridSpec制作多列/行子图布局 + +[GridSpec](https://matplotlib.org/api/_as_gen/matplotlib.gridspec.GridSpec.html#matplotlib.gridspec.GridSpec)是布置子图网格的一种灵活方式。下面是一个使用3x3网格和横跨所有三列、两列和两行的轴的示例。 + +![GridSpec示例](https://matplotlib.org/_images/sphx_glr_gridspec_multicolumn_001.png) + +```python +import matplotlib.pyplot as plt +from matplotlib.gridspec import GridSpec + + +def format_axes(fig): + for i, ax in enumerate(fig.axes): + ax.text(0.5, 0.5, "ax%d" % (i+1), va="center", ha="center") + ax.tick_params(labelbottom=False, labelleft=False) + +fig = plt.figure(constrained_layout=True) + +gs = GridSpec(3, 3, figure=fig) +ax1 = fig.add_subplot(gs[0, :]) +# identical to ax1 = plt.subplot(gs.new_subplotspec((0, 0), colspan=3)) +ax2 = fig.add_subplot(gs[1, :-1]) +ax3 = fig.add_subplot(gs[1:, -1]) +ax4 = fig.add_subplot(gs[-1, 0]) +ax5 = fig.add_subplot(gs[-1, -2]) + +fig.suptitle("GridSpec") +format_axes(fig) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: gridspec_multicolumn.py](https://matplotlib.org/_downloads/gridspec_multicolumn.py) +- [下载Jupyter notebook: gridspec_multicolumn.ipynb](https://matplotlib.org/_downloads/gridspec_multicolumn.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/gridspec_nested.md b/Python/matplotlab/gallery/subplots_axes_and_figures/gridspec_nested.md new file mode 100644 index 00000000..2cf9bc21 --- /dev/null +++
b/Python/matplotlab/gallery/subplots_axes_and_figures/gridspec_nested.md @@ -0,0 +1,51 @@ +# 嵌套的Gridspecs + +GridSpec可以嵌套,因此来自父GridSpec的子图可以设置嵌套的子图网格的位置。 + +![嵌套的Gridspecs示例](https://matplotlib.org/_images/sphx_glr_gridspec_nested_001.png) + +```python +import matplotlib.pyplot as plt +import matplotlib.gridspec as gridspec + + +def format_axes(fig): + for i, ax in enumerate(fig.axes): + ax.text(0.5, 0.5, "ax%d" % (i+1), va="center", ha="center") + ax.tick_params(labelbottom=False, labelleft=False) + + +# gridspec inside gridspec +f = plt.figure() + +gs0 = gridspec.GridSpec(1, 2, figure=f) + +gs00 = gridspec.GridSpecFromSubplotSpec(3, 3, subplot_spec=gs0[0]) + +ax1 = plt.Subplot(f, gs00[:-1, :]) +f.add_subplot(ax1) +ax2 = plt.Subplot(f, gs00[-1, :-1]) +f.add_subplot(ax2) +ax3 = plt.Subplot(f, gs00[-1, -1]) +f.add_subplot(ax3) + + +gs01 = gridspec.GridSpecFromSubplotSpec(3, 3, subplot_spec=gs0[1]) + +ax4 = plt.Subplot(f, gs01[:, :-1]) +f.add_subplot(ax4) +ax5 = plt.Subplot(f, gs01[:-1, -1]) +f.add_subplot(ax5) +ax6 = plt.Subplot(f, gs01[-1, -1]) +f.add_subplot(ax6) + +plt.suptitle("GridSpec Inside GridSpec") +format_axes(f) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: gridspec_nested.py](https://matplotlib.org/_downloads/gridspec_nested.py) +- [下载Jupyter notebook: gridspec_nested.ipynb](https://matplotlib.org/_downloads/gridspec_nested.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/invert_axes.md b/Python/matplotlab/gallery/subplots_axes_and_figures/invert_axes.md new file mode 100644 index 00000000..9a695fc9 --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/invert_axes.md @@ -0,0 +1,28 @@ +# 反转轴 + +通过将轴范围的常规顺序颠倒,可以得到递减的轴。 + +![反转轴示例](https://matplotlib.org/_images/sphx_glr_invert_axes_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +t = np.arange(0.01, 5.0, 0.01) +s = np.exp(-t) +plt.plot(t, s) + +plt.xlim(5, 0) # decreasing time + +plt.xlabel('decreasing time
(s)') +plt.ylabel('voltage (mV)') +plt.title('Should be growing...') +plt.grid(True) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: invert_axes.py](https://matplotlib.org/_downloads/invert_axes.py) +- [下载Jupyter notebook: invert_axes.ipynb](https://matplotlib.org/_downloads/invert_axes.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/multiple_figs_demo.md b/Python/matplotlab/gallery/subplots_axes_and_figures/multiple_figs_demo.md new file mode 100644 index 00000000..81114702 --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/multiple_figs_demo.md @@ -0,0 +1,52 @@ +# 多图形演示 + +演示如何使用多个图形窗口和子图。 + +```python +import matplotlib.pyplot as plt +import numpy as np + +t = np.arange(0.0, 2.0, 0.01) +s1 = np.sin(2*np.pi*t) +s2 = np.sin(4*np.pi*t) +``` + +Create figure 1 + +```python +plt.figure(1) +plt.subplot(211) +plt.plot(t, s1) +plt.subplot(212) +plt.plot(t, 2*s1) +``` + +![多图形演示](https://matplotlib.org/_images/sphx_glr_multiple_figs_demo_001.png) + +Create figure 2 + +```python +plt.figure(2) +plt.plot(t, s2) +``` + +![多图形演示2](https://matplotlib.org/_images/sphx_glr_multiple_figs_demo_002.png) + +Now switch back to figure 1 and make some changes + +```python +plt.figure(1) +plt.subplot(211) +plt.plot(t, s2, 's') +ax = plt.gca() +ax.set_xticklabels([]) + +plt.show() +``` + +![多图形演示3](https://matplotlib.org/_images/sphx_glr_multiple_figs_demo_003.png) + +## 下载这个示例 + +- [下载python源码: multiple_figs_demo.py](https://matplotlib.org/_downloads/multiple_figs_demo.py) +- [下载Jupyter notebook: multiple_figs_demo.ipynb](https://matplotlib.org/_downloads/multiple_figs_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/shared_axis_demo.md b/Python/matplotlab/gallery/subplots_axes_and_figures/shared_axis_demo.md new file mode 100644 index 00000000..9e88f7d1 --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/shared_axis_demo.md @@ -0,0 +1,52
@@ +# 共享轴演示 + +通过将轴实例作为sharex或sharey kwarg传递,可以将一个轴的x或y轴限制与另一个轴共享。 + +更改一个轴上的轴限制将自动反映在另一个轴上,反之亦然,因此当您使用工具栏导航时,轴将在其共享轴上相互跟随。同样适用于轴缩放的变化(例如,log vs linear)。但是,刻度标签可能存在差异,例如,您可以选择性地关闭一个轴上的刻度标签。 + +下面的示例显示了如何在各个轴上自定义刻度标签。共享轴共享刻度定位器,刻度格式化程序,视图限制和变换(例如,对数,线性)。但是,刻度标签本身并不共享属性。这是一个特性而不是错误(bug),因为您可能希望(例如在下面的示例中)把上方轴的刻度标签设得更小。 + +如果要关闭给定轴的刻度标签(例如,在子图(211)或子图(212)上),则无法使用这种常规技巧: + +```python +setp(ax2, xticklabels=[]) +``` + +因为这会更改在所有轴之间共享的Tick格式化程序。但您可以更改标签的可见性,这是一个特性: + +```python +setp(ax2.get_xticklabels(), visible=False) +``` + +![共享轴演示](https://matplotlib.org/_images/sphx_glr_shared_axis_demo_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +t = np.arange(0.01, 5.0, 0.01) +s1 = np.sin(2 * np.pi * t) +s2 = np.exp(-t) +s3 = np.sin(4 * np.pi * t) + +ax1 = plt.subplot(311) +plt.plot(t, s1) +plt.setp(ax1.get_xticklabels(), fontsize=6) + +# share x only +ax2 = plt.subplot(312, sharex=ax1) +plt.plot(t, s2) +# make these tick labels invisible +plt.setp(ax2.get_xticklabels(), visible=False) + +# share x and y +ax3 = plt.subplot(313, sharex=ax1, sharey=ax1) +plt.plot(t, s3) +plt.xlim(0.01, 5.0) +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: shared_axis_demo.py](https://matplotlib.org/_downloads/shared_axis_demo.py) +- [下载Jupyter notebook: shared_axis_demo.ipynb](https://matplotlib.org/_downloads/shared_axis_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/subplot.md b/Python/matplotlab/gallery/subplots_axes_and_figures/subplot.md new file mode 100644 index 00000000..18cb034d --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/subplot.md @@ -0,0 +1,34 @@ +# 多个子图 + +带有多个子图的简单演示。 + +![多个子图](https://matplotlib.org/_images/sphx_glr_subplot_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + + +x1 = np.linspace(0.0, 5.0) +x2 = np.linspace(0.0, 2.0) + +y1 = np.cos(2 * np.pi * x1) * np.exp(-x1) +y2 = np.cos(2 * np.pi * x2) + +plt.subplot(2, 1, 1) +plt.plot(x1,
y1, 'o-') +plt.title('A tale of 2 subplots') +plt.ylabel('Damped oscillation') + +plt.subplot(2, 1, 2) +plt.plot(x2, y2, '.-') +plt.xlabel('time (s)') +plt.ylabel('Undamped') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: subplot.py](https://matplotlib.org/_downloads/subplot.py) +- [下载Jupyter notebook: subplot.ipynb](https://matplotlib.org/_downloads/subplot.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/subplot_demo.md b/Python/matplotlab/gallery/subplots_axes_and_figures/subplot_demo.md new file mode 100644 index 00000000..57c38a06 --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/subplot_demo.md @@ -0,0 +1,32 @@ +# 基本子图演示 + +一个包含两个子图的演示。有关更多选项,请参阅[子图演示](https://matplotlib.org/gallery/subplots_axes_and_figures/subplots_demo.html)。 + +![基本子图演示](https://matplotlib.org/_images/sphx_glr_subplot_demo_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + +# Data for plotting
x1 = np.linspace(0.0, 5.0) +x2 = np.linspace(0.0, 2.0) +y1 = np.cos(2 * np.pi * x1) * np.exp(-x1) +y2 = np.cos(2 * np.pi * x2) + +# Create two subplots sharing y axis +fig, (ax1, ax2) = plt.subplots(2, sharey=True) + +ax1.plot(x1, y1, 'ko-') +ax1.set(title='A tale of 2 subplots', ylabel='Damped oscillation') + +ax2.plot(x2, y2, 'r.-') +ax2.set(xlabel='time (s)', ylabel='Undamped') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: subplot_demo.py](https://matplotlib.org/_downloads/subplot_demo.py) +- [下载Jupyter notebook: subplot_demo.ipynb](https://matplotlib.org/_downloads/subplot_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/subplot_toolbar.md b/Python/matplotlab/gallery/subplots_axes_and_figures/subplot_toolbar.md new file mode 100644 index 00000000..396b6c05 --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/subplot_toolbar.md @@ -0,0 +1,30 @@ +# 子图工具栏 + +Matplotlib有一个工具栏可用于调整子图(subplot)间距。 + 
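补充说明(以下为笔者假定的等价写法,非原示例内容):子图工具在交互界面里调节的那组间距参数,也可以用 ``subplots_adjust`` 在代码中直接设置:

```python
# 最小示意(非原示例):subplot 工具交互调节的六个参数
# (left / right / bottom / top / wspace / hspace)也可以直接在代码中设置。
import matplotlib
matplotlib.use("Agg")  # 离屏渲染,无需交互窗口
import matplotlib.pyplot as plt
import numpy as np

fig, axs = plt.subplots(2, 2)
for ax in axs.flat:
    ax.imshow(np.random.random((100, 100)))

# 与工具栏拖动滑块的效果相同:设置边距和子图之间的间距
fig.subplots_adjust(left=0.1, right=0.95, bottom=0.1, top=0.95,
                    wspace=0.3, hspace=0.3)
```

设置后的值可以从 ``fig.subplotpars`` 读回,便于核对布局参数。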
+![子图工具栏示例](https://matplotlib.org/_images/sphx_glr_subplot_toolbar_001.png) + +![子图工具栏示例2](https://matplotlib.org/_images/sphx_glr_subplot_toolbar_002.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +fig, axs = plt.subplots(2, 2) + +axs[0, 0].imshow(np.random.random((100, 100))) + +axs[0, 1].imshow(np.random.random((100, 100))) + +axs[1, 0].imshow(np.random.random((100, 100))) + +axs[1, 1].imshow(np.random.random((100, 100))) + +plt.subplot_tool() +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: subplot_toolbar.py](https://matplotlib.org/_downloads/subplot_toolbar.py) +- [下载Jupyter notebook: subplot_toolbar.ipynb](https://matplotlib.org/_downloads/subplot_toolbar.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/subplots_adjust.md b/Python/matplotlab/gallery/subplots_axes_and_figures/subplots_adjust.md new file mode 100644 index 00000000..69b12bea --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/subplots_adjust.md @@ -0,0 +1,29 @@ +# 调整子图 + +使用 [subplots_adjust()](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.subplots_adjust.html#matplotlib.pyplot.subplots_adjust) 调整边距和子图的间距。 + +![调整子图](https://matplotlib.org/_images/sphx_glr_subplots_adjust_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +plt.subplot(211) +plt.imshow(np.random.random((100, 100)), cmap=plt.cm.BuPu_r) +plt.subplot(212) +plt.imshow(np.random.random((100, 100)), cmap=plt.cm.BuPu_r) + +plt.subplots_adjust(bottom=0.1, right=0.8, top=0.9) +cax = plt.axes([0.85, 0.1, 0.075, 0.8]) +plt.colorbar(cax=cax) +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: subplots_adjust.py](https://matplotlib.org/_downloads/subplots_adjust.py) +- [下载Jupyter notebook: subplots_adjust.ipynb](https://matplotlib.org/_downloads/subplots_adjust.ipynb) \ No newline at end of file diff --git 
a/Python/matplotlab/gallery/subplots_axes_and_figures/subplots_demo.md b/Python/matplotlab/gallery/subplots_axes_and_figures/subplots_demo.md new file mode 100644 index 00000000..644a4c3d --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/subplots_demo.md @@ -0,0 +1,124 @@ +# 子图演示大全 + +说明 plt.subplots() 使用的示例。 + +此函数只需一次调用即可创建一个图形和一组子图网格,同时对各个子图的创建方式提供合理的控制。要对子图的创建进行非常精细的调整,仍然可以直接在新图形上使用 add_subplot()。 + +```python +import matplotlib.pyplot as plt +import numpy as np + +# Simple data to display in various forms +x = np.linspace(0, 2 * np.pi, 400) +y = np.sin(x ** 2) + +plt.close('all') +``` + +只有一个图形和一个子图 + +```python +f, ax = plt.subplots() +ax.plot(x, y) +ax.set_title('Simple plot') +``` + +![子图演示大全](https://matplotlib.org/_images/sphx_glr_subplots_demo_001.png) + +两个子图,轴数组是一维的。 + +```python +f, axarr = plt.subplots(2, sharex=True) +f.suptitle('Sharing X axis') +axarr[0].plot(x, y) +axarr[1].scatter(x, y) +``` + +![子图演示大全2](https://matplotlib.org/_images/sphx_glr_subplots_demo_002.png) + +Two subplots, unpack the axes array immediately + +```python +f, (ax1, ax2) = plt.subplots(1, 2, sharey=True) +f.suptitle('Sharing Y axis') +ax1.plot(x, y) +ax2.scatter(x, y) +``` + +![子图演示大全3](https://matplotlib.org/_images/sphx_glr_subplots_demo_003.png) + +共享x/y轴的三个子图 + +```python +f, axarr = plt.subplots(3, sharex=True, sharey=True) +f.suptitle('Sharing both axes') +axarr[0].plot(x, y) +axarr[1].scatter(x, y) +axarr[2].scatter(x, 2 * y ** 2 - 1, color='r') +# Bring subplots close to each other. +f.subplots_adjust(hspace=0) +# Hide x labels and tick labels for all but bottom plot.
+for ax in axarr: + ax.label_outer() +``` + +![子图演示大全4](https://matplotlib.org/_images/sphx_glr_subplots_demo_004.png) + +行和列共享 + +```python +f, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, sharex='col', sharey='row') +f.suptitle('Sharing x per column, y per row') +ax1.plot(x, y) +ax2.scatter(x, y) +ax3.scatter(x, 2 * y ** 2 - 1, color='r') +ax4.plot(x, 2 * y ** 2 - 1, color='r') +``` + +![子图演示大全5](https://matplotlib.org/_images/sphx_glr_subplots_demo_005.png) + +四个轴,作为二维数组返回 + +```python +f, axarr = plt.subplots(2, 2) +axarr[0, 0].plot(x, y) +axarr[0, 0].set_title('Axis [0,0]') +axarr[0, 1].scatter(x, y) +axarr[0, 1].set_title('Axis [0,1]') +axarr[1, 0].plot(x, y ** 2) +axarr[1, 0].set_title('Axis [1,0]') +axarr[1, 1].scatter(x, y ** 2) +axarr[1, 1].set_title('Axis [1,1]') +for ax in axarr.flat: + ax.set(xlabel='x-label', ylabel='y-label') +# Hide x labels and tick labels for top plots and y ticks for right plots. +for ax in axarr.flat: + ax.label_outer() +``` + +![子图演示大全6](https://matplotlib.org/_images/sphx_glr_subplots_demo_006.png) + +四个极坐标轴 + +```python +f, axarr = plt.subplots(2, 2, subplot_kw=dict(projection='polar')) +axarr[0, 0].plot(x, y) +axarr[0, 0].set_title('Axis [0,0]') +axarr[0, 1].scatter(x, y) +axarr[0, 1].set_title('Axis [0,1]') +axarr[1, 0].plot(x, y ** 2) +axarr[1, 0].set_title('Axis [1,0]') +axarr[1, 1].scatter(x, y ** 2) +axarr[1, 1].set_title('Axis [1,1]') +# Fine-tune figure; make subplots farther from each other.
+f.subplots_adjust(hspace=0.3) + +plt.show() +``` + +![子图演示大全7](https://matplotlib.org/_images/sphx_glr_subplots_demo_007.png) + +## 下载这个示例 + +- [下载python源码: subplots_demo.py](https://matplotlib.org/_downloads/subplots_demo.py) +- [下载Jupyter notebook: subplots_demo.ipynb](https://matplotlib.org/_downloads/subplots_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/two_scales.md b/Python/matplotlab/gallery/subplots_axes_and_figures/two_scales.md new file mode 100644 index 00000000..6f81efb6 --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/two_scales.md @@ -0,0 +1,53 @@ +# 绘制不同比例 + +在同一轴上的两个图样,具有不同的左右比例。 + +诀窍是使用共享同一x轴的两个不同的轴。您可以根据需要使用单独的 [matplotlib.ticker](https://matplotlib.org/api/ticker_api.html#module-matplotlib.ticker) 格式化程序和定位器,因为这两个轴是独立的。 + +这些轴是通过调用 [Axes.twinx()](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.twinx.html#matplotlib.axes.Axes.twinx) 方法生成的。同样,[Axes.twiny()](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.twiny.html#matplotlib.axes.Axes.twiny) 可用于生成共享y轴但具有不同顶部和底部比例的轴。 + +```python +import numpy as np +import matplotlib.pyplot as plt + +# Create some mock data +t = np.arange(0.01, 10.0, 0.01) +data1 = np.exp(t) +data2 = np.sin(2 * np.pi * t) + +fig, ax1 = plt.subplots() + +color = 'tab:red' +ax1.set_xlabel('time (s)') +ax1.set_ylabel('exp', color=color) +ax1.plot(t, data1, color=color) +ax1.tick_params(axis='y', labelcolor=color) + +ax2 = ax1.twinx() # instantiate a second axes that shares the same x-axis + +color = 'tab:blue' +ax2.set_ylabel('sin', color=color) # we already handled the x-label with ax1 +ax2.plot(t, data2, color=color) +ax2.tick_params(axis='y', labelcolor=color) + +fig.tight_layout() # otherwise the right y-label is slightly clipped +plt.show() +``` + +![绘制不同尺度示例](https://matplotlib.org/_images/sphx_glr_two_scales_001.png) + +## 参考 + +此示例显示了以下函数、方法、类和模块的使用: + +```python +import matplotlib +matplotlib.axes.Axes.twinx 
+matplotlib.axes.Axes.twiny +matplotlib.axes.Axes.tick_params +``` + +## 下载这个示例 + +- [下载python源码: two_scales.py](https://matplotlib.org/_downloads/two_scales.py) +- [下载Jupyter notebook: two_scales.ipynb](https://matplotlib.org/_downloads/two_scales.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/subplots_axes_and_figures/zoom_inset_axes.md b/Python/matplotlab/gallery/subplots_axes_and_figures/zoom_inset_axes.md new file mode 100644 index 00000000..e6ede4e6 --- /dev/null +++ b/Python/matplotlab/gallery/subplots_axes_and_figures/zoom_inset_axes.md @@ -0,0 +1,61 @@ +# 缩放区域嵌入轴 + +插入轴的示例和显示缩放位置的矩形。 + +```python +import matplotlib.pyplot as plt +import numpy as np + + +def get_demo_image(): + from matplotlib.cbook import get_sample_data + import numpy as np + f = get_sample_data("axes_grid/bivariate_normal.npy", asfileobj=False) + z = np.load(f) + # z is a numpy array of 15x15 + return z, (-3, 4, -4, 3) + +fig, ax = plt.subplots(figsize=[5, 4]) + +# make data +Z, extent = get_demo_image() +Z2 = np.zeros([150, 150], dtype="d") +ny, nx = Z.shape +Z2[30:30 + ny, 30:30 + nx] = Z + +ax.imshow(Z2, extent=extent, interpolation="nearest", + origin="lower") + +# inset axes.... 
+axins = ax.inset_axes([0.5, 0.5, 0.47, 0.47])
+axins.imshow(Z2, extent=extent, interpolation="nearest",
+             origin="lower")
+# sub region of the original image
+x1, x2, y1, y2 = -1.5, -0.9, -2.5, -1.9
+axins.set_xlim(x1, x2)
+axins.set_ylim(y1, y2)
+axins.set_xticklabels('')
+axins.set_yticklabels('')
+
+ax.indicate_inset_zoom(axins)
+
+plt.show()
+```
+
+![缩放区域嵌入轴示例](https://matplotlib.org/_images/sphx_glr_zoom_inset_axes_001.png)
+
+## 参考
+
+此示例中显示了以下函数和方法的用法:
+
+```python
+import matplotlib
+matplotlib.axes.Axes.inset_axes
+matplotlib.axes.Axes.indicate_inset_zoom
+matplotlib.axes.Axes.imshow
+```
+
+## 下载这个示例
+
+- [下载python源码: zoom_inset_axes.py](https://matplotlib.org/_downloads/zoom_inset_axes.py)
+- [下载Jupyter notebook: zoom_inset_axes.ipynb](https://matplotlib.org/_downloads/zoom_inset_axes.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/accented_text.md b/Python/matplotlab/gallery/text_labels_and_annotations/accented_text.md
new file mode 100644
index 00000000..b7185693
--- /dev/null
+++ b/Python/matplotlab/gallery/text_labels_and_annotations/accented_text.md
@@ -0,0 +1,39 @@
+# 在matplotlib中使用重音文本
+
+Matplotlib通过tex、mathtext或unicode支持重音字符。
+
+使用mathtext时,提供以下重音:hat、breve、grave、bar、acute、tilde、vec、dot、ddot。所有这些重音都具有相同的语法,例如,要给 o 加上划线(overbar),可以使用 \bar{o};要给 o 加分音符(umlaut),可以使用 \ddot{o}。还提供了快捷方式,例如:\"o \'e \`e \~n \.x \^y
+
+![绘制重音文本示例](https://matplotlib.org/_images/sphx_glr_accented_text_001.png)
+
+![绘制重音文本示例2](https://matplotlib.org/_images/sphx_glr_accented_text_002.png)
+
+```python
+import matplotlib.pyplot as plt
+
+# Mathtext demo
+fig, ax = plt.subplots()
+ax.plot(range(10))
+ax.set_title(r'$\ddot{o}\acute{e}\grave{e}\hat{O}'
+             r'\breve{i}\bar{A}\tilde{n}\vec{q}$', fontsize=20)
+
+# Shorthand is also supported and curly braces are optional
+ax.set_xlabel(r"""$\"o\ddot o \'e\`e\~n\.x\^y$""", fontsize=20)
+ax.text(4, 0.5, r"$F=m\ddot{x}$")
+fig.tight_layout()
+
+# Unicode demo
+fig, ax = plt.subplots()
+ax.set_title("GISCARD CHAHUTÉ À L'ASSEMBLÉE")
+ax.set_xlabel("LE COUP DE DÉ DE DE GAULLE")
+ax.set_ylabel('André was here!')
+ax.text(0.2, 0.8, 'Institut für Festkörperphysik', rotation=45)
+ax.text(0.4, 0.2, 'AVA (check kerning)')
+
+plt.show()
+```
+
+## 下载这个示例
+
+- [下载python源码: accented_text.py](https://matplotlib.org/_downloads/accented_text.py)
+- [下载Jupyter notebook: accented_text.ipynb](https://matplotlib.org/_downloads/accented_text.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/annotation_demo.md b/Python/matplotlab/gallery/text_labels_and_annotations/annotation_demo.md
new file mode 100644
index 00000000..06ef3eb4
--- /dev/null
+++ b/Python/matplotlab/gallery/text_labels_and_annotations/annotation_demo.md
@@ -0,0 +1,392 @@
+# 注释图
+
+以下示例显示了如何在matplotlib中注释绘图,包括突出显示特定的兴趣点,并使用各种视觉工具来引起对这一点的关注。有关matplotlib中注释和文本工具的更完整和深入的描述,请参阅[注释教程](https://matplotlib.org/tutorials/text/annotations.html)。
+
+```python
+import matplotlib.pyplot as plt
+from matplotlib.patches import Ellipse
+import numpy as np
+from matplotlib.text import OffsetFrom
+```
+
+## 指定文本点和注释点
+
+您必须指定要注释的点 xy=(x, y)。另外,您可以为注释文本的位置指定文本点 xytext=(x, y)。(可选)您可以使用 xycoords 和 textcoords 的以下字符串之一指定 xy 和 xytext 的坐标系(默认为 'data'):
+
+```python
+'figure points' : points from the lower left corner of the figure
+'figure pixels' : pixels from the lower left corner of the figure
+'figure fraction' : 0,0 is lower left of figure and 1,1 is upper right
+'axes points' : points from lower left corner of axes
+'axes pixels' : pixels from lower left corner of axes
+'axes fraction' : 0,0 is lower left of axes and 1,1 is upper right
+'offset points' : Specify an offset (in points) from the xy value
+'offset pixels' : Specify an offset (in pixels) from the xy value
+'data' : use the axes data coordinate system
+```
+
+注意:对于物理坐标系(点或像素),原点是图形或轴的左下角。
+
+(可选)您可以通过提供箭头属性字典,绘制一个从文本指向注释点的箭头。
+
+有效的关键字是:
+
+```python
+width : the width of the arrow in points
+frac : the fraction of the arrow length occupied by the head +headwidth : the width of the base of the arrow head in points +shrink : move the tip and base some percent away from the + annotated point and text +any key for matplotlib.patches.polygon (e.g., facecolor) +``` + +```python +# Create our figure and data we'll use for plotting +fig, ax = plt.subplots(figsize=(3, 3)) + +t = np.arange(0.0, 5.0, 0.01) +s = np.cos(2*np.pi*t) + +# Plot a line and add some simple annotations +line, = ax.plot(t, s) +ax.annotate('figure pixels', + xy=(10, 10), xycoords='figure pixels') +ax.annotate('figure points', + xy=(80, 80), xycoords='figure points') +ax.annotate('figure fraction', + xy=(.025, .975), xycoords='figure fraction', + horizontalalignment='left', verticalalignment='top', + fontsize=20) + +# The following examples show off how these arrows are drawn. + +ax.annotate('point offset from data', + xy=(2, 1), xycoords='data', + xytext=(-15, 25), textcoords='offset points', + arrowprops=dict(facecolor='black', shrink=0.05), + horizontalalignment='right', verticalalignment='bottom') + +ax.annotate('axes fraction', + xy=(3, 1), xycoords='data', + xytext=(0.8, 0.95), textcoords='axes fraction', + arrowprops=dict(facecolor='black', shrink=0.05), + horizontalalignment='right', verticalalignment='top') + +# You may also use negative points or pixels to specify from (right, top). 
+# E.g., (-10, 10) is 10 points to the left of the right side of the axes and 10
+# points above the bottom
+
+ax.annotate('pixel offset from axes fraction',
+            xy=(1, 0), xycoords='axes fraction',
+            xytext=(-20, 20), textcoords='offset pixels',
+            horizontalalignment='right',
+            verticalalignment='bottom')
+
+ax.set(xlim=(-1, 5), ylim=(-3, 5))
+```
+
+![注释图示例](https://matplotlib.org/_images/sphx_glr_annotation_demo_001.png)
+
+## 使用多个坐标系和轴类型
+
+您可以在不同的位置和坐标系中指定 xy 点和 xytext,还可以选择打开连接线,并用标记(marker)标出该点。注释同样适用于极坐标轴。
+
+在下面的示例中,xy 点使用原生坐标(xycoords 默认为 'data')。对于极坐标轴,这是在(theta, 半径)空间中。示例中的文本放置在图形分数坐标系(figure fraction)中。水平和垂直对齐等文本关键字参数同样会被遵循。
+
+```python
+fig, ax = plt.subplots(subplot_kw=dict(projection='polar'), figsize=(3, 3))
+r = np.arange(0, 1, 0.001)
+theta = 2*2*np.pi*r
+line, = ax.plot(theta, r)
+
+ind = 800
+thisr, thistheta = r[ind], theta[ind]
+ax.plot([thistheta], [thisr], 'o')
+ax.annotate('a polar annotation',
+            xy=(thistheta, thisr),  # theta, radius
+            xytext=(0.05, 0.05),    # fraction, fraction
+            textcoords='figure fraction',
+            arrowprops=dict(facecolor='black', shrink=0.05),
+            horizontalalignment='left',
+            verticalalignment='bottom')
+
+# You can also use polar notation on a cartesian axes.  Here the native
+# coordinate system ('data') is cartesian, so you need to specify the
+# xycoords and textcoords as 'polar' if you want to use (theta, radius).
+ +el = Ellipse((0, 0), 10, 20, facecolor='r', alpha=0.5) + +fig, ax = plt.subplots(subplot_kw=dict(aspect='equal')) +ax.add_artist(el) +el.set_clip_box(ax.bbox) +ax.annotate('the top', + xy=(np.pi/2., 10.), # theta, radius + xytext=(np.pi/3, 20.), # theta, radius + xycoords='polar', + textcoords='polar', + arrowprops=dict(facecolor='black', shrink=0.05), + horizontalalignment='left', + verticalalignment='bottom', + clip_on=True) # clip to the axes bounding box + +ax.set(xlim=[-20, 20], ylim=[-20, 20]) +``` + +![注释图示例2](https://matplotlib.org/_images/sphx_glr_annotation_demo_002.png) + +![注释图示例3](https://matplotlib.org/_images/sphx_glr_annotation_demo_003.png) + +## 自定义箭头和气泡样式 + +xytext和注释点之间的箭头以及覆盖注释文本的气泡可高度自定义。 下面是一些参数选项以及它们的结果输出。 + +```python +fig, ax = plt.subplots(figsize=(8, 5)) + +t = np.arange(0.0, 5.0, 0.01) +s = np.cos(2*np.pi*t) +line, = ax.plot(t, s, lw=3) + +ax.annotate('straight', + xy=(0, 1), xycoords='data', + xytext=(-50, 30), textcoords='offset points', + arrowprops=dict(arrowstyle="->")) + +ax.annotate('arc3,\nrad 0.2', + xy=(0.5, -1), xycoords='data', + xytext=(-80, -60), textcoords='offset points', + arrowprops=dict(arrowstyle="->", + connectionstyle="arc3,rad=.2")) + +ax.annotate('arc,\nangle 50', + xy=(1., 1), xycoords='data', + xytext=(-90, 50), textcoords='offset points', + arrowprops=dict(arrowstyle="->", + connectionstyle="arc,angleA=0,armA=50,rad=10")) + +ax.annotate('arc,\narms', + xy=(1.5, -1), xycoords='data', + xytext=(-80, -60), textcoords='offset points', + arrowprops=dict(arrowstyle="->", + connectionstyle="arc,angleA=0,armA=40,angleB=-90,armB=30,rad=7")) + +ax.annotate('angle,\nangle 90', + xy=(2., 1), xycoords='data', + xytext=(-70, 30), textcoords='offset points', + arrowprops=dict(arrowstyle="->", + connectionstyle="angle,angleA=0,angleB=90,rad=10")) + +ax.annotate('angle3,\nangle -90', + xy=(2.5, -1), xycoords='data', + xytext=(-80, -60), textcoords='offset points', + arrowprops=dict(arrowstyle="->", + 
connectionstyle="angle3,angleA=0,angleB=-90")) + +ax.annotate('angle,\nround', + xy=(3., 1), xycoords='data', + xytext=(-60, 30), textcoords='offset points', + bbox=dict(boxstyle="round", fc="0.8"), + arrowprops=dict(arrowstyle="->", + connectionstyle="angle,angleA=0,angleB=90,rad=10")) + +ax.annotate('angle,\nround4', + xy=(3.5, -1), xycoords='data', + xytext=(-70, -80), textcoords='offset points', + size=20, + bbox=dict(boxstyle="round4,pad=.5", fc="0.8"), + arrowprops=dict(arrowstyle="->", + connectionstyle="angle,angleA=0,angleB=-90,rad=10")) + +ax.annotate('angle,\nshrink', + xy=(4., 1), xycoords='data', + xytext=(-60, 30), textcoords='offset points', + bbox=dict(boxstyle="round", fc="0.8"), + arrowprops=dict(arrowstyle="->", + shrinkA=0, shrinkB=10, + connectionstyle="angle,angleA=0,angleB=90,rad=10")) + +# You can pass an empty string to get only annotation arrows rendered +ann = ax.annotate('', xy=(4., 1.), xycoords='data', + xytext=(4.5, -1), textcoords='data', + arrowprops=dict(arrowstyle="<->", + connectionstyle="bar", + ec="k", + shrinkA=5, shrinkB=5)) + +ax.set(xlim=(-1, 5), ylim=(-4, 3)) + +# We'll create another figure so that it doesn't get too cluttered +fig, ax = plt.subplots() + +el = Ellipse((2, -1), 0.5, 0.5) +ax.add_patch(el) + +ax.annotate('$->$', + xy=(2., -1), xycoords='data', + xytext=(-150, -140), textcoords='offset points', + bbox=dict(boxstyle="round", fc="0.8"), + arrowprops=dict(arrowstyle="->", + patchB=el, + connectionstyle="angle,angleA=90,angleB=0,rad=10")) + +ax.annotate('arrow\nfancy', + xy=(2., -1), xycoords='data', + xytext=(-100, 60), textcoords='offset points', + size=20, + # bbox=dict(boxstyle="round", fc="0.8"), + arrowprops=dict(arrowstyle="fancy", + fc="0.6", ec="none", + patchB=el, + connectionstyle="angle3,angleA=0,angleB=-90")) + +ax.annotate('arrow\nsimple', + xy=(2., -1), xycoords='data', + xytext=(100, 60), textcoords='offset points', + size=20, + # bbox=dict(boxstyle="round", fc="0.8"), + 
arrowprops=dict(arrowstyle="simple", + fc="0.6", ec="none", + patchB=el, + connectionstyle="arc3,rad=0.3")) + +ax.annotate('wedge', + xy=(2., -1), xycoords='data', + xytext=(-100, -100), textcoords='offset points', + size=20, + # bbox=dict(boxstyle="round", fc="0.8"), + arrowprops=dict(arrowstyle="wedge,tail_width=0.7", + fc="0.6", ec="none", + patchB=el, + connectionstyle="arc3,rad=-0.3")) + +ann = ax.annotate('bubble,\ncontours', + xy=(2., -1), xycoords='data', + xytext=(0, -70), textcoords='offset points', + size=20, + bbox=dict(boxstyle="round", + fc=(1.0, 0.7, 0.7), + ec=(1., .5, .5)), + arrowprops=dict(arrowstyle="wedge,tail_width=1.", + fc=(1.0, 0.7, 0.7), ec=(1., .5, .5), + patchA=None, + patchB=el, + relpos=(0.2, 0.8), + connectionstyle="arc3,rad=-0.1")) + +ann = ax.annotate('bubble', + xy=(2., -1), xycoords='data', + xytext=(55, 0), textcoords='offset points', + size=20, va="center", + bbox=dict(boxstyle="round", fc=(1.0, 0.7, 0.7), ec="none"), + arrowprops=dict(arrowstyle="wedge,tail_width=1.", + fc=(1.0, 0.7, 0.7), ec="none", + patchA=None, + patchB=el, + relpos=(0.2, 0.5))) + +ax.set(xlim=(-1, 5), ylim=(-5, 3)) +``` + +![注释图示例4](https://matplotlib.org/_images/sphx_glr_annotation_demo_005.png) + +![注释图示例5](https://matplotlib.org/_images/sphx_glr_annotation_demo_005.png) + +## 更多坐标系的例子 + +下面我们将展示几个坐标系的例子,以及如何指定注释的位置。 + +```python +fig, (ax1, ax2) = plt.subplots(1, 2) + +bbox_args = dict(boxstyle="round", fc="0.8") +arrow_args = dict(arrowstyle="->") + +# Here we'll demonstrate the extents of the coordinate system and how +# we place annotating text. 
+ +ax1.annotate('figure fraction : 0, 0', xy=(0, 0), xycoords='figure fraction', + xytext=(20, 20), textcoords='offset points', + ha="left", va="bottom", + bbox=bbox_args, + arrowprops=arrow_args) + +ax1.annotate('figure fraction : 1, 1', xy=(1, 1), xycoords='figure fraction', + xytext=(-20, -20), textcoords='offset points', + ha="right", va="top", + bbox=bbox_args, + arrowprops=arrow_args) + +ax1.annotate('axes fraction : 0, 0', xy=(0, 0), xycoords='axes fraction', + xytext=(20, 20), textcoords='offset points', + ha="left", va="bottom", + bbox=bbox_args, + arrowprops=arrow_args) + +ax1.annotate('axes fraction : 1, 1', xy=(1, 1), xycoords='axes fraction', + xytext=(-20, -20), textcoords='offset points', + ha="right", va="top", + bbox=bbox_args, + arrowprops=arrow_args) + +# It is also possible to generate draggable annotations + +an1 = ax1.annotate('Drag me 1', xy=(.5, .7), xycoords='data', + #xytext=(.5, .7), textcoords='data', + ha="center", va="center", + bbox=bbox_args, + #arrowprops=arrow_args + ) + +an2 = ax1.annotate('Drag me 2', xy=(.5, .5), xycoords=an1, + xytext=(.5, .3), textcoords='axes fraction', + ha="center", va="center", + bbox=bbox_args, + arrowprops=dict(patchB=an1.get_bbox_patch(), + connectionstyle="arc3,rad=0.2", + **arrow_args)) +an1.draggable() +an2.draggable() + +an3 = ax1.annotate('', xy=(.5, .5), xycoords=an2, + xytext=(.5, .5), textcoords=an1, + ha="center", va="center", + bbox=bbox_args, + arrowprops=dict(patchA=an1.get_bbox_patch(), + patchB=an2.get_bbox_patch(), + connectionstyle="arc3,rad=0.2", + **arrow_args)) + +# Finally we'll show off some more complex annotation and placement + +text = ax2.annotate('xy=(0, 1)\nxycoords=("data", "axes fraction")', + xy=(0, 1), xycoords=("data", 'axes fraction'), + xytext=(0, -20), textcoords='offset points', + ha="center", va="top", + bbox=bbox_args, + arrowprops=arrow_args) + +ax2.annotate('xy=(0.5, 0)\nxycoords=artist', + xy=(0.5, 0.), xycoords=text, + xytext=(0, -20), textcoords='offset 
points', + ha="center", va="top", + bbox=bbox_args, + arrowprops=arrow_args) + +ax2.annotate('xy=(0.8, 0.5)\nxycoords=ax1.transData', + xy=(0.8, 0.5), xycoords=ax1.transData, + xytext=(10, 10), + textcoords=OffsetFrom(ax2.bbox, (0, 0), "points"), + ha="left", va="bottom", + bbox=bbox_args, + arrowprops=arrow_args) + +ax2.set(xlim=[-2, 2], ylim=[-2, 2]) +plt.show() +``` + +![注释图示例6](https://matplotlib.org/_images/sphx_glr_annotation_demo_006.png) + +## 下载这个示例 + +- [下载python源码: annotation_demo.py](https://matplotlib.org/_downloads/annotation_demo.py) +- [下载Jupyter notebook: annotation_demo.ipynb](https://matplotlib.org/_downloads/annotation_demo.ipynb) diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/arrow_demo.md b/Python/matplotlab/gallery/text_labels_and_annotations/arrow_demo.md new file mode 100644 index 00000000..86e79e0d --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/arrow_demo.md @@ -0,0 +1,316 @@ +# 箭头符号演示 + +新的花式箭头工具的箭头绘制示例。 + +代码由此人贡献: Rob Knight < rob@spot.colorado.edu > + +用法: + +![箭头符号图示例](https://matplotlib.org/_images/sphx_glr_arrow_demo_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +rates_to_bases = {'r1': 'AT', 'r2': 'TA', 'r3': 'GA', 'r4': 'AG', 'r5': 'CA', + 'r6': 'AC', 'r7': 'GT', 'r8': 'TG', 'r9': 'CT', 'r10': 'TC', + 'r11': 'GC', 'r12': 'CG'} +numbered_bases_to_rates = {v: k for k, v in rates_to_bases.items()} +lettered_bases_to_rates = {v: 'r' + v for k, v in rates_to_bases.items()} + + +def add_dicts(d1, d2): + """Adds two dicts and returns the result.""" + result = d1.copy() + result.update(d2) + return result + + +def make_arrow_plot(data, size=4, display='length', shape='right', + max_arrow_width=0.03, arrow_sep=0.02, alpha=0.5, + normalize_data=False, ec=None, labelcolor=None, + head_starts_at_zero=True, + rate_labels=lettered_bases_to_rates, + **kwargs): + """Makes an arrow plot. 
+ + Parameters: + + data: dict with probabilities for the bases and pair transitions. + size: size of the graph in inches. + display: 'length', 'width', or 'alpha' for arrow property to change. + shape: 'full', 'left', or 'right' for full or half arrows. + max_arrow_width: maximum width of an arrow, data coordinates. + arrow_sep: separation between arrows in a pair, data coordinates. + alpha: maximum opacity of arrows, default 0.8. + + **kwargs can be anything allowed by a Arrow object, e.g. + linewidth and edgecolor. + """ + + plt.xlim(-0.5, 1.5) + plt.ylim(-0.5, 1.5) + plt.gcf().set_size_inches(size, size) + plt.xticks([]) + plt.yticks([]) + max_text_size = size * 12 + min_text_size = size + label_text_size = size * 2.5 + text_params = {'ha': 'center', 'va': 'center', 'family': 'sans-serif', + 'fontweight': 'bold'} + r2 = np.sqrt(2) + + deltas = { + 'AT': (1, 0), + 'TA': (-1, 0), + 'GA': (0, 1), + 'AG': (0, -1), + 'CA': (-1 / r2, 1 / r2), + 'AC': (1 / r2, -1 / r2), + 'GT': (1 / r2, 1 / r2), + 'TG': (-1 / r2, -1 / r2), + 'CT': (0, 1), + 'TC': (0, -1), + 'GC': (1, 0), + 'CG': (-1, 0)} + + colors = { + 'AT': 'r', + 'TA': 'k', + 'GA': 'g', + 'AG': 'r', + 'CA': 'b', + 'AC': 'r', + 'GT': 'g', + 'TG': 'k', + 'CT': 'b', + 'TC': 'k', + 'GC': 'g', + 'CG': 'b'} + + label_positions = { + 'AT': 'center', + 'TA': 'center', + 'GA': 'center', + 'AG': 'center', + 'CA': 'left', + 'AC': 'left', + 'GT': 'left', + 'TG': 'left', + 'CT': 'center', + 'TC': 'center', + 'GC': 'center', + 'CG': 'center'} + + def do_fontsize(k): + return float(np.clip(max_text_size * np.sqrt(data[k]), + min_text_size, max_text_size)) + + A = plt.text(0, 1, '$A_3$', color='r', size=do_fontsize('A'), + **text_params) + T = plt.text(1, 1, '$T_3$', color='k', size=do_fontsize('T'), + **text_params) + G = plt.text(0, 0, '$G_3$', color='g', size=do_fontsize('G'), + **text_params) + C = plt.text(1, 0, '$C_3$', color='b', size=do_fontsize('C'), + **text_params) + + arrow_h_offset = 0.25 # data coordinates, 
empirically determined + max_arrow_length = 1 - 2 * arrow_h_offset + max_head_width = 2.5 * max_arrow_width + max_head_length = 2 * max_arrow_width + arrow_params = {'length_includes_head': True, 'shape': shape, + 'head_starts_at_zero': head_starts_at_zero} + ax = plt.gca() + sf = 0.6 # max arrow size represents this in data coords + + d = (r2 / 2 + arrow_h_offset - 0.5) / r2 # distance for diags + r2v = arrow_sep / r2 # offset for diags + + # tuple of x, y for start position + positions = { + 'AT': (arrow_h_offset, 1 + arrow_sep), + 'TA': (1 - arrow_h_offset, 1 - arrow_sep), + 'GA': (-arrow_sep, arrow_h_offset), + 'AG': (arrow_sep, 1 - arrow_h_offset), + 'CA': (1 - d - r2v, d - r2v), + 'AC': (d + r2v, 1 - d + r2v), + 'GT': (d - r2v, d + r2v), + 'TG': (1 - d + r2v, 1 - d - r2v), + 'CT': (1 - arrow_sep, arrow_h_offset), + 'TC': (1 + arrow_sep, 1 - arrow_h_offset), + 'GC': (arrow_h_offset, arrow_sep), + 'CG': (1 - arrow_h_offset, -arrow_sep)} + + if normalize_data: + # find maximum value for rates, i.e. 
where keys are 2 chars long + max_val = 0 + for k, v in data.items(): + if len(k) == 2: + max_val = max(max_val, v) + # divide rates by max val, multiply by arrow scale factor + for k, v in data.items(): + data[k] = v / max_val * sf + + def draw_arrow(pair, alpha=alpha, ec=ec, labelcolor=labelcolor): + # set the length of the arrow + if display == 'length': + length = max_head_length + data[pair] / sf * (max_arrow_length - + max_head_length) + else: + length = max_arrow_length + # set the transparency of the arrow + if display == 'alpha': + alpha = min(data[pair] / sf, alpha) + + # set the width of the arrow + if display == 'width': + scale = data[pair] / sf + width = max_arrow_width * scale + head_width = max_head_width * scale + head_length = max_head_length * scale + else: + width = max_arrow_width + head_width = max_head_width + head_length = max_head_length + + fc = colors[pair] + ec = ec or fc + + x_scale, y_scale = deltas[pair] + x_pos, y_pos = positions[pair] + plt.arrow(x_pos, y_pos, x_scale * length, y_scale * length, + fc=fc, ec=ec, alpha=alpha, width=width, + head_width=head_width, head_length=head_length, + **arrow_params) + + # figure out coordinates for text + # if drawing relative to base: x and y are same as for arrow + # dx and dy are one arrow width left and up + # need to rotate based on direction of arrow, use x_scale and y_scale + # as sin x and cos x? 
+ sx, cx = y_scale, x_scale + + where = label_positions[pair] + if where == 'left': + orig_position = 3 * np.array([[max_arrow_width, max_arrow_width]]) + elif where == 'absolute': + orig_position = np.array([[max_arrow_length / 2.0, + 3 * max_arrow_width]]) + elif where == 'right': + orig_position = np.array([[length - 3 * max_arrow_width, + 3 * max_arrow_width]]) + elif where == 'center': + orig_position = np.array([[length / 2.0, 3 * max_arrow_width]]) + else: + raise ValueError("Got unknown position parameter %s" % where) + + M = np.array([[cx, sx], [-sx, cx]]) + coords = np.dot(orig_position, M) + [[x_pos, y_pos]] + x, y = np.ravel(coords) + orig_label = rate_labels[pair] + label = r'$%s_{_{\mathrm{%s}}}$' % (orig_label[0], orig_label[1:]) + + plt.text(x, y, label, size=label_text_size, ha='center', va='center', + color=labelcolor or fc) + + for p in sorted(positions): + draw_arrow(p) + + +# test data +all_on_max = dict([(i, 1) for i in 'TCAG'] + + [(i + j, 0.6) for i in 'TCAG' for j in 'TCAG']) + +realistic_data = { + 'A': 0.4, + 'T': 0.3, + 'G': 0.5, + 'C': 0.2, + 'AT': 0.4, + 'AC': 0.3, + 'AG': 0.2, + 'TA': 0.2, + 'TC': 0.3, + 'TG': 0.4, + 'CT': 0.2, + 'CG': 0.3, + 'CA': 0.2, + 'GA': 0.1, + 'GT': 0.4, + 'GC': 0.1} + +extreme_data = { + 'A': 0.75, + 'T': 0.10, + 'G': 0.10, + 'C': 0.05, + 'AT': 0.6, + 'AC': 0.3, + 'AG': 0.1, + 'TA': 0.02, + 'TC': 0.3, + 'TG': 0.01, + 'CT': 0.2, + 'CG': 0.5, + 'CA': 0.2, + 'GA': 0.1, + 'GT': 0.4, + 'GC': 0.2} + +sample_data = { + 'A': 0.2137, + 'T': 0.3541, + 'G': 0.1946, + 'C': 0.2376, + 'AT': 0.0228, + 'AC': 0.0684, + 'AG': 0.2056, + 'TA': 0.0315, + 'TC': 0.0629, + 'TG': 0.0315, + 'CT': 0.1355, + 'CG': 0.0401, + 'CA': 0.0703, + 'GA': 0.1824, + 'GT': 0.0387, + 'GC': 0.1106} + + +if __name__ == '__main__': + from sys import argv + d = None + if len(argv) > 1: + if argv[1] == 'full': + d = all_on_max + scaled = False + elif argv[1] == 'extreme': + d = extreme_data + scaled = False + elif argv[1] == 'realistic': + d = 
realistic_data + scaled = False + elif argv[1] == 'sample': + d = sample_data + scaled = True + if d is None: + d = all_on_max + scaled = False + if len(argv) > 2: + display = argv[2] + else: + display = 'length' + + size = 4 + plt.figure(figsize=(size, size)) + + make_arrow_plot(d, display=display, linewidth=0.001, edgecolor=None, + normalize_data=scaled, head_starts_at_zero=True, size=size) + + plt.show() +``` + +## 下载这个示例 + +- [下载python源码: arrow_demo.py](https://matplotlib.org/_downloads/arrow_demo.py) +- [下载Jupyter notebook: arrow_demo.ipynb](https://matplotlib.org/_downloads/arrow_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/arrow_simple_demo.md b/Python/matplotlab/gallery/text_labels_and_annotations/arrow_simple_demo.md new file mode 100644 index 00000000..6c5aa0f3 --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/arrow_simple_demo.md @@ -0,0 +1,16 @@ +# 箭头符号简单演示 + +![箭头符号简单演示](https://matplotlib.org/_images/sphx_glr_arrow_simple_demo_001.png) + +```python +import matplotlib.pyplot as plt + +ax = plt.axes() +ax.arrow(0, 0, 0.5, 0.5, head_width=0.05, head_length=0.1, fc='k', ec='k') +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: arrow_simple_demo.py](https://matplotlib.org/_downloads/arrow_simple_demo.py) +- [下载Jupyter notebook: arrow_simple_demo.ipynb](https://matplotlib.org/_downloads/arrow_simple_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/autowrap.md b/Python/matplotlab/gallery/text_labels_and_annotations/autowrap.md new file mode 100644 index 00000000..53f0c369 --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/autowrap.md @@ -0,0 +1,29 @@ +# 文本自动换行 + +Matplotlib can wrap text automatically, but if it's too long, the text will be displayed slightly outside of the boundaries of the axis anyways. 
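在进入下面的完整示例之前,先给出一个最小的 wrap=True 用法示意(其中的坐标范围与文字内容均为随意假设的):

```python
import matplotlib
matplotlib.use("Agg")  # 示意用的无界面后端;交互环境下可省略
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set(xlim=(0, 10), ylim=(0, 10))
# 随意假设的一段较长文字
msg = "a fairly long sentence that we would like wrapped at the figure edge " * 2
# wrap=True 让文本在图形(figure)边界处自动换行
t = ax.text(8, 5, msg, wrap=True)
```

注意换行是以图形边界而不是坐标轴边界为准,因此文本仍可能略微超出坐标轴。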
+ +![文本自动换行示例](https://matplotlib.org/_images/sphx_glr_autowrap_001.png) + +```python +import matplotlib.pyplot as plt + +fig = plt.figure() +plt.axis([0, 10, 0, 10]) +t = ("This is a really long string that I'd rather have wrapped so that it " + "doesn't go outside of the figure, but if it's long enough it will go " + "off the top or bottom!") +plt.text(4, 1, t, ha='left', rotation=15, wrap=True) +plt.text(6, 5, t, ha='left', rotation=15, wrap=True) +plt.text(5, 5, t, ha='right', rotation=-15, wrap=True) +plt.text(5, 10, t, fontsize=18, style='oblique', ha='center', + va='top', wrap=True) +plt.text(3, 4, t, family='serif', style='italic', ha='right', wrap=True) +plt.text(-1, 0, t, ha='left', rotation=-15, wrap=True) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: autowrap.py](https://matplotlib.org/_downloads/autowrap.py) +- [下载Jupyter notebook: autowrap.ipynb](https://matplotlib.org/_downloads/autowrap.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/custom_legends.md b/Python/matplotlab/gallery/text_labels_and_annotations/custom_legends.md new file mode 100644 index 00000000..81ca4cc9 --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/custom_legends.md @@ -0,0 +1,75 @@ +# 撰写自定义图例 + +Composing custom legends piece-by-piece. 
+ +**注意**: + +For more information on creating and customizing legends, see the following pages: + +- [Legend guide](https://matplotlib.org/tutorials/intermediate/legend_guide.html) +- [Legend Demo](https://matplotlib.org/tutorials/intermediate/legend_guide.html) + +有时您不希望与已绘制的数据明确关联的图例。例如,假设您已绘制了10行,但不希望每个行都显示图例项。如果您只是绘制线条并调用ax.legend(),您将获得以下内容: + +```python +# sphinx_gallery_thumbnail_number = 2 +from matplotlib import rcParams, cycler +import matplotlib.pyplot as plt +import numpy as np + +# Fixing random state for reproducibility +np.random.seed(19680801) + +N = 10 +data = [np.logspace(0, 1, 100) + np.random.randn(100) + ii for ii in range(N)] +data = np.array(data).T +cmap = plt.cm.coolwarm +rcParams['axes.prop_cycle'] = cycler(color=cmap(np.linspace(0, 1, N))) + +fig, ax = plt.subplots() +lines = ax.plot(data) +ax.legend(lines) +``` + +![撰写自定义图例](https://matplotlib.org/_images/sphx_glr_custom_legends_001.png) + +请注意,每行创建一个图例项。在这种情况下,我们可以使用未明确绑定到绘制数据的Matplotlib对象组成图例。例如: + +```python +from matplotlib.lines import Line2D +custom_lines = [Line2D([0], [0], color=cmap(0.), lw=4), + Line2D([0], [0], color=cmap(.5), lw=4), + Line2D([0], [0], color=cmap(1.), lw=4)] + +fig, ax = plt.subplots() +lines = ax.plot(data) +ax.legend(custom_lines, ['Cold', 'Medium', 'Hot']) +``` + +![撰写自定义图例2](https://matplotlib.org/_images/sphx_glr_custom_legends_002.png) + +还有许多其他Matplotlib对象可以这种方式使用。 在下面的代码中,我们列出了一些常见的代码。 + +```python +from matplotlib.patches import Patch +from matplotlib.lines import Line2D + +legend_elements = [Line2D([0], [0], color='b', lw=4, label='Line'), + Line2D([0], [0], marker='o', color='w', label='Scatter', + markerfacecolor='g', markersize=15), + Patch(facecolor='orange', edgecolor='r', + label='Color Patch')] + +# Create the figure +fig, ax = plt.subplots() +ax.legend(handles=legend_elements, loc='center') + +plt.show() +``` + +![撰写自定义图例3](https://matplotlib.org/_images/sphx_glr_custom_legends_003.png) + +## 下载这个示例 + +- [下载python源码: 
custom_legends.py](https://matplotlib.org/_downloads/custom_legends.py) +- [下载Jupyter notebook: custom_legends.ipynb](https://matplotlib.org/_downloads/custom_legends.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/dashpointlabel.md b/Python/matplotlab/gallery/text_labels_and_annotations/dashpointlabel.md new file mode 100644 index 00000000..ce702c57 --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/dashpointlabel.md @@ -0,0 +1,45 @@ +# Dashpoint标签 + +![Dashpoint标签示例](https://matplotlib.org/_images/sphx_glr_dashpointlabel_001.png) + +```python +import matplotlib.pyplot as plt + +DATA = ((1, 3), + (2, 4), + (3, 1), + (4, 2)) +# dash_style = +# direction, length, (text)rotation, dashrotation, push +# (The parameters are varied to show their effects, not for visual appeal). +dash_style = ( + (0, 20, -15, 30, 10), + (1, 30, 0, 15, 10), + (0, 40, 15, 15, 10), + (1, 20, 30, 60, 10)) + +fig, ax = plt.subplots() + +(x, y) = zip(*DATA) +ax.plot(x, y, marker='o') +for i in range(len(DATA)): + (x, y) = DATA[i] + (dd, dl, r, dr, dp) = dash_style[i] + t = ax.text(x, y, str((x, y)), withdash=True, + dashdirection=dd, + dashlength=dl, + rotation=r, + dashrotation=dr, + dashpush=dp, + ) + +ax.set_xlim((0, 5)) +ax.set_ylim((0, 5)) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: dashpointlabel.py](https://matplotlib.org/_downloads/dashpointlabel.py) +- [下载Jupyter notebook: dashpointlabel.ipynb](https://matplotlib.org/_downloads/dashpointlabel.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/date.md b/Python/matplotlab/gallery/text_labels_and_annotations/date.md new file mode 100644 index 00000000..f77db452 --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/date.md @@ -0,0 +1,56 @@ +# 日期刻度标签 + +演示如何使用日期刻度定位器和格式化程序在matplotlib中创建日期图。有关控制主要和次要刻度的更多信息,请参阅major_minor_demo1.py + +所有matplotlib日期绘图都是通过将日期实例转换为自 0001-01-01 00:00:00 UTC 
加上一天后的天数(由于历史原因)来完成的。 转换,刻度定位和格式化是在幕后完成的,因此这对您来说是最透明的。 日期模块提供了几个转换器函数 [matplotlib.dates.date2num](https://matplotlib.org/api/dates_api.html#matplotlib.dates.date2num) 和[matplotlib.dates.num2date](https://matplotlib.org/api/dates_api.html#matplotlib.dates.num2date)。这些可以在[datetime.datetime](https://docs.python.org/3/library/datetime.html#datetime.datetime) 对象和 ``numpy.datetime64`` 对象之间进行转换。 + +![日期刻度标签示例](https://matplotlib.org/_images/sphx_glr_date_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.dates as mdates +import matplotlib.cbook as cbook + +years = mdates.YearLocator() # every year +months = mdates.MonthLocator() # every month +yearsFmt = mdates.DateFormatter('%Y') + +# Load a numpy record array from yahoo csv data with fields date, open, close, +# volume, adj_close from the mpl-data/example directory. The record array +# stores the date as an np.datetime64 with a day unit ('D') in the date column. +with cbook.get_sample_data('goog.npz') as datafile: + r = np.load(datafile)['price_data'].view(np.recarray) + +fig, ax = plt.subplots() +ax.plot(r.date, r.adj_close) + +# format the ticks +ax.xaxis.set_major_locator(years) +ax.xaxis.set_major_formatter(yearsFmt) +ax.xaxis.set_minor_locator(months) + +# round to nearest years... 
+datemin = np.datetime64(r.date[0], 'Y') +datemax = np.datetime64(r.date[-1], 'Y') + np.timedelta64(1, 'Y') +ax.set_xlim(datemin, datemax) + + +# format the coords message box +def price(x): + return '$%1.2f' % x +ax.format_xdata = mdates.DateFormatter('%Y-%m-%d') +ax.format_ydata = price +ax.grid(True) + +# rotates and right aligns the x labels, and moves the bottom of the +# axes up to make room for them +fig.autofmt_xdate() + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: date.py](https://matplotlib.org/_downloads/date.py) +- [下载Jupyter notebook: date.ipynb](https://matplotlib.org/_downloads/date.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/date_index_formatter.md b/Python/matplotlab/gallery/text_labels_and_annotations/date_index_formatter.md new file mode 100644 index 00000000..e4cafabf --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/date_index_formatter.md @@ -0,0 +1,51 @@ +# 时间序列的自定义刻度格式化程序 + +当绘制时间序列(例如,金融时间序列)时,人们经常想要省去没有数据的日子,即周末。下面的示例显示了如何使用“索引格式化程序”来实现所需的绘图。 + +![时间序列的自定义刻度格式化程序示例](https://matplotlib.org/_images/sphx_glr_date_index_formatter_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.cbook as cbook +import matplotlib.ticker as ticker + +# Load a numpy record array from yahoo csv data with fields date, open, close, +# volume, adj_close from the mpl-data/example directory. The record array +# stores the date as an np.datetime64 with a day unit ('D') in the date column. +with cbook.get_sample_data('goog.npz') as datafile: + r = np.load(datafile)['price_data'].view(np.recarray) +r = r[-30:] # get the last 30 days +# Matplotlib works better with datetime.datetime than np.datetime64, but the +# latter is more portable. 
+date = r.date.astype('O') + +# first we'll do it the default way, with gaps on weekends +fig, axes = plt.subplots(ncols=2, figsize=(8, 4)) +ax = axes[0] +ax.plot(date, r.adj_close, 'o-') +ax.set_title("Default") +fig.autofmt_xdate() + +# next we'll write a custom formatter +N = len(r) +ind = np.arange(N) # the evenly spaced plot indices + + +def format_date(x, pos=None): + thisind = np.clip(int(x + 0.5), 0, N - 1) + return date[thisind].strftime('%Y-%m-%d') + +ax = axes[1] +ax.plot(ind, r.adj_close, 'o-') +ax.xaxis.set_major_formatter(ticker.FuncFormatter(format_date)) +ax.set_title("Custom tick formatter") +fig.autofmt_xdate() + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: date_index_formatter.py](https://matplotlib.org/_downloads/date_index_formatter.py) +- [下载Jupyter notebook: date_index_formatter.ipynb](https://matplotlib.org/_downloads/date_index_formatter.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/demo_annotation_box.md b/Python/matplotlab/gallery/text_labels_and_annotations/demo_annotation_box.md new file mode 100644 index 00000000..f9cb0e21 --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/demo_annotation_box.md @@ -0,0 +1,103 @@ +# 图中插入注释框 + +![图中插入注释框示例](https://matplotlib.org/_images/sphx_glr_demo_annotation_box_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +from matplotlib.patches import Circle +from matplotlib.offsetbox import (TextArea, DrawingArea, OffsetImage, + AnnotationBbox) +from matplotlib.cbook import get_sample_data + + +if 1: + fig, ax = plt.subplots() + + # Define a 1st position to annotate (display it with a marker) + xy = (0.5, 0.7) + ax.plot(xy[0], xy[1], ".r") + + # Annotate the 1st position with a text box ('Test 1') + offsetbox = TextArea("Test 1", minimumdescent=False) + + ab = AnnotationBbox(offsetbox, xy, + xybox=(-20, 40), + xycoords='data', + boxcoords="offset points", + arrowprops=dict(arrowstyle="->")) + 
ax.add_artist(ab) + + # Annotate the 1st position with another text box ('Test') + offsetbox = TextArea("Test", minimumdescent=False) + + ab = AnnotationBbox(offsetbox, xy, + xybox=(1.02, xy[1]), + xycoords='data', + boxcoords=("axes fraction", "data"), + box_alignment=(0., 0.5), + arrowprops=dict(arrowstyle="->")) + ax.add_artist(ab) + + # Define a 2nd position to annotate (don't display with a marker this time) + xy = [0.3, 0.55] + + # Annotate the 2nd position with a circle patch + da = DrawingArea(20, 20, 0, 0) + p = Circle((10, 10), 10) + da.add_artist(p) + + ab = AnnotationBbox(da, xy, + xybox=(1.02, xy[1]), + xycoords='data', + boxcoords=("axes fraction", "data"), + box_alignment=(0., 0.5), + arrowprops=dict(arrowstyle="->")) + + ax.add_artist(ab) + + # Annotate the 2nd position with an image (a generated array of pixels) + arr = np.arange(100).reshape((10, 10)) + im = OffsetImage(arr, zoom=2) + im.image.axes = ax + + ab = AnnotationBbox(im, xy, + xybox=(-50., 50.), + xycoords='data', + boxcoords="offset points", + pad=0.3, + arrowprops=dict(arrowstyle="->")) + + ax.add_artist(ab) + + # Annotate the 2nd position with another image (a Grace Hopper portrait) + fn = get_sample_data("grace_hopper.png", asfileobj=False) + arr_img = plt.imread(fn, format='png') + + imagebox = OffsetImage(arr_img, zoom=0.2) + imagebox.image.axes = ax + + ab = AnnotationBbox(imagebox, xy, + xybox=(120., -80.), + xycoords='data', + boxcoords="offset points", + pad=0.5, + arrowprops=dict( + arrowstyle="->", + connectionstyle="angle,angleA=0,angleB=90,rad=3") + ) + + ax.add_artist(ab) + + # Fix the display limits to see everything + ax.set_xlim(0, 1) + ax.set_ylim(0, 1) + + plt.show() +``` + +## 下载这个示例 + +- [下载python源码: demo_annotation_box.py](https://matplotlib.org/_downloads/demo_annotation_box.py) +- [下载Jupyter notebook: demo_annotation_box.ipynb](https://matplotlib.org/_downloads/demo_annotation_box.ipynb) \ No newline at end of file diff --git 
a/Python/matplotlab/gallery/text_labels_and_annotations/demo_text_path.md b/Python/matplotlab/gallery/text_labels_and_annotations/demo_text_path.md new file mode 100644 index 00000000..9f8b1f37 --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/demo_text_path.md @@ -0,0 +1,162 @@ +# 文本路径演示 + +使用文本作为路径。允许这种转换的工具是 [TextPath](https://matplotlib.org/api/textpath_api.html#matplotlib.textpath.TextPath)。可以采用所得的路径,例如,作为图像的剪辑路径。 + +![文本路径演示](https://matplotlib.org/_images/sphx_glr_demo_text_path_001.png) + +```python +import matplotlib.pyplot as plt +from matplotlib.image import BboxImage +import numpy as np +from matplotlib.transforms import IdentityTransform + +import matplotlib.patches as mpatches + +from matplotlib.offsetbox import AnnotationBbox,\ + AnchoredOffsetbox, AuxTransformBox + +from matplotlib.cbook import get_sample_data + +from matplotlib.text import TextPath + + +class PathClippedImagePatch(mpatches.PathPatch): + """ + The given image is used to draw the face of the patch. Internally, + it uses BboxImage whose clippath set to the path of the patch. + + FIXME : The result is currently dpi dependent. + """ + + def __init__(self, path, bbox_image, **kwargs): + mpatches.PathPatch.__init__(self, path, **kwargs) + self._init_bbox_image(bbox_image) + + def set_facecolor(self, color): + """simply ignore facecolor""" + mpatches.PathPatch.set_facecolor(self, "none") + + def _init_bbox_image(self, im): + + bbox_image = BboxImage(self.get_window_extent, + norm=None, + origin=None, + ) + bbox_image.set_transform(IdentityTransform()) + + bbox_image.set_data(im) + self.bbox_image = bbox_image + + def draw(self, renderer=None): + + # the clip path must be updated every draw. any solution? 
-JJ + self.bbox_image.set_clip_path(self._path, self.get_transform()) + self.bbox_image.draw(renderer) + + mpatches.PathPatch.draw(self, renderer) + + +if 1: + + usetex = plt.rcParams["text.usetex"] + + fig = plt.figure(1) + + # EXAMPLE 1 + + ax = plt.subplot(211) + + arr = plt.imread(get_sample_data("grace_hopper.png")) + + text_path = TextPath((0, 0), "!?", size=150) + p = PathClippedImagePatch(text_path, arr, ec="k", + transform=IdentityTransform()) + + # p.set_clip_on(False) + + # make offset box + offsetbox = AuxTransformBox(IdentityTransform()) + offsetbox.add_artist(p) + + # make anchored offset box + ao = AnchoredOffsetbox(loc='upper left', child=offsetbox, frameon=True, + borderpad=0.2) + ax.add_artist(ao) + + # another text + from matplotlib.patches import PathPatch + if usetex: + r = r"\mbox{textpath supports mathtext \& \TeX}" + else: + r = r"textpath supports mathtext & TeX" + + text_path = TextPath((0, 0), r, + size=20, usetex=usetex) + + p1 = PathPatch(text_path, ec="w", lw=3, fc="w", alpha=0.9, + transform=IdentityTransform()) + p2 = PathPatch(text_path, ec="none", fc="k", + transform=IdentityTransform()) + + offsetbox2 = AuxTransformBox(IdentityTransform()) + offsetbox2.add_artist(p1) + offsetbox2.add_artist(p2) + + ab = AnnotationBbox(offsetbox2, (0.95, 0.05), + xycoords='axes fraction', + boxcoords="offset points", + box_alignment=(1., 0.), + frameon=False + ) + ax.add_artist(ab) + + ax.imshow([[0, 1, 2], [1, 2, 3]], cmap=plt.cm.gist_gray_r, + interpolation="bilinear", + aspect="auto") + + # EXAMPLE 2 + + ax = plt.subplot(212) + + arr = np.arange(256).reshape(1, 256)/256. + + if usetex: + s = (r"$\displaystyle\left[\sum_{n=1}^\infty" + r"\frac{-e^{i\pi}}{2^n}\right]$!") + else: + s = r"$\left[\sum_{n=1}^\infty\frac{-e^{i\pi}}{2^n}\right]$!" 
+ text_path = TextPath((0, 0), s, size=40, usetex=usetex) + text_patch = PathClippedImagePatch(text_path, arr, ec="none", + transform=IdentityTransform()) + + shadow1 = mpatches.Shadow(text_patch, 1, -1, + props=dict(fc="none", ec="0.6", lw=3)) + shadow2 = mpatches.Shadow(text_patch, 1, -1, + props=dict(fc="0.3", ec="none")) + + # make offset box + offsetbox = AuxTransformBox(IdentityTransform()) + offsetbox.add_artist(shadow1) + offsetbox.add_artist(shadow2) + offsetbox.add_artist(text_patch) + + # place the anchored offset box using AnnotationBbox + ab = AnnotationBbox(offsetbox, (0.5, 0.5), + xycoords='data', + boxcoords="offset points", + box_alignment=(0.5, 0.5), + ) + # text_path.set_size(10) + + ax.add_artist(ab) + + ax.set_xlim(0, 1) + ax.set_ylim(0, 1) + + plt.show() +``` + +## 下载这个示例 + +- [下载python源码: demo_text_path.py](https://matplotlib.org/_downloads/demo_text_path.py) +- [下载Jupyter notebook: demo_text_path.ipynb](https://matplotlib.org/_downloads/demo_text_path.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/demo_text_rotation_mode.md b/Python/matplotlab/gallery/text_labels_and_annotations/demo_text_rotation_mode.md new file mode 100644 index 00000000..5a24d564 --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/demo_text_rotation_mode.md @@ -0,0 +1,57 @@ +# 演示文本旋转模式 + +![演示文本旋转模式示例](https://matplotlib.org/_images/sphx_glr_demo_text_rotation_mode_001.png) + +```python +from mpl_toolkits.axes_grid1.axes_grid import ImageGrid + + +def test_rotation_mode(fig, mode, subplot_location): + ha_list = "left center right".split() + va_list = "top center baseline bottom".split() + grid = ImageGrid(fig, subplot_location, + nrows_ncols=(len(va_list), len(ha_list)), + share_all=True, aspect=True, + # label_mode='1', + cbar_mode=None) + + for ha, ax in zip(ha_list, grid.axes_row[-1]): + ax.axis["bottom"].label.set_text(ha) + + grid.axes_row[0][1].set_title(mode, size="large") + + for va, ax 
in zip(va_list, grid.axes_column[0]): + ax.axis["left"].label.set_text(va) + + i = 0 + for va in va_list: + for ha in ha_list: + ax = grid[i] + for axis in ax.axis.values(): + axis.toggle(ticks=False, ticklabels=False) + + ax.text(0.5, 0.5, "Tpg", + size="large", rotation=40, + bbox=dict(boxstyle="square,pad=0.", + ec="none", fc="0.5", alpha=0.5), + ha=ha, va=va, + rotation_mode=mode) + ax.axvline(0.5) + ax.axhline(0.5) + i += 1 + + +if 1: + import matplotlib.pyplot as plt + fig = plt.figure(1, figsize=(5.5, 4)) + fig.clf() + + test_rotation_mode(fig, "default", 121) + test_rotation_mode(fig, "anchor", 122) + plt.show() +``` + +## 下载这个示例 + +- [下载python源码: demo_text_rotation_mode.py](https://matplotlib.org/_downloads/demo_text_rotation_mode.py) +- [下载Jupyter notebook: demo_text_rotation_mode.ipynb](https://matplotlib.org/_downloads/demo_text_rotation_mode.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/dfrac_demo.md b/Python/matplotlab/gallery/text_labels_and_annotations/dfrac_demo.md new file mode 100644 index 00000000..202d9112 --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/dfrac_demo.md @@ -0,0 +1,25 @@ +# \dfrac 和 \frac之间的区别 + +此示例说明了 \dfrac 和 \frac 这两个 TeX 宏之间的差异;特别是使用 mathtext 时,显示样式(display style)分数与文本样式(text style)分数之间的差异。 + +*New in version 2.1*.
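下面是一个补充性的最小草图(非原示例内容;假设使用默认的 mathtext 引擎和无界面的 Agg 后端,变量名仅作示意),通过测量渲染后文本的像素高度来量化这两种样式的差异:

```python
# 对比 \frac 与 \dfrac 在 mathtext 下的渲染高度(假设性示例)
import matplotlib
matplotlib.use('Agg')  # 无界面后端,便于离屏渲染
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(4, 2))
t_frac = fig.text(0.3, 0.5, r'$\frac{a}{b}$')    # 文本样式分数
t_dfrac = fig.text(0.7, 0.5, r'$\dfrac{a}{b}$')  # 显示样式分数
fig.canvas.draw()  # 需要先渲染一次才能取得文本的窗口范围

renderer = fig.canvas.get_renderer()
h_frac = t_frac.get_window_extent(renderer).height
h_dfrac = t_dfrac.get_window_extent(renderer).height
print(h_frac < h_dfrac)  # 预期 \dfrac 更高:其分子、分母按正文字号排版
```

可见 \dfrac 的分子和分母按正文字号排版,因而占据更大的竖直空间,这正是显示样式与文本样式的区别所在。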
+ +**注意**:要将 \dfrac与LaTeX引擎一起使用(text.usetex: True),您需要通过text.latex.preamble rc参数导入amsmath包,而这是一个不受支持的功能;因此,更好的做法可能是在 \frac宏之前使用 \displaystyle选项,从而在LaTeX引擎下获得同样的显示效果。 + +![dfrac和frac公式](https://matplotlib.org/_images/sphx_glr_dfrac_demo_001.png) + +```python +import matplotlib.pyplot as plt + +fig = plt.figure(figsize=(5.25, 0.75)) +fig.text(0.5, 0.3, r'\dfrac: $\dfrac{a}{b}$', + horizontalalignment='center', verticalalignment='center') +fig.text(0.5, 0.7, r'\frac: $\frac{a}{b}$', + horizontalalignment='center', verticalalignment='center') +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: dfrac_demo.py](https://matplotlib.org/_downloads/dfrac_demo.py) +- [下载Jupyter notebook: dfrac_demo.ipynb](https://matplotlib.org/_downloads/dfrac_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/engineering_formatter.md b/Python/matplotlab/gallery/text_labels_and_annotations/engineering_formatter.md new file mode 100644 index 00000000..69211394 --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/engineering_formatter.md @@ -0,0 +1,49 @@ +# 使用工程符号标记刻度线 + +使用工程格式化程序(EngFormatter)标记刻度。 + +![工程格式化示例](https://matplotlib.org/_images/sphx_glr_engineering_formatter_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +from matplotlib.ticker import EngFormatter + +# Fixing random state for reproducibility +prng = np.random.RandomState(19680801) + +# Create artificial data to plot. +# The x data span over several decades to demonstrate several SI prefixes. +xs = np.logspace(1, 9, 100) +ys = (0.8 + 0.4 * prng.uniform(size=100)) * np.log10(xs)**2 + +# Figure height is doubled (2*4.8) to display nicely 2 stacked subplots. +fig, (ax0, ax1) = plt.subplots(nrows=2, figsize=(7, 9.6)) +for ax in (ax0, ax1): + ax.set_xscale('log') + +# Demo of the default settings, with a user-defined unit label.
+ax0.set_title('Full unit ticklabels, w/ default precision & space separator') +formatter0 = EngFormatter(unit='Hz') +ax0.xaxis.set_major_formatter(formatter0) +ax0.plot(xs, ys) +ax0.set_xlabel('Frequency') + +# Demo of the options `places` (number of digit after decimal point) and +# `sep` (separator between the number and the prefix/unit). +ax1.set_title('SI-prefix only ticklabels, 1-digit precision & ' + 'thin space separator') +formatter1 = EngFormatter(places=1, sep="\N{THIN SPACE}") # U+2009 +ax1.xaxis.set_major_formatter(formatter1) +ax1.plot(xs, ys) +ax1.set_xlabel('Frequency [Hz]') + +plt.tight_layout() +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: engineering_formatter.py](https://matplotlib.org/_downloads/engineering_formatter.py) +- [下载Jupyter notebook: engineering_formatter.ipynb](https://matplotlib.org/_downloads/engineering_formatter.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/fancyarrow_demo.md b/Python/matplotlab/gallery/text_labels_and_annotations/fancyarrow_demo.md new file mode 100644 index 00000000..3624ae58 --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/fancyarrow_demo.md @@ -0,0 +1,59 @@ +# 花式箭头符号演示 + +![花式箭头符号演示](https://matplotlib.org/_images/sphx_glr_fancyarrow_demo_001.png) + +```python +import matplotlib.patches as mpatches +import matplotlib.pyplot as plt + +styles = mpatches.ArrowStyle.get_styles() + +ncol = 2 +nrow = (len(styles) + 1) // ncol +figheight = (nrow + 0.5) +fig1 = plt.figure(1, (4 * ncol / 1.5, figheight / 1.5)) +fontsize = 0.2 * 70 + + +ax = fig1.add_axes([0, 0, 1, 1], frameon=False, aspect=1.) 
+ +ax.set_xlim(0, 4 * ncol) +ax.set_ylim(0, figheight) + + +def to_texstring(s): + s = s.replace("<", r"$<$") + s = s.replace(">", r"$>$") + s = s.replace("|", r"$|$") + return s + + +for i, (stylename, styleclass) in enumerate(sorted(styles.items())): + x = 3.2 + (i // nrow) * 4 + y = (figheight - 0.7 - i % nrow) # /figheight + p = mpatches.Circle((x, y), 0.2) + ax.add_patch(p) + + ax.annotate(to_texstring(stylename), (x, y), + (x - 1.2, y), + ha="right", va="center", + size=fontsize, + arrowprops=dict(arrowstyle=stylename, + patchB=p, + shrinkA=5, + shrinkB=5, + fc="k", ec="k", + connectionstyle="arc3,rad=-0.05", + ), + bbox=dict(boxstyle="square", fc="w")) + +ax.xaxis.set_visible(False) +ax.yaxis.set_visible(False) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: fancyarrow_demo.py](https://matplotlib.org/_downloads/fancyarrow_demo.py) +- [下载Jupyter notebook: fancyarrow_demo.ipynb](https://matplotlib.org/_downloads/fancyarrow_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/fancytextbox_demo.md b/Python/matplotlab/gallery/text_labels_and_annotations/fancytextbox_demo.md new file mode 100644 index 00000000..428945c6 --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/fancytextbox_demo.md @@ -0,0 +1,30 @@ +# 花式文本框演示 + +![花式文本框演示](https://matplotlib.org/_images/sphx_glr_fancytextbox_demo_001.png) + +```python +import matplotlib.pyplot as plt + +plt.text(0.6, 0.5, "test", size=50, rotation=30., + ha="center", va="center", + bbox=dict(boxstyle="round", + ec=(1., 0.5, 0.5), + fc=(1., 0.8, 0.8), + ) + ) + +plt.text(0.5, 0.4, "test", size=50, rotation=-30., + ha="right", va="top", + bbox=dict(boxstyle="square", + ec=(1., 0.5, 0.5), + fc=(1., 0.8, 0.8), + ) + ) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: fancytextbox_demo.py](https://matplotlib.org/_downloads/fancytextbox_demo.py) +- [下载Jupyter notebook: 
fancytextbox_demo.ipynb](https://matplotlib.org/_downloads/fancytextbox_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/figlegend_demo.md b/Python/matplotlab/gallery/text_labels_and_annotations/figlegend_demo.md new file mode 100644 index 00000000..792cb49f --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/figlegend_demo.md @@ -0,0 +1,32 @@ +# 图形图例演示 + +除了在每个子图(axes)上分别绘制图例之外,也可以为图形中所有子图上的全部艺术家(artists)统一绘制一个图例。 + +![图形图例演示](https://matplotlib.org/_images/sphx_glr_figlegend_demo_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt + +fig, axs = plt.subplots(1, 2) + +x = np.arange(0.0, 2.0, 0.02) +y1 = np.sin(2 * np.pi * x) +y2 = np.exp(-x) +l1, l2 = axs[0].plot(x, y1, 'rs-', x, y2, 'go') + +y3 = np.sin(4 * np.pi * x) +y4 = np.exp(-2 * x) +l3, l4 = axs[1].plot(x, y3, 'yd-', x, y4, 'k^') + +fig.legend((l1, l2), ('Line 1', 'Line 2'), 'upper left') +fig.legend((l3, l4), ('Line 3', 'Line 4'), 'upper right') + +plt.tight_layout() +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: figlegend_demo.py](https://matplotlib.org/_downloads/figlegend_demo.py) +- [下载Jupyter notebook: figlegend_demo.ipynb](https://matplotlib.org/_downloads/figlegend_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/font_family_rc_sgskip.md b/Python/matplotlab/gallery/text_labels_and_annotations/font_family_rc_sgskip.md new file mode 100644 index 00000000..d1f63f4b --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/font_family_rc_sgskip.md @@ -0,0 +1,34 @@ +# 配置字体系列 + +你可以显式地设置给定字体样式(例如,‘serif’、‘sans-serif’或‘monospace’)所选用的字体系列。 + +在下面的示例中,我们只允许一个字体系列(Tahoma)用于sans-serif字体样式。默认字体系列通过 font.family rc 参数设置,例如: + +```python +rcParams['font.family'] = 'sans-serif' +``` + +并为所设置的字体系列提供一个按顺序尝试查找的字体列表: + +```python +rcParams['font.sans-serif'] = ['Tahoma', 'DejaVu Sans', + 'Lucida Grande', 'Verdana'] +``` + +```python +from matplotlib import 
rcParams +rcParams['font.family'] = 'sans-serif' +rcParams['font.sans-serif'] = ['Tahoma'] +import matplotlib.pyplot as plt + +fig, ax = plt.subplots() +ax.plot([1, 2, 3], label='test') + +ax.legend() +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: font_family_rc_sgskip.py](https://matplotlib.org/_downloads/font_family_rc_sgskip.py) +- [下载Jupyter notebook: font_family_rc_sgskip.ipynb](https://matplotlib.org/_downloads/font_family_rc_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/font_file.md b/Python/matplotlab/gallery/text_labels_and_annotations/font_file.md new file mode 100644 index 00000000..70991a80 --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/font_file.md @@ -0,0 +1,40 @@ +# 在Matplotlib中使用TTF字体文件 + +虽然为字体实例显式指向单个ttf文件通常不是一个好主意,但您可以使用 ``font_manager.FontProperties`` *fname* 参数执行此操作。 + +在这里,我们使用Matplotlib附带的计算机现代罗马字体(``cmr10``)。 + +有关更灵活的解决方案,请参见[配置字体系列](https://matplotlib.org/gallery/text_labels_and_annotations/font_family_rc_sgskip.html)和[字体演示(面向对象的样式)](https://matplotlib.org/gallery/text_labels_and_annotations/fonts_demo.html)。 + +```python +import os +from matplotlib import font_manager as fm, rcParams +import matplotlib.pyplot as plt + +fig, ax = plt.subplots() + +fpath = os.path.join(rcParams["datapath"], "fonts/ttf/cmr10.ttf") +prop = fm.FontProperties(fname=fpath) +fname = os.path.split(fpath)[1] +ax.set_title('This is a special font: {}'.format(fname), fontproperties=prop) +ax.set_xlabel('This is the default font') + +plt.show() +``` + +![在Matplotlib中使用TTF字体文件](https://matplotlib.org/_images/sphx_glr_font_file_001.png) + +## 参考 + +此示例显示了以下函数、方法、类和模块的使用: + +```python +import matplotlib +matplotlib.font_manager.FontProperties +matplotlib.axes.Axes.set_title +``` + +## 下载这个示例 + +- [下载python源码: font_file.py](https://matplotlib.org/_downloads/font_file.py) +- [下载Jupyter notebook: font_file.ipynb](https://matplotlib.org/_downloads/font_file.ipynb) \ No newline at end 
of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/font_table_ttf_sgskip.md b/Python/matplotlab/gallery/text_labels_and_annotations/font_table_ttf_sgskip.md new file mode 100644 index 00000000..d3e4f3e8 --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/font_table_ttf_sgskip.md @@ -0,0 +1,66 @@ +# TTF字体表 + +Matplotlib支持FreeType字体。下面是一个使用‘table’命令构建字体表的小示例,该表按字符代码显示字形。 + +用法python font_table_ttf.py somefile.ttf + +```python +import sys +import os + +import matplotlib +from matplotlib.ft2font import FT2Font +from matplotlib.font_manager import FontProperties +import matplotlib.pyplot as plt + +# the font table grid + +labelc = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', + 'A', 'B', 'C', 'D', 'E', 'F'] +labelr = ['00', '10', '20', '30', '40', '50', '60', '70', '80', '90', + 'A0', 'B0', 'C0', 'D0', 'E0', 'F0'] + +if len(sys.argv) > 1: + fontname = sys.argv[1] +else: + fontname = os.path.join(matplotlib.get_data_path(), + 'fonts', 'ttf', 'DejaVuSans.ttf') + +font = FT2Font(fontname) +codes = sorted(font.get_charmap().items()) + +# a 16,16 array of character strings +chars = [['' for c in range(16)] for r in range(16)] +colors = [[(0.95, 0.95, 0.95) for c in range(16)] for r in range(16)] + +plt.figure(figsize=(8, 4), dpi=120) +for ccode, glyphind in codes: + if ccode >= 256: + continue + r, c = divmod(ccode, 16) + s = chr(ccode) + chars[r][c] = s + +lightgrn = (0.5, 0.8, 0.5) +plt.title(fontname) +tab = plt.table(cellText=chars, + rowLabels=labelr, + colLabels=labelc, + rowColours=[lightgrn] * 16, + colColours=[lightgrn] * 16, + cellColours=colors, + cellLoc='center', + loc='upper left') + +for key, cell in tab.get_celld().items(): + row, col = key + if row > 0 and col > 0: + cell.set_text_props(fontproperties=FontProperties(fname=fontname)) +plt.axis('off') +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: font_table_ttf_sgskip.py](https://matplotlib.org/_downloads/font_table_ttf_sgskip.py) +- [下载Jupyter notebook: 
font_table_ttf_sgskip.ipynb](https://matplotlib.org/_downloads/font_table_ttf_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/fonts_demo.md b/Python/matplotlab/gallery/text_labels_and_annotations/fonts_demo.md new file mode 100644 index 00000000..9d50d7a2 --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/fonts_demo.md @@ -0,0 +1,121 @@ +# 字体演示(面向对象的风格) + +使用setter设置字体属性。 + +请参见字体演示(Kwargs),以使用kwargs实现相同的效果。 + +![字体演示](https://matplotlib.org/_images/sphx_glr_fonts_demo_001.png) + +```python +from matplotlib.font_manager import FontProperties +import matplotlib.pyplot as plt + +plt.subplot(111, facecolor='w') + +font0 = FontProperties() +alignment = {'horizontalalignment': 'center', 'verticalalignment': 'baseline'} +# Show family options + +families = ['serif', 'sans-serif', 'cursive', 'fantasy', 'monospace'] + +font1 = font0.copy() +font1.set_size('large') + +t = plt.text(-0.8, 0.9, 'family', fontproperties=font1, + **alignment) + +yp = [0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2] + +for k, family in enumerate(families): + font = font0.copy() + font.set_family(family) + t = plt.text(-0.8, yp[k], family, fontproperties=font, + **alignment) + +# Show style options + +styles = ['normal', 'italic', 'oblique'] + +t = plt.text(-0.4, 0.9, 'style', fontproperties=font1, + **alignment) + +for k, style in enumerate(styles): + font = font0.copy() + font.set_family('sans-serif') + font.set_style(style) + t = plt.text(-0.4, yp[k], style, fontproperties=font, + **alignment) + +# Show variant options + +variants = ['normal', 'small-caps'] + +t = plt.text(0.0, 0.9, 'variant', fontproperties=font1, + **alignment) + +for k, variant in enumerate(variants): + font = font0.copy() + font.set_family('serif') + font.set_variant(variant) + t = plt.text(0.0, yp[k], variant, fontproperties=font, + **alignment) + +# Show weight options + +weights = ['light', 'normal', 'medium', 'semibold', 'bold', 'heavy', 'black'] + 
+t = plt.text(0.4, 0.9, 'weight', fontproperties=font1, + **alignment) + +for k, weight in enumerate(weights): + font = font0.copy() + font.set_weight(weight) + t = plt.text(0.4, yp[k], weight, fontproperties=font, + **alignment) + +# Show size options + +sizes = ['xx-small', 'x-small', 'small', 'medium', 'large', + 'x-large', 'xx-large'] + +t = plt.text(0.8, 0.9, 'size', fontproperties=font1, + **alignment) + +for k, size in enumerate(sizes): + font = font0.copy() + font.set_size(size) + t = plt.text(0.8, yp[k], size, fontproperties=font, + **alignment) + +# Show bold italic + +font = font0.copy() +font.set_style('italic') +font.set_weight('bold') +font.set_size('x-small') +t = plt.text(-0.4, 0.1, 'bold italic', fontproperties=font, + **alignment) + +font = font0.copy() +font.set_style('italic') +font.set_weight('bold') +font.set_size('medium') +t = plt.text(-0.4, 0.2, 'bold italic', fontproperties=font, + **alignment) + +font = font0.copy() +font.set_style('italic') +font.set_weight('bold') +font.set_size('x-large') +t = plt.text(-0.4, 0.3, 'bold italic', fontproperties=font, + **alignment) + +plt.axis([-1, 1, 0, 1]) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: fonts_demo.py](https://matplotlib.org/_downloads/fonts_demo.py) +- [下载Jupyter notebook: fonts_demo.ipynb](https://matplotlib.org/_downloads/fonts_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/fonts_demo_kw.md b/Python/matplotlab/gallery/text_labels_and_annotations/fonts_demo_kw.md new file mode 100644 index 00000000..60e46168 --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/fonts_demo_kw.md @@ -0,0 +1,89 @@ +# 字体演示(kwargs) + +使用kwargs设置字体属性。 + +请参阅[字体演示(面向对象样式)](/gallery/text_labels_and_annotations/fonts_demo.html),以使用setter实现相同的效果。 + +![字体演示](https://matplotlib.org/_images/sphx_glr_fonts_demo_kw_001.png) + +```python +import matplotlib.pyplot as plt + +plt.subplot(111, facecolor='w') +alignment = 
{'horizontalalignment': 'center', 'verticalalignment': 'baseline'} + +# Show family options + +families = ['serif', 'sans-serif', 'cursive', 'fantasy', 'monospace'] + +t = plt.text(-0.8, 0.9, 'family', size='large', **alignment) + +yp = [0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2] + +for k, family in enumerate(families): + t = plt.text(-0.8, yp[k], family, family=family, **alignment) + +# Show style options + +styles = ['normal', 'italic', 'oblique'] + +t = plt.text(-0.4, 0.9, 'style', **alignment) + +for k, style in enumerate(styles): + t = plt.text(-0.4, yp[k], style, family='sans-serif', style=style, + **alignment) + +# Show variant options + +variants = ['normal', 'small-caps'] + +t = plt.text(0.0, 0.9, 'variant', **alignment) + +for k, variant in enumerate(variants): + t = plt.text(0.0, yp[k], variant, family='serif', variant=variant, + **alignment) + +# Show weight options + +weights = ['light', 'normal', 'medium', 'semibold', 'bold', 'heavy', 'black'] + +t = plt.text(0.4, 0.9, 'weight', **alignment) + +for k, weight in enumerate(weights): + t = plt.text(0.4, yp[k], weight, weight=weight, + **alignment) + +# Show size options + +sizes = ['xx-small', 'x-small', 'small', 'medium', 'large', + 'x-large', 'xx-large'] + +t = plt.text(0.8, 0.9, 'size', **alignment) + +for k, size in enumerate(sizes): + t = plt.text(0.8, yp[k], size, size=size, + **alignment) + +x = -0.4 +# Show bold italic +t = plt.text(x, 0.1, 'bold italic', style='italic', + weight='bold', size='x-small', + **alignment) + +t = plt.text(x, 0.2, 'bold italic', + style='italic', weight='bold', size='medium', + **alignment) + +t = plt.text(x, 0.3, 'bold italic', + style='italic', weight='bold', size='x-large', + **alignment) + +plt.axis([-1, 1, 0, 1]) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: fonts_demo_kw.py](https://matplotlib.org/_downloads/fonts_demo_kw.py) +- [下载Jupyter notebook: fonts_demo_kw.ipynb](https://matplotlib.org/_downloads/fonts_demo_kw.ipynb) \ No newline at end of file diff --git 
a/Python/matplotlab/gallery/text_labels_and_annotations/legend.md b/Python/matplotlab/gallery/text_labels_and_annotations/legend.md new file mode 100644 index 00000000..7d79873c --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/legend.md @@ -0,0 +1,45 @@ +# 使用预定义标签的图例 + +在绘图调用中预先定义好标签,再用这些标签生成图例。 + +```python +import numpy as np +import matplotlib.pyplot as plt + +# Make some fake data. +a = b = np.arange(0, 3, .02) +c = np.exp(a) +d = c[::-1] + +# Create plots with pre-defined labels. +fig, ax = plt.subplots() +ax.plot(a, c, 'k--', label='Model length') +ax.plot(a, d, 'k:', label='Data length') +ax.plot(a, c + d, 'k', label='Total message length') + +legend = ax.legend(loc='upper center', shadow=True, fontsize='x-large') + +# Put a nicer background color on the legend. +legend.get_frame().set_facecolor('C0') + +plt.show() +``` + +![预定义标签的示例](https://matplotlib.org/_images/sphx_glr_legend_001.png) + +## 参考 + +此示例显示了以下函数、方法、类和模块的使用: + +```python +import matplotlib +matplotlib.axes.Axes.plot +matplotlib.pyplot.plot +matplotlib.axes.Axes.legend +matplotlib.pyplot.legend +``` + +## 下载这个示例 + +- [下载python源码: legend.py](https://matplotlib.org/_downloads/legend.py) +- [下载Jupyter notebook: legend.ipynb](https://matplotlib.org/_downloads/legend.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/legend_demo.md b/Python/matplotlab/gallery/text_labels_and_annotations/legend_demo.md new file mode 100644 index 00000000..d44cc7ec --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/legend_demo.md @@ -0,0 +1,194 @@ +# 图例(Legend)演示 + +在Matplotlib中绘制图例。 + +在Matplotlib中有很多方法可以创建和自定义图例。下面我们将展示一些如何操作的示例。 + +首先,我们将展示如何为特定线条制作图例。 + +```python +import matplotlib.pyplot as plt +import matplotlib.collections as mcol +from matplotlib.legend_handler import HandlerLineCollection, HandlerTuple +from matplotlib.lines import Line2D +import numpy as np + +t1 = np.arange(0.0, 2.0, 0.1) +t2 = np.arange(0.0, 2.0, 0.01) + 
+fig, ax = plt.subplots() + +# note that plot returns a list of lines. The "l1, = plot" usage +# extracts the first element of the list into l1 using tuple +# unpacking. So l1 is a Line2D instance, not a sequence of lines +l1, = ax.plot(t2, np.exp(-t2)) +l2, l3 = ax.plot(t2, np.sin(2 * np.pi * t2), '--o', t1, np.log(1 + t1), '.') +l4, = ax.plot(t2, np.exp(-t2) * np.sin(2 * np.pi * t2), 's-.') + +ax.legend((l2, l4), ('oscillatory', 'damped'), loc='upper right', shadow=True) +ax.set_xlabel('time') +ax.set_ylabel('volts') +ax.set_title('Damped oscillation') +plt.show() +``` + +![图例示例](https://matplotlib.org/_images/sphx_glr_legend_demo_001.png) + +接下来,我们将演示绘制更复杂的标签。 + +```python +x = np.linspace(0, 1) + +fig, (ax0, ax1) = plt.subplots(2, 1) + +# Plot the lines y=x**n for n=1..4. +for n in range(1, 5): + ax0.plot(x, x**n, label="n={0}".format(n)) +leg = ax0.legend(loc="upper left", bbox_to_anchor=[0, 1], + ncol=2, shadow=True, title="Legend", fancybox=True) +leg.get_title().set_color("red") + +# Demonstrate some more complex labels. 
+ax1.plot(x, x**2, label="multi\nline")
+half_pi = np.linspace(0, np.pi / 2)
+ax1.plot(np.sin(half_pi), np.cos(half_pi), label=r"$\frac{1}{2}\pi$")
+ax1.plot(x, 2**(x**2), label="$2^{x^2}$")
+ax1.legend(shadow=True, fancybox=True)
+
+plt.show()
+```
+
+![Legend demo 2](https://matplotlib.org/_images/sphx_glr_legend_demo_002.png)
+
+Here we attach legends to more complex plots.
+
+```python
+fig, axes = plt.subplots(3, 1, constrained_layout=True)
+top_ax, middle_ax, bottom_ax = axes
+
+top_ax.bar([0, 1, 2], [0.2, 0.3, 0.1], width=0.4, label="Bar 1",
+           align="center")
+top_ax.bar([0.5, 1.5, 2.5], [0.3, 0.2, 0.2], color="red", width=0.4,
+           label="Bar 2", align="center")
+top_ax.legend()
+
+middle_ax.errorbar([0, 1, 2], [2, 3, 1], xerr=0.4, fmt="s", label="test 1")
+middle_ax.errorbar([0, 1, 2], [3, 2, 4], yerr=0.3, fmt="o", label="test 2")
+middle_ax.errorbar([0, 1, 2], [1, 1, 3], xerr=0.4, yerr=0.3, fmt="^",
+                   label="test 3")
+middle_ax.legend()
+
+bottom_ax.stem([0.3, 1.5, 2.7], [1, 3.6, 2.7], label="stem test")
+bottom_ax.legend()
+
+plt.show()
+```
+
+![Legend demo 3](https://matplotlib.org/_images/sphx_glr_legend_demo_003.png)
+
+Now we'll showcase legend entries with more than one legend key.
+
+```python
+fig, (ax1, ax2) = plt.subplots(2, 1, constrained_layout=True)
+
+# First plot: two legend keys for a single entry
+p1 = ax1.scatter([1], [5], c='r', marker='s', s=100)
+p2 = ax1.scatter([3], [2], c='b', marker='o', s=100)
+# `plot` returns a list, but we want the handle - thus the comma on the left
+p3, = ax1.plot([1, 5], [4, 4], 'm-d')
+
+# Assign two of the handles to the same legend entry by putting them in a tuple
+# and using a generic handler map (which would be used for any additional
+# tuples of handles like (p1, p3)).
+l = ax1.legend([(p1, p3), p2], ['two keys', 'one key'], scatterpoints=1, + numpoints=1, handler_map={tuple: HandlerTuple(ndivide=None)}) + +# Second plot: plot two bar charts on top of each other and change the padding +# between the legend keys +x_left = [1, 2, 3] +y_pos = [1, 3, 2] +y_neg = [2, 1, 4] + +rneg = ax2.bar(x_left, y_neg, width=0.5, color='w', hatch='///', label='-1') +rpos = ax2.bar(x_left, y_pos, width=0.5, color='k', label='+1') + +# Treat each legend entry differently by using specific `HandlerTuple`s +l = ax2.legend([(rpos, rneg), (rneg, rpos)], ['pad!=0', 'pad=0'], + handler_map={(rpos, rneg): HandlerTuple(ndivide=None), + (rneg, rpos): HandlerTuple(ndivide=None, pad=0.)}) +plt.show() +``` + +![图例示例4](https://matplotlib.org/_images/sphx_glr_legend_demo_004.png) + +最后,还可以编写定义如何对图例进行样式化的自定义对象。 + +```python +class HandlerDashedLines(HandlerLineCollection): + """ + Custom Handler for LineCollection instances. + """ + def create_artists(self, legend, orig_handle, + xdescent, ydescent, width, height, fontsize, trans): + # figure out how many lines there are + numlines = len(orig_handle.get_segments()) + xdata, xdata_marker = self.get_xdata(legend, xdescent, ydescent, + width, height, fontsize) + leglines = [] + # divide the vertical space where the lines will go + # into equal parts based on the number of lines + ydata = ((height) / (numlines + 1)) * np.ones(xdata.shape, float) + # for each line, create the line at the proper location + # and set the dash pattern + for i in range(numlines): + legline = Line2D(xdata, ydata * (numlines - i) - ydescent) + self.update_prop(legline, orig_handle, legend) + # set color, dash pattern, and linewidth to that + # of the lines in linecollection + try: + color = orig_handle.get_colors()[i] + except IndexError: + color = orig_handle.get_colors()[0] + try: + dashes = orig_handle.get_dashes()[i] + except IndexError: + dashes = orig_handle.get_dashes()[0] + try: + lw = orig_handle.get_linewidths()[i] + except 
IndexError: + lw = orig_handle.get_linewidths()[0] + if dashes[0] is not None: + legline.set_dashes(dashes[1]) + legline.set_color(color) + legline.set_transform(trans) + legline.set_linewidth(lw) + leglines.append(legline) + return leglines + +x = np.linspace(0, 5, 100) + +fig, ax = plt.subplots() +colors = plt.rcParams['axes.prop_cycle'].by_key()['color'][:5] +styles = ['solid', 'dashed', 'dashed', 'dashed', 'solid'] +lines = [] +for i, color, style in zip(range(5), colors, styles): + ax.plot(x, np.sin(x) - .1 * i, c=color, ls=style) + +# make proxy artists +# make list of one line -- doesn't matter what the coordinates are +line = [[(0, 0)]] +# set up the proxy artist +lc = mcol.LineCollection(5 * line, linestyles=styles, colors=colors) +# create the legend +ax.legend([lc], ['multi-line'], handler_map={type(lc): HandlerDashedLines()}, + handlelength=2.5, handleheight=3) + +plt.show() +``` + +![图例示例5](https://matplotlib.org/_images/sphx_glr_legend_demo_005.png) + +## 下载这个示例 + +- [下载python源码: legend_demo.py](https://matplotlib.org/_downloads/legend_demo.py) +- [下载Jupyter notebook: legend_demo.ipynb](https://matplotlib.org/_downloads/legend_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/line_with_text.md b/Python/matplotlab/gallery/text_labels_and_annotations/line_with_text.md new file mode 100644 index 00000000..e465196b --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/line_with_text.md @@ -0,0 +1,92 @@ +# 艺术家中的艺术家 + +重写基本方法,以便一个艺术家对象可以包含另一个艺术家对象。在这种情况下,该行包含一个文本实例来为其添加标签。 + +```python +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.lines as lines +import matplotlib.transforms as mtransforms +import matplotlib.text as mtext + + +class MyLine(lines.Line2D): + def __init__(self, *args, **kwargs): + # we'll update the position when the line data is set + self.text = mtext.Text(0, 0, '') + lines.Line2D.__init__(self, *args, **kwargs) + + # we can't access the 
label attr until *after* the line is + # initiated + self.text.set_text(self.get_label()) + + def set_figure(self, figure): + self.text.set_figure(figure) + lines.Line2D.set_figure(self, figure) + + def set_axes(self, axes): + self.text.set_axes(axes) + lines.Line2D.set_axes(self, axes) + + def set_transform(self, transform): + # 2 pixel offset + texttrans = transform + mtransforms.Affine2D().translate(2, 2) + self.text.set_transform(texttrans) + lines.Line2D.set_transform(self, transform) + + def set_data(self, x, y): + if len(x): + self.text.set_position((x[-1], y[-1])) + + lines.Line2D.set_data(self, x, y) + + def draw(self, renderer): + # draw my label at the end of the line with 2 pixel offset + lines.Line2D.draw(self, renderer) + self.text.draw(renderer) + +# Fixing random state for reproducibility +np.random.seed(19680801) + + +fig, ax = plt.subplots() +x, y = np.random.rand(2, 20) +line = MyLine(x, y, mfc='red', ms=12, label='line label') +#line.text.set_text('line label') +line.text.set_color('red') +line.text.set_fontsize(16) + +ax.add_line(line) + +plt.show() +``` + +![艺术家中的艺术家](https://matplotlib.org/_images/sphx_glr_line_with_text_001.png) + +## 参考 + +此示例显示了以下函数、方法、类和模块的使用: + +```python +import matplotlib +matplotlib.lines +matplotlib.lines.Line2D +matplotlib.lines.Line2D.set_data +matplotlib.artist +matplotlib.artist.Artist +matplotlib.artist.Artist.draw +matplotlib.artist.Artist.set_transform +matplotlib.text +matplotlib.text.Text +matplotlib.text.Text.set_color +matplotlib.text.Text.set_fontsize +matplotlib.text.Text.set_position +matplotlib.axes.Axes.add_line +matplotlib.transforms +matplotlib.transforms.Affine2D +``` + +## 下载这个示例 + +- [下载python源码: line_with_text.py](https://matplotlib.org/_downloads/line_with_text.py) +- [下载Jupyter notebook: line_with_text.ipynb](https://matplotlib.org/_downloads/line_with_text.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/mathtext_asarray.md 
b/Python/matplotlab/gallery/text_labels_and_annotations/mathtext_asarray.md new file mode 100644 index 00000000..26bb08dd --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/mathtext_asarray.md @@ -0,0 +1,46 @@ +# 数学文本图像作为numpy数组 + +从LaTeX字符串制作图像。 + +```python +import matplotlib.mathtext as mathtext +import matplotlib.pyplot as plt +import matplotlib +matplotlib.rc('image', origin='upper') + +parser = mathtext.MathTextParser("Bitmap") +parser.to_png('test2.png', + r'$\left[\left\lfloor\frac{5}{\frac{\left(3\right)}{4}} ' + r'y\right)\right]$', color='green', fontsize=14, dpi=100) + +rgba1, depth1 = parser.to_rgba( + r'IQ: $\sigma_i=15$', color='blue', fontsize=20, dpi=200) +rgba2, depth2 = parser.to_rgba( + r'some other string', color='red', fontsize=20, dpi=200) + +fig = plt.figure() +fig.figimage(rgba1, 100, 100) +fig.figimage(rgba2, 100, 300) + +plt.show() +``` + +![数学文本图像作为numpy数组](https://matplotlib.org/_images/sphx_glr_mathtext_asarray_001.png) + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.mathtext +matplotlib.mathtext.MathTextParser +matplotlib.mathtext.MathTextParser.to_png +matplotlib.mathtext.MathTextParser.to_rgba +matplotlib.figure.Figure.figimage +``` + +## 下载这个示例 + +- [下载python源码: mathtext_asarray.py](https://matplotlib.org/_downloads/mathtext_asarray.py) +- [下载Jupyter notebook: mathtext_asarray.ipynb](https://matplotlib.org/_downloads/mathtext_asarray.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/mathtext_demo.md b/Python/matplotlab/gallery/text_labels_and_annotations/mathtext_demo.md new file mode 100644 index 00000000..efe2f2d0 --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/mathtext_demo.md @@ -0,0 +1,30 @@ +# 数学文本演示 + +使用Matplotlib的内部LaTeX解析器和布局引擎。 有关真正的LaTeX渲染,请参阅text.usetex选项。 + +![数学文本演示](https://matplotlib.org/_images/sphx_glr_mathtext_demo_001.png) + +```python +import matplotlib.pyplot as plt + +fig, ax = 
plt.subplots() + +ax.plot([1, 2, 3], 'r', label=r'$\sqrt{x^2}$') +ax.legend() + +ax.set_xlabel(r'$\Delta_i^j$', fontsize=20) +ax.set_ylabel(r'$\Delta_{i+1}^j$', fontsize=20) +ax.set_title(r'$\Delta_i^j \hspace{0.4} \mathrm{versus} \hspace{0.4} ' + r'\Delta_{i+1}^j$', fontsize=20) + +tex = r'$\mathcal{R}\prod_{i=\alpha_{i+1}}^\infty a_i\sin(2 \pi f x_i)$' +ax.text(1, 1.6, tex, fontsize=20, va='bottom') + +fig.tight_layout() +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: mathtext_demo.py](https://matplotlib.org/_downloads/mathtext_demo.py) +- [下载Jupyter notebook: mathtext_demo.ipynb](https://matplotlib.org/_downloads/mathtext_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/mathtext_examples.md b/Python/matplotlab/gallery/text_labels_and_annotations/mathtext_examples.md new file mode 100644 index 00000000..08a8988c --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/mathtext_examples.md @@ -0,0 +1,147 @@ +# 数学文本例子 + +Matplotlib的数学渲染引擎的选定功能。 + +![数学文本例子](https://matplotlib.org/_images/sphx_glr_mathtext_examples_001.png) + +Out: + +```python +0 $W^{3\beta}_{\delta_1 \rho_1 \sigma_2} = U^{3\beta}_{\delta_1 \rho_1} + \frac{1}{8 \pi 2} \int^{\alpha_2}_{\alpha_2} d \alpha^\prime_2 \left[\frac{ U^{2\beta}_{\delta_1 \rho_1} - \alpha^\prime_2U^{1\beta}_{\rho_1 \sigma_2} }{U^{0\beta}_{\rho_1 \sigma_2}}\right]$ +1 $\alpha_i > \beta_i,\ \alpha_{i+1}^j = {\rm sin}(2\pi f_j t_i) e^{-5 t_i/\tau},\ \ldots$ +2 $\frac{3}{4},\ \binom{3}{4},\ \stackrel{3}{4},\ \left(\frac{5 - \frac{1}{x}}{4}\right),\ \ldots$ +3 $\sqrt{2},\ \sqrt[3]{x},\ \ldots$ +4 $\mathrm{Roman}\ , \ \mathit{Italic}\ , \ \mathtt{Typewriter} \ \mathrm{or}\ \mathcal{CALLIGRAPHY}$ +5 $\acute a,\ \bar a,\ \breve a,\ \dot a,\ \ddot a, \ \grave a, \ \hat a,\ \tilde a,\ \vec a,\ \widehat{xyz},\ \widetilde{xyz},\ \ldots$ +6 $\alpha,\ \beta,\ \chi,\ \delta,\ \lambda,\ \mu,\ \Delta,\ \Gamma,\ \Omega,\ \Phi,\ \Pi,\ \Upsilon,\ \nabla,\ \aleph,\ 
\beth,\ \daleth,\ \gimel,\ \ldots$ +7 $\coprod,\ \int,\ \oint,\ \prod,\ \sum,\ \log,\ \sin,\ \approx,\ \oplus,\ \star,\ \varpropto,\ \infty,\ \partial,\ \Re,\ \leftrightsquigarrow, \ \ldots$ +``` + +```python +import matplotlib.pyplot as plt +import subprocess +import sys +import re + +# Selection of features following "Writing mathematical expressions" tutorial +mathtext_titles = { + 0: "Header demo", + 1: "Subscripts and superscripts", + 2: "Fractions, binomials and stacked numbers", + 3: "Radicals", + 4: "Fonts", + 5: "Accents", + 6: "Greek, Hebrew", + 7: "Delimiters, functions and Symbols"} +n_lines = len(mathtext_titles) + +# Randomly picked examples +mathext_demos = { + 0: r"$W^{3\beta}_{\delta_1 \rho_1 \sigma_2} = " + r"U^{3\beta}_{\delta_1 \rho_1} + \frac{1}{8 \pi 2} " + r"\int^{\alpha_2}_{\alpha_2} d \alpha^\prime_2 \left[\frac{ " + r"U^{2\beta}_{\delta_1 \rho_1} - \alpha^\prime_2U^{1\beta}_" + r"{\rho_1 \sigma_2} }{U^{0\beta}_{\rho_1 \sigma_2}}\right]$", + + 1: r"$\alpha_i > \beta_i,\ " + r"\alpha_{i+1}^j = {\rm sin}(2\pi f_j t_i) e^{-5 t_i/\tau},\ " + r"\ldots$", + + 2: r"$\frac{3}{4},\ \binom{3}{4},\ \stackrel{3}{4},\ " + r"\left(\frac{5 - \frac{1}{x}}{4}\right),\ \ldots$", + + 3: r"$\sqrt{2},\ \sqrt[3]{x},\ \ldots$", + + 4: r"$\mathrm{Roman}\ , \ \mathit{Italic}\ , \ \mathtt{Typewriter} \ " + r"\mathrm{or}\ \mathcal{CALLIGRAPHY}$", + + 5: r"$\acute a,\ \bar a,\ \breve a,\ \dot a,\ \ddot a, \ \grave a, \ " + r"\hat a,\ \tilde a,\ \vec a,\ \widehat{xyz},\ \widetilde{xyz},\ " + r"\ldots$", + + 6: r"$\alpha,\ \beta,\ \chi,\ \delta,\ \lambda,\ \mu,\ " + r"\Delta,\ \Gamma,\ \Omega,\ \Phi,\ \Pi,\ \Upsilon,\ \nabla,\ " + r"\aleph,\ \beth,\ \daleth,\ \gimel,\ \ldots$", + + 7: r"$\coprod,\ \int,\ \oint,\ \prod,\ \sum,\ " + r"\log,\ \sin,\ \approx,\ \oplus,\ \star,\ \varpropto,\ " + r"\infty,\ \partial,\ \Re,\ \leftrightsquigarrow, \ \ldots$"} + + +def doall(): + # Colors used in mpl online documentation. + mpl_blue_rvb = (191. / 255., 209. / 256., 212. / 255.) 
+ mpl_orange_rvb = (202. / 255., 121. / 256., 0. / 255.) + mpl_grey_rvb = (51. / 255., 51. / 255., 51. / 255.) + + # Creating figure and axis. + plt.figure(figsize=(6, 7)) + plt.axes([0.01, 0.01, 0.98, 0.90], facecolor="white", frameon=True) + plt.gca().set_xlim(0., 1.) + plt.gca().set_ylim(0., 1.) + plt.gca().set_title("Matplotlib's math rendering engine", + color=mpl_grey_rvb, fontsize=14, weight='bold') + plt.gca().set_xticklabels("", visible=False) + plt.gca().set_yticklabels("", visible=False) + + # Gap between lines in axes coords + line_axesfrac = (1. / (n_lines)) + + # Plotting header demonstration formula + full_demo = mathext_demos[0] + plt.annotate(full_demo, + xy=(0.5, 1. - 0.59 * line_axesfrac), + xycoords='data', color=mpl_orange_rvb, ha='center', + fontsize=20) + + # Plotting features demonstration formulae + for i_line in range(1, n_lines): + baseline = 1 - (i_line) * line_axesfrac + baseline_next = baseline - line_axesfrac + title = mathtext_titles[i_line] + ":" + fill_color = ['white', mpl_blue_rvb][i_line % 2] + plt.fill_between([0., 1.], [baseline, baseline], + [baseline_next, baseline_next], + color=fill_color, alpha=0.5) + plt.annotate(title, + xy=(0.07, baseline - 0.3 * line_axesfrac), + xycoords='data', color=mpl_grey_rvb, weight='bold') + demo = mathext_demos[i_line] + plt.annotate(demo, + xy=(0.05, baseline - 0.75 * line_axesfrac), + xycoords='data', color=mpl_grey_rvb, + fontsize=16) + + for i in range(n_lines): + s = mathext_demos[i] + print(i, s) + plt.show() + + +if '--latex' in sys.argv: + # Run: python mathtext_examples.py --latex + # Need amsmath and amssymb packages. 
+ fd = open("mathtext_examples.ltx", "w") + fd.write("\\documentclass{article}\n") + fd.write("\\usepackage{amsmath, amssymb}\n") + fd.write("\\begin{document}\n") + fd.write("\\begin{enumerate}\n") + + for i in range(n_lines): + s = mathext_demos[i] + s = re.sub(r"(?", connectionstyle="arc3")) +plt.text(0, 0.1, r'$\delta$', + {'color': 'k', 'fontsize': 24, 'ha': 'center', 'va': 'center', + 'bbox': dict(boxstyle="round", fc="w", ec="k", pad=0.2)}) + +# Use tex in labels +plt.xticks((-1, 0, 1), ('$-1$', r'$\pm 0$', '$+1$'), color='k', size=20) + +# Left Y-axis labels, combine math mode and text mode +plt.ylabel(r'\bf{phase field} $\phi$', {'color': 'C0', 'fontsize': 20}) +plt.yticks((0, 0.5, 1), (r'\bf{0}', r'\bf{.5}', r'\bf{1}'), color='k', size=20) + +# Right Y-axis labels +plt.text(1.02, 0.5, r"\bf{level set} $\phi$", {'color': 'C2', 'fontsize': 20}, + horizontalalignment='left', + verticalalignment='center', + rotation=90, + clip_on=False, + transform=plt.gca().transAxes) + +# Use multiline environment inside a `text`. 
+# level set equations +eq1 = r"\begin{eqnarray*}" + \ + r"|\nabla\phi| &=& 1,\\" + \ + r"\frac{\partial \phi}{\partial t} + U|\nabla \phi| &=& 0 " + \ + r"\end{eqnarray*}" +plt.text(1, 0.9, eq1, {'color': 'C2', 'fontsize': 18}, va="top", ha="right") + +# phase field equations +eq2 = r'\begin{eqnarray*}' + \ + r'\mathcal{F} &=& \int f\left( \phi, c \right) dV, \\ ' + \ + r'\frac{ \partial \phi } { \partial t } &=& -M_{ \phi } ' + \ + r'\frac{ \delta \mathcal{F} } { \delta \phi }' + \ + r'\end{eqnarray*}' +plt.text(0.18, 0.18, eq2, {'color': 'C0', 'fontsize': 16}) + +plt.text(-1, .30, r'gamma: $\gamma$', {'color': 'r', 'fontsize': 20}) +plt.text(-1, .18, r'Omega: $\Omega$', {'color': 'b', 'fontsize': 20}) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: usetex_demo.py](https://matplotlib.org/_downloads/usetex_demo.py) +- [下载Jupyter notebook: usetex_demo.ipynb](https://matplotlib.org/_downloads/usetex_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/usetex_fonteffects.md b/Python/matplotlab/gallery/text_labels_and_annotations/usetex_fonteffects.md new file mode 100644 index 00000000..42ab20e4 --- /dev/null +++ b/Python/matplotlab/gallery/text_labels_and_annotations/usetex_fonteffects.md @@ -0,0 +1,37 @@ +# Usetex 字体效果 + +此脚本演示了pdf usetex现在支持pdftex.map中指定的字体效果。 + +![Usetex 字体效果示例](https://matplotlib.org/_images/sphx_glr_usetex_fonteffects_001.png) + +```python +import matplotlib +import matplotlib.pyplot as plt +matplotlib.rc('text', usetex=True) + + +def setfont(font): + return r'\font\a %s at 14pt\a ' % font + + +for y, font, text in zip(range(5), + ['ptmr8r', 'ptmri8r', 'ptmro8r', + 'ptmr8rn', 'ptmrr8re'], + ['Nimbus Roman No9 L ' + x for x in + ['', 'Italics (real italics for comparison)', + '(slanted)', '(condensed)', '(extended)']]): + plt.text(0, y, setfont(font) + text) + +plt.ylim(-1, 5) +plt.xlim(-0.2, 0.6) +plt.setp(plt.gca(), frame_on=False, xticks=(), yticks=()) +plt.title('Usetex font effects') 
+plt.savefig('usetex_fonteffects.pdf')
+```
+
+Total running time of the script: ( 0 minutes 1.262 seconds)
+
+## Download this example
+
+- [Download Python source code: usetex_fonteffects.py](https://matplotlib.org/_downloads/usetex_fonteffects.py)
- [Download Jupyter notebook: usetex_fonteffects.ipynb](https://matplotlib.org/_downloads/usetex_fonteffects.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/text_labels_and_annotations/watermark_text.md b/Python/matplotlab/gallery/text_labels_and_annotations/watermark_text.md
new file mode 100644
index 00000000..43c6e3fb
--- /dev/null
+++ b/Python/matplotlab/gallery/text_labels_and_annotations/watermark_text.md
@@ -0,0 +1,38 @@
+# Text watermark
+
+Adding a text watermark.
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+# Fixing random state for reproducibility
+np.random.seed(19680801)
+
+
+fig, ax = plt.subplots()
+ax.plot(np.random.rand(20), '-o', ms=20, lw=2, alpha=0.7, mfc='orange')
+ax.grid()
+
+fig.text(0.95, 0.05, 'Property of MPL',
+         fontsize=50, color='gray',
+         ha='right', va='bottom', alpha=0.5)
+
+plt.show()
+```
+
+![Text watermark example](https://matplotlib.org/_images/sphx_glr_watermark_text_001.png)
+
+## References
+
+The use of the following functions, methods, classes and modules is shown in this example:
+
+```python
+import matplotlib
+matplotlib.figure.Figure.text
+```
+
+## Download this example
+
+- [Download Python source code: watermark_text.py](https://matplotlib.org/_downloads/watermark_text.py)
- [Download Jupyter notebook: watermark_text.ipynb](https://matplotlib.org/_downloads/watermark_text.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/ticks_and_spines/auto_ticks.md b/Python/matplotlab/gallery/ticks_and_spines/auto_ticks.md
new file mode 100644
index 00000000..d0711675
--- /dev/null
+++ b/Python/matplotlab/gallery/ticks_and_spines/auto_ticks.md
@@ -0,0 +1,59 @@
+# Automatically setting tick labels
+
+Setting the behavior of automatic tick placement.
+
+If you don't explicitly set tick positions / labels, Matplotlib will attempt to choose them both automatically, based on the displayed data and its limits.
+
+By default this attempts to pick tick positions that are distributed along the axis:
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+np.random.seed(19680801)
+
+fig, ax = plt.subplots()
+dots = np.arange(10) / 100. + .03
+x, y = np.meshgrid(dots, dots)
+data = [x.ravel(), y.ravel()]
+ax.scatter(*data, c=data[1])
+```
+
+![Automatic tick placement example](https://matplotlib.org/_images/sphx_glr_auto_ticks_001.png)
+
+Sometimes choosing evenly distributed ticks results in strange tick numbers. If you'd like Matplotlib to keep ticks located at round numbers, you can change this behavior with the following rcParams value:
+
+```python
+print(plt.rcParams['axes.autolimit_mode'])
+
+# Now change this value and see the results
+with plt.rc_context({'axes.autolimit_mode': 'round_numbers'}):
+    fig, ax = plt.subplots()
+    ax.scatter(*data, c=data[1])
+```
+
+![Automatic tick placement example 2](https://matplotlib.org/_images/sphx_glr_auto_ticks_002.png)
+
+Out:
+
+```
+data
+```
+
+You can also alter the margin of the axes around the data with ``axes.xmargin`` and ``axes.ymargin``:
+
+```python
+with plt.rc_context({'axes.autolimit_mode': 'round_numbers',
+                     'axes.xmargin': .8,
+                     'axes.ymargin': .8}):
+    fig, ax = plt.subplots()
+    ax.scatter(*data, c=data[1])
+
+plt.show()
+```
+
+![Automatic tick placement example 3](https://matplotlib.org/_images/sphx_glr_auto_ticks_003.png)
+
+## Download this example
+
+- [Download Python source code: auto_ticks.py](https://matplotlib.org/_downloads/auto_ticks.py)
- [Download Jupyter notebook: auto_ticks.ipynb](https://matplotlib.org/_downloads/auto_ticks.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/ticks_and_spines/centered_ticklabels.md b/Python/matplotlab/gallery/ticks_and_spines/centered_ticklabels.md
new file mode 100644
index 00000000..68197f37
--- /dev/null
+++ b/Python/matplotlab/gallery/ticks_and_spines/centered_ticklabels.md
@@ -0,0 +1,50 @@
+# Centered ticklabels
+
+Sometimes it is nice to have ticklabels centered. Matplotlib currently associates a label with a tick, and the label can be aligned 'center', 'left', or 'right' using the horizontal alignment property:
+
+```python
+ax.xaxis.set_tick_params(horizontalalignment='right')
+```
+
+But this doesn't help center the label between ticks. One solution is to "fake it": use a minor tick to place a tick centered between the major ticks. Here is an example that labels the months, centered between the ticks.
+
+![Centered ticklabels example](https://matplotlib.org/_images/sphx_glr_centered_ticklabels_001.png)
+
+```python
+import numpy as np
+import matplotlib.cbook as cbook
+import matplotlib.dates as dates
+import matplotlib.ticker as ticker
+import matplotlib.pyplot as plt
+
+# load some financial data; apple's stock price
+with cbook.get_sample_data('aapl.npz') as fh:
+    r = np.load(fh)['price_data'].view(np.recarray)
+r = r[-250:]  # get the last 250 days
+# Matplotlib works better with datetime.datetime than np.datetime64, but the
+# latter is more portable.
+date = r.date.astype('O')
+
+fig, ax = plt.subplots()
+ax.plot(date, r.adj_close)
+
+ax.xaxis.set_major_locator(dates.MonthLocator())
+ax.xaxis.set_minor_locator(dates.MonthLocator(bymonthday=15))
+
+ax.xaxis.set_major_formatter(ticker.NullFormatter())
+ax.xaxis.set_minor_formatter(dates.DateFormatter('%b'))
+
+for tick in ax.xaxis.get_minor_ticks():
+    tick.tick1line.set_markersize(0)
+    tick.tick2line.set_markersize(0)
+    tick.label1.set_horizontalalignment('center')
+
+imid = len(r) // 2
+ax.set_xlabel(str(date[imid].year))
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: centered_ticklabels.py](https://matplotlib.org/_downloads/centered_ticklabels.py)
- [Download Jupyter notebook: centered_ticklabels.ipynb](https://matplotlib.org/_downloads/centered_ticklabels.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/ticks_and_spines/colorbar_tick_labelling_demo.md b/Python/matplotlab/gallery/ticks_and_spines/colorbar_tick_labelling_demo.md
new file mode 100644
index 00000000..09203b1c
--- /dev/null
+++ b/Python/matplotlab/gallery/ticks_and_spines/colorbar_tick_labelling_demo.md
@@ -0,0 +1,52 @@
+# Colorbar Tick Labelling Demo
+
+Produce custom labelling for a colorbar.
+
+Contributed by Scott Sinclair
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+from matplotlib import cm
+from numpy.random import randn
+```
+
+Make plot with vertical (default) colorbar
+
+```python
+fig, ax = plt.subplots()
+
+data = np.clip(randn(250, 250), -1, 1)
+
+cax = ax.imshow(data, interpolation='nearest', cmap=cm.coolwarm)
+ax.set_title('Gaussian noise with vertical colorbar')
+
+# Add colorbar, make sure to specify tick locations to match desired ticklabels
+cbar = fig.colorbar(cax, ticks=[-1, 0, 1])
+cbar.ax.set_yticklabels(['< -1', '0', '> 1'])  # vertically oriented colorbar
+```
+
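The colorbar relabelling above is an instance of a generic pattern: fix the tick locations first, then overwrite the label strings. A minimal, headless sketch of that pattern on a plain `Axes` (the `Agg` backend and the label strings here are illustrative assumptions, not part of the original example):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_ylim(-1.5, 1.5)
# Fix the tick locations first, then overwrite the label strings,
# the same way the colorbar example does with cbar.ax.set_yticklabels.
ax.set_yticks([-1, 0, 1])
ax.set_yticklabels(['< -1', '0', '> 1'])
print([t.get_text() for t in ax.get_yticklabels()])
```

If the tick locations are not pinned down first, the replacement strings can end up attached to whatever ticks the auto-locator happens to choose.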
+![Colorbar tick labelling example](https://matplotlib.org/_images/sphx_glr_colorbar_tick_labelling_demo_001.png)
+
+Make plot with horizontal colorbar
+
+```python
+fig, ax = plt.subplots()
+
+data = np.clip(randn(250, 250), -1, 1)
+
+cax = ax.imshow(data, interpolation='nearest', cmap=cm.afmhot)
+ax.set_title('Gaussian noise with horizontal colorbar')
+
+cbar = fig.colorbar(cax, ticks=[-1, 0, 1], orientation='horizontal')
+cbar.ax.set_xticklabels(['Low', 'Medium', 'High'])  # horizontal colorbar
+
+plt.show()
+```
+
+![Colorbar tick labelling example 2](https://matplotlib.org/_images/sphx_glr_colorbar_tick_labelling_demo_002.png)
+
+## Download this example
+
+- [Download Python source code: colorbar_tick_labelling_demo.py](https://matplotlib.org/_downloads/colorbar_tick_labelling_demo.py)
- [Download Jupyter notebook: colorbar_tick_labelling_demo.ipynb](https://matplotlib.org/_downloads/colorbar_tick_labelling_demo.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/ticks_and_spines/custom_ticker1.md b/Python/matplotlab/gallery/ticks_and_spines/custom_ticker1.md
new file mode 100644
index 00000000..f6c51941
--- /dev/null
+++ b/Python/matplotlab/gallery/ticks_and_spines/custom_ticker1.md
@@ -0,0 +1,35 @@
+# Custom Ticker1
+
+The new ticker code was designed to explicitly support user customized ticking. The documentation of [matplotlib.ticker](https://matplotlib.org/api/ticker_api.html#module-matplotlib.ticker) details this process. That code defines a lot of preset tickers, but was primarily designed to be user extensible.
+
+In this example, a user defined function is used to format the ticks in millions of dollars on the y axis.
+
+![Custom Ticker1 example](https://matplotlib.org/_images/sphx_glr_custom_ticker1_001.png)
+
+```python
+from matplotlib.ticker import FuncFormatter
+import matplotlib.pyplot as plt
+import numpy as np
+
+x = np.arange(4)
+money = [1.5e5, 2.5e6, 5.5e6, 2.0e7]
+
+
+def millions(x, pos):
+    'The two args are the value and tick position'
+    return '$%1.1fM' % (x * 1e-6)
+
+
+formatter = FuncFormatter(millions)
+
+fig, ax = plt.subplots()
+ax.yaxis.set_major_formatter(formatter)
+plt.bar(x, money)
+plt.xticks(x, ('Bill', 'Fred', 'Mary', 'Sue'))
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: custom_ticker1.py](https://matplotlib.org/_downloads/custom_ticker1.py)
+- [Download Jupyter notebook: custom_ticker1.ipynb](https://matplotlib.org/_downloads/custom_ticker1.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/ticks_and_spines/date_demo_convert.md b/Python/matplotlab/gallery/ticks_and_spines/date_demo_convert.md
new file mode 100644
index 00000000..ab7136c1
--- /dev/null
+++ b/Python/matplotlab/gallery/ticks_and_spines/date_demo_convert.md
@@ -0,0 +1,42 @@
+# Date Demo Convert
+
+![Date Demo Convert example](https://matplotlib.org/_images/sphx_glr_date_demo_convert_001.png)
+
+```python
+import datetime
+import matplotlib.pyplot as plt
+from matplotlib.dates import DayLocator, HourLocator, DateFormatter, drange
+import numpy as np
+
+date1 = datetime.datetime(2000, 3, 2)
+date2 = datetime.datetime(2000, 3, 6)
+delta = datetime.timedelta(hours=6)
+dates = drange(date1, date2, delta)
+
+y = np.arange(len(dates))
+
+fig, ax = plt.subplots()
+ax.plot_date(dates, y ** 2)
+
+# this is superfluous, since the autoscaler should get it right, but
+# use date2num and num2date to convert between dates and floats if
+# you want; both date2num and num2date convert an instance or sequence
+ax.set_xlim(dates[0], dates[-1])
+
+# The hour locator takes the hour or sequence of hours you want to
+# tick, not the base multiple
+
+ax.xaxis.set_major_locator(DayLocator())
+ax.xaxis.set_minor_locator(HourLocator(range(0, 25, 6)))
+ax.xaxis.set_major_formatter(DateFormatter('%Y-%m-%d'))
+
+ax.fmt_xdata = DateFormatter('%Y-%m-%d %H:%M:%S')
+fig.autofmt_xdate()
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: date_demo_convert.py](https://matplotlib.org/_downloads/date_demo_convert.py)
- [Download Jupyter notebook: date_demo_convert.ipynb](https://matplotlib.org/_downloads/date_demo_convert.ipynb)
diff --git a/Python/matplotlab/gallery/ticks_and_spines/date_demo_rrule.md b/Python/matplotlab/gallery/ticks_and_spines/date_demo_rrule.md
new file mode 100644
index 00000000..657ebdab
--- /dev/null
+++ b/Python/matplotlab/gallery/ticks_and_spines/date_demo_rrule.md
@@ -0,0 +1,44 @@
+# Date Demo Rrule
+
+Show how to use an rrule instance to make a custom date ticker - here we put a tick mark on every 5th easter.
+
+See https://dateutil.readthedocs.io/en/stable/ for help with rrules.
+
+![Date Demo Rrule example](https://matplotlib.org/_images/sphx_glr_date_demo_rrule_001.png)
+
+```python
+import matplotlib.pyplot as plt
+from matplotlib.dates import (YEARLY, DateFormatter,
+                              rrulewrapper, RRuleLocator, drange)
+import numpy as np
+import datetime
+
+# Fixing random state for reproducibility
+np.random.seed(19680801)
+
+
+# tick every 5th easter
+rule = rrulewrapper(YEARLY, byeaster=1, interval=5)
+loc = RRuleLocator(rule)
+formatter = DateFormatter('%m/%d/%y')
+date1 = datetime.date(1952, 1, 1)
+date2 = datetime.date(2004, 4, 12)
+delta = datetime.timedelta(days=100)
+
+dates = drange(date1, date2, delta)
+s = np.random.rand(len(dates))  # make up some random y values
+
+
+fig, ax = plt.subplots()
+plt.plot_date(dates, s)
+ax.xaxis.set_major_locator(loc)
+ax.xaxis.set_major_formatter(formatter)
+ax.xaxis.set_tick_params(rotation=30, labelsize=10)
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: date_demo_rrule.py](https://matplotlib.org/_downloads/date_demo_rrule.py)
- [Download Jupyter notebook: date_demo_rrule.ipynb](https://matplotlib.org/_downloads/date_demo_rrule.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/ticks_and_spines/date_index_formatter2.md b/Python/matplotlab/gallery/ticks_and_spines/date_index_formatter2.md
new file mode 100644
index 00000000..ee8c2930
--- /dev/null
+++ b/Python/matplotlab/gallery/ticks_and_spines/date_index_formatter2.md
@@ -0,0 +1,53 @@
+# Date Index Formatter
+
+When plotting daily data, a frequent request is to plot the data ignoring skips, e.g., no extra spaces for weekends. This is particularly common in financial time series, when you may have data for M-F and not Sat, Sun and you don't want gaps in the x axis. The approach is to simply use the integer index for the xdata and a custom tick Formatter to get the appropriate date string for a given index.
+
+![Date Index Formatter example](https://matplotlib.org/_images/sphx_glr_date_index_formatter2_001.png)
+
+Out:
+
+```
+loading /home/tcaswell/mc3/envs/dd37/lib/python3.7/site-packages/matplotlib/mpl-data/sample_data/msft.csv
+```
+
+```python
+import numpy as np
+
+import matplotlib.pyplot as plt
+import matplotlib.cbook as cbook
+from matplotlib.dates import bytespdate2num, num2date
+from matplotlib.ticker import Formatter
+
+
+datafile = cbook.get_sample_data('msft.csv', asfileobj=False)
+print('loading %s' % datafile)
+msft_data = np.genfromtxt(datafile, delimiter=',', names=True,
+                          converters={0: bytespdate2num('%d-%b-%y')})[-40:]
+
+
+class MyFormatter(Formatter):
+    def __init__(self, dates, fmt='%Y-%m-%d'):
+        self.dates = dates
+        self.fmt = fmt
+
+    def __call__(self, x, pos=0):
+        'Return the label for time x at position pos'
+        ind = int(np.round(x))
+        if ind >= len(self.dates) or ind < 0:
+            return ''
+
+        return num2date(self.dates[ind]).strftime(self.fmt)
+
+
+formatter = MyFormatter(msft_data['Date'])
+
+fig, ax = plt.subplots()
+ax.xaxis.set_major_formatter(formatter)
+ax.plot(np.arange(len(msft_data)), msft_data['Close'], 'o-')
+fig.autofmt_xdate()
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: date_index_formatter2.py](https://matplotlib.org/_downloads/date_index_formatter2.py)
- [Download Jupyter notebook: date_index_formatter2.ipynb](https://matplotlib.org/_downloads/date_index_formatter2.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/ticks_and_spines/major_minor_demo.md b/Python/matplotlab/gallery/ticks_and_spines/major_minor_demo.md
new file mode 100644
index 00000000..7e4c44e4
--- /dev/null
+++ b/Python/matplotlab/gallery/ticks_and_spines/major_minor_demo.md
@@ -0,0 +1,79 @@
+# Major and Minor Ticks Demo
+
+Demonstrate how to use major and minor tickers.
+
+The two relevant classes are Locators and Formatters. Locators determine where the ticks are, and formatters control the formatting of the ticks.
+
+Minor ticks are off by default (NullLocator and NullFormatter). Minor ticks can be turned on without labels by setting the minor locator. Minor tick labels can be turned on by setting the minor formatter.
+
+Make a plot with major ticks that are multiples of 20 and minor ticks that are multiples of 5. Label major ticks with '%d' formatting but don't label minor ticks.
+
+The MultipleLocator ticker class is used to place ticks on multiples of some base. The FormatStrFormatter uses a string format string (e.g., '%d' or '%1.2f' or '%1.1f cm') to format the tick labels.
+
+The pyplot interface grid command changes the grid settings of the major ticks of the x and y axis together. If you want to control the grid of the minor ticks for a given axis, use for example:
+
+```python
+ax.xaxis.grid(True, which='minor')
+```
+
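The locator behaviour described above can be checked without opening a window. A small sketch (the `Agg` backend and the axis limits of 0 to 100 are assumptions chosen for illustration):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

fig, ax = plt.subplots()
ax.set_xlim(0, 100)
ax.xaxis.set_major_locator(MultipleLocator(20))  # majors on multiples of 20
ax.xaxis.set_minor_locator(MultipleLocator(5))   # minors on multiples of 5

# Locators may propose ticks slightly beyond the view limits; those are
# simply not drawn, so filter to the visible range before inspecting.
major_locs = [t for t in ax.get_xticks() if 0 <= t <= 100]
minor_locs = [t for t in ax.xaxis.get_minor_locator()() if 0 <= t <= 100]
print(major_locs)       # multiples of 20 inside the limits
print(len(minor_locs))  # multiples of 5 inside the limits
```

Querying the locators this way is also a convenient unit-test hook for custom Locator subclasses.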
+Note that you should not use the same locator between different Axis objects, because the locator stores references to the Axis data and view limits.
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+from matplotlib.ticker import (MultipleLocator, FormatStrFormatter,
+                               AutoMinorLocator)
+
+majorLocator = MultipleLocator(20)
+majorFormatter = FormatStrFormatter('%d')
+minorLocator = MultipleLocator(5)
+
+
+t = np.arange(0.0, 100.0, 0.1)
+s = np.sin(0.1 * np.pi * t) * np.exp(-t * 0.01)
+
+fig, ax = plt.subplots()
+ax.plot(t, s)
+
+ax.xaxis.set_major_locator(majorLocator)
+ax.xaxis.set_major_formatter(majorFormatter)
+
+# for the minor ticks, use no labels; default NullFormatter
+ax.xaxis.set_minor_locator(minorLocator)
+
+plt.show()
+```
+
+![Major and minor tick example](https://matplotlib.org/_images/sphx_glr_major_minor_demo_001.png)
+
+Automatic tick selection for major and minor ticks.
+
+Use interactive pan/zoom to see how the tick intervals change. There will be either 4 or 5 minor tick intervals per major interval, depending on the major interval.
+
+One can supply an argument to AutoMinorLocator to specify a fixed number of minor intervals per major interval; for example, minorLocator = AutoMinorLocator(2) would lead to a single minor tick between major ticks.
+
+```python
+minorLocator = AutoMinorLocator()
+
+
+t = np.arange(0.0, 100.0, 0.01)
+s = np.sin(2 * np.pi * t) * np.exp(-t * 0.01)
+
+fig, ax = plt.subplots()
+ax.plot(t, s)
+
+ax.xaxis.set_minor_locator(minorLocator)
+
+ax.tick_params(which='both', width=2)
+ax.tick_params(which='major', length=7)
+ax.tick_params(which='minor', length=4, color='r')
+
+plt.show()
+```
+
+![Major and minor tick example 2](https://matplotlib.org/_images/sphx_glr_major_minor_demo_002.png)
+
+## Download this example
+
+- [Download Python source code: major_minor_demo.py](https://matplotlib.org/_downloads/major_minor_demo.py)
+- [Download Jupyter notebook: major_minor_demo.ipynb](https://matplotlib.org/_downloads/major_minor_demo.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/ticks_and_spines/multiple_yaxis_with_spines.md b/Python/matplotlab/gallery/ticks_and_spines/multiple_yaxis_with_spines.md
new file mode 100644
index 00000000..9d74e9a4
--- /dev/null
+++ b/Python/matplotlab/gallery/ticks_and_spines/multiple_yaxis_with_spines.md
@@ -0,0 +1,70 @@
+# Multiple Yaxis With Spines
+
+Create multiple y axes with a shared x axis. This is done by creating a [twinx](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.twinx.html#matplotlib.axes.Axes.twinx) axes, turning all spines but the right one invisible, and offsetting its position using [set_position](https://matplotlib.org/api/spines_api.html#matplotlib.spines.Spine.set_position).
+
+Note that this approach uses [matplotlib.axes.Axes](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes) and its Spines. An alternative approach using parasite axes is shown in the [Demo Parasite Axes](https://matplotlib.org/gallery/axisartist/demo_parasite_axes.html) and [Demo Parasite Axes2](https://matplotlib.org/gallery/axisartist/demo_parasite_axes2.html) examples.
+
+![Multiple yaxis with spines example](https://matplotlib.org/_images/sphx_glr_multiple_yaxis_with_spines_001.png)
+
+```python
+import matplotlib.pyplot as plt
+
+
+def make_patch_spines_invisible(ax):
+    ax.set_frame_on(True)
+    ax.patch.set_visible(False)
+    for sp in ax.spines.values():
+        sp.set_visible(False)
+
+
+fig, host = plt.subplots()
+fig.subplots_adjust(right=0.75)
+
+par1 = host.twinx()
+par2 = host.twinx()
+
+# Offset the right spine of par2. The ticks and label have already been
+# placed on the right by twinx above.
+par2.spines["right"].set_position(("axes", 1.2))
+# Having been created by twinx, par2 has its frame off, so the line of its
+# detached spine is invisible. First, activate the frame but make the patch
+# and spines invisible.
+make_patch_spines_invisible(par2)
+# Second, show the right spine.
+par2.spines["right"].set_visible(True)
+
+p1, = host.plot([0, 1, 2], [0, 1, 2], "b-", label="Density")
+p2, = par1.plot([0, 1, 2], [0, 3, 2], "r-", label="Temperature")
+p3, = par2.plot([0, 1, 2], [50, 30, 15], "g-", label="Velocity")
+
+host.set_xlim(0, 2)
+host.set_ylim(0, 2)
+par1.set_ylim(0, 4)
+par2.set_ylim(1, 65)
+
+host.set_xlabel("Distance")
+host.set_ylabel("Density")
+par1.set_ylabel("Temperature")
+par2.set_ylabel("Velocity")
+
+host.yaxis.label.set_color(p1.get_color())
+par1.yaxis.label.set_color(p2.get_color())
+par2.yaxis.label.set_color(p3.get_color())
+
+tkw = dict(size=4, width=1.5)
+host.tick_params(axis='y', colors=p1.get_color(), **tkw)
+par1.tick_params(axis='y', colors=p2.get_color(), **tkw)
+par2.tick_params(axis='y', colors=p3.get_color(), **tkw)
+host.tick_params(axis='x', **tkw)
+
+lines = [p1, p2, p3]
+
+host.legend(lines, [l.get_label() for l in lines])
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: multiple_yaxis_with_spines.py](https://matplotlib.org/_downloads/multiple_yaxis_with_spines.py)
+- [Download Jupyter notebook: multiple_yaxis_with_spines.ipynb](https://matplotlib.org/_downloads/multiple_yaxis_with_spines.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/ticks_and_spines/scalarformatter.md b/Python/matplotlab/gallery/ticks_and_spines/scalarformatter.md
new file mode 100644
index 00000000..854543a2
--- /dev/null
+++ b/Python/matplotlab/gallery/ticks_and_spines/scalarformatter.md
@@ -0,0 +1,113 @@
+# Tick formatting using the ScalarFormatter
+
+The example shows the use of ScalarFormatter with different settings.
+
+Example 1: Default
+
+Example 2: Without a numerical offset
+
+Example 3: Using mathtext
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+from matplotlib.ticker import ScalarFormatter
+```
+
+Example 1:
+
+```python
+x = np.arange(0, 1, .01)
+fig, [[ax1, ax2], [ax3, ax4]] = plt.subplots(2, 2, figsize=(6, 6))
+fig.text(0.5, 0.975, 'The new formatter, default settings',
+         horizontalalignment='center',
+         verticalalignment='top')
+
+ax1.plot(x * 1e5 + 1e10, x * 1e-10 + 1e-5)
+ax1.xaxis.set_major_formatter(ScalarFormatter())
+ax1.yaxis.set_major_formatter(ScalarFormatter())
+
+ax2.plot(x * 1e5, x * 1e-4)
+ax2.xaxis.set_major_formatter(ScalarFormatter())
+ax2.yaxis.set_major_formatter(ScalarFormatter())
+
+ax3.plot(-x * 1e5 - 1e10, -x * 1e-5 - 1e-10)
+ax3.xaxis.set_major_formatter(ScalarFormatter())
+ax3.yaxis.set_major_formatter(ScalarFormatter())
+
+ax4.plot(-x * 1e5, -x * 1e-4)
+ax4.xaxis.set_major_formatter(ScalarFormatter())
+ax4.yaxis.set_major_formatter(ScalarFormatter())
+
+fig.subplots_adjust(wspace=0.7, hspace=0.6)
+```
+
+![ScalarFormatter example](https://matplotlib.org/_images/sphx_glr_scalarformatter_001.png)
+
+Example 2:
+
+```python
+x = np.arange(0, 1, .01)
+fig, [[ax1, ax2], [ax3, ax4]] = plt.subplots(2, 2, figsize=(6, 6))
+fig.text(0.5, 0.975, 'The new formatter, no numerical offset',
+         horizontalalignment='center',
+         verticalalignment='top')
+
+ax1.plot(x * 1e5 + 1e10, x * 1e-10 + 1e-5)
+ax1.xaxis.set_major_formatter(ScalarFormatter(useOffset=False))
+ax1.yaxis.set_major_formatter(ScalarFormatter(useOffset=False))
+
+ax2.plot(x * 1e5, x * 1e-4)
+ax2.xaxis.set_major_formatter(ScalarFormatter(useOffset=False))
+ax2.yaxis.set_major_formatter(ScalarFormatter(useOffset=False))
+
+ax3.plot(-x * 1e5 - 1e10, -x * 1e-5 - 1e-10)
+ax3.xaxis.set_major_formatter(ScalarFormatter(useOffset=False))
+ax3.yaxis.set_major_formatter(ScalarFormatter(useOffset=False))
+
+ax4.plot(-x * 1e5, -x * 1e-4)
+ax4.xaxis.set_major_formatter(ScalarFormatter(useOffset=False))
+ax4.yaxis.set_major_formatter(ScalarFormatter(useOffset=False))
+
+fig.subplots_adjust(wspace=0.7, hspace=0.6)
+```
+
+![ScalarFormatter example 2](https://matplotlib.org/_images/sphx_glr_scalarformatter_002.png)
+
+Example 3:
+
+```python
+x = np.arange(0, 1, .01)
+fig, [[ax1, ax2], [ax3, ax4]] = plt.subplots(2, 2, figsize=(6, 6))
+fig.text(0.5, 0.975, 'The new formatter, with mathtext',
+         horizontalalignment='center',
+         verticalalignment='top')
+
+ax1.plot(x * 1e5 + 1e10, x * 1e-10 + 1e-5)
+ax1.xaxis.set_major_formatter(ScalarFormatter(useMathText=True))
+ax1.yaxis.set_major_formatter(ScalarFormatter(useMathText=True))
+
+ax2.plot(x * 1e5, x * 1e-4)
+ax2.xaxis.set_major_formatter(ScalarFormatter(useMathText=True))
+ax2.yaxis.set_major_formatter(ScalarFormatter(useMathText=True))
+
+ax3.plot(-x * 1e5 - 1e10, -x * 1e-5 - 1e-10)
+ax3.xaxis.set_major_formatter(ScalarFormatter(useMathText=True))
+ax3.yaxis.set_major_formatter(ScalarFormatter(useMathText=True))
+
+ax4.plot(-x * 1e5, -x * 1e-4)
+ax4.xaxis.set_major_formatter(ScalarFormatter(useMathText=True))
+ax4.yaxis.set_major_formatter(ScalarFormatter(useMathText=True))
+
+fig.subplots_adjust(wspace=0.7, hspace=0.6)
+
+plt.show()
+```
+
+![ScalarFormatter example 3](https://matplotlib.org/_images/sphx_glr_scalarformatter_003.png)
+
+## Download this example
+
+- [Download Python source code: scalarformatter.py](https://matplotlib.org/_downloads/scalarformatter.py)
+- [Download Jupyter notebook: scalarformatter.ipynb](https://matplotlib.org/_downloads/scalarformatter.ipynb)

diff --git a/Python/matplotlab/gallery/ticks_and_spines/spine_placement_demo.md b/Python/matplotlab/gallery/ticks_and_spines/spine_placement_demo.md
new file mode 100644
index 00000000..2e411ddd
--- /dev/null
+++ b/Python/matplotlab/gallery/ticks_and_spines/spine_placement_demo.md
@@ -0,0 +1,123 @@
+# Spine Placement Demo
+
+Adjust the location and appearance of axis spines.
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+```
+
+```python
+fig = plt.figure()
+x = np.linspace(-np.pi, np.pi, 100)
+y = 2 * np.sin(x)
+
+ax = fig.add_subplot(2, 2, 1)
+ax.set_title('centered spines')
+ax.plot(x, y)
+ax.spines['left'].set_position('center')
+ax.spines['right'].set_color('none')
+ax.spines['bottom'].set_position('center')
+ax.spines['top'].set_color('none')
+ax.spines['left'].set_smart_bounds(True)
+ax.spines['bottom'].set_smart_bounds(True)
+ax.xaxis.set_ticks_position('bottom')
+ax.yaxis.set_ticks_position('left')
+
+ax = fig.add_subplot(2, 2, 2)
+ax.set_title('zeroed spines')
+ax.plot(x, y)
+ax.spines['left'].set_position('zero')
+ax.spines['right'].set_color('none')
+ax.spines['bottom'].set_position('zero')
+ax.spines['top'].set_color('none')
+ax.spines['left'].set_smart_bounds(True)
+ax.spines['bottom'].set_smart_bounds(True)
+ax.xaxis.set_ticks_position('bottom')
+ax.yaxis.set_ticks_position('left')
+
+ax = fig.add_subplot(2, 2, 3)
+ax.set_title('spines at axes (0.6, 0.1)')
+ax.plot(x, y)
+ax.spines['left'].set_position(('axes', 0.6))
+ax.spines['right'].set_color('none')
+ax.spines['bottom'].set_position(('axes', 0.1))
+ax.spines['top'].set_color('none')
+ax.spines['left'].set_smart_bounds(True)
+ax.spines['bottom'].set_smart_bounds(True)
+ax.xaxis.set_ticks_position('bottom')
+ax.yaxis.set_ticks_position('left')
+
+ax = fig.add_subplot(2, 2, 4)
+ax.set_title('spines at data (1, 2)')
+ax.plot(x, y)
+ax.spines['left'].set_position(('data', 1))
+ax.spines['right'].set_color('none')
+ax.spines['bottom'].set_position(('data', 2))
+ax.spines['top'].set_color('none')
+ax.spines['left'].set_smart_bounds(True)
+ax.spines['bottom'].set_smart_bounds(True)
+ax.xaxis.set_ticks_position('bottom')
+ax.yaxis.set_ticks_position('left')
+```
+
+![Spine placement demo](https://matplotlib.org/_images/sphx_glr_spine_placement_demo_001.png)
+
+Define a method that adjusts the location of the axis spines.
+
+```python
+def adjust_spines(ax, spines):
+    for loc, spine in ax.spines.items():
+        if loc in spines:
+            spine.set_position(('outward', 10))  # outward by 10 points
+            spine.set_smart_bounds(True)
+        else:
+            spine.set_color('none')  # don't draw spine
+
+    # turn off ticks where there is no spine
+    if 'left' in spines:
+        ax.yaxis.set_ticks_position('left')
+    else:
+        # no yaxis ticks
+        ax.yaxis.set_ticks([])
+
+    if 'bottom' in spines:
+        ax.xaxis.set_ticks_position('bottom')
+    else:
+        # no xaxis ticks
+        ax.xaxis.set_ticks([])
+```
+
+Create another figure using our new adjust_spines method.
+
+```python
+fig = plt.figure()
+
+x = np.linspace(0, 2 * np.pi, 100)
+y = 2 * np.sin(x)
+
+ax = fig.add_subplot(2, 2, 1)
+ax.plot(x, y, clip_on=False)
+adjust_spines(ax, ['left'])
+
+ax = fig.add_subplot(2, 2, 2)
+ax.plot(x, y, clip_on=False)
+adjust_spines(ax, [])
+
+ax = fig.add_subplot(2, 2, 3)
+ax.plot(x, y, clip_on=False)
+adjust_spines(ax, ['left', 'bottom'])
+
+ax = fig.add_subplot(2, 2, 4)
+ax.plot(x, y, clip_on=False)
+adjust_spines(ax, ['bottom'])
+
+plt.show()
+```
+
+![Spine placement demo 2](https://matplotlib.org/_images/sphx_glr_spine_placement_demo_002.png)
+
+## Download this example
+
+- [Download Python source code: spine_placement_demo.py](https://matplotlib.org/_downloads/spine_placement_demo.py)
+- [Download Jupyter notebook: spine_placement_demo.ipynb](https://matplotlib.org/_downloads/spine_placement_demo.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/ticks_and_spines/spines.md b/Python/matplotlab/gallery/ticks_and_spines/spines.md
new file mode 100644
index 00000000..1c272a29
--- /dev/null
+++ b/Python/matplotlab/gallery/ticks_and_spines/spines.md
@@ -0,0 +1,53 @@
+# Spines
+
+This demo compares:
+
+- a normal axes, with spines on all four sides;
+- an axes with spines only on the left and bottom;
+- an axes using custom bounds to limit the extent of the spine.
+
+![Spines example](https://matplotlib.org/_images/sphx_glr_spines_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+
+x = np.linspace(0, 2 * np.pi, 100)
+y = 2 * np.sin(x)
+
+fig, (ax0, ax1, ax2) = plt.subplots(nrows=3)
+
+ax0.plot(x, y)
+ax0.set_title('normal spines')
+
+ax1.plot(x, y)
+ax1.set_title('bottom-left spines')
+
+# Hide the right and top spines
+ax1.spines['right'].set_visible(False)
+ax1.spines['top'].set_visible(False)
+# Only show ticks on the left and bottom spines
+ax1.yaxis.set_ticks_position('left')
+ax1.xaxis.set_ticks_position('bottom')
+
+ax2.plot(x, y)
+
+# Only draw spine between the y-ticks
+ax2.spines['left'].set_bounds(-1, 1)
+# Hide the right and top spines
+ax2.spines['right'].set_visible(False)
+ax2.spines['top'].set_visible(False)
+# Only show ticks on the left and bottom spines
+ax2.yaxis.set_ticks_position('left')
+ax2.xaxis.set_ticks_position('bottom')
+
+# Tweak spacing between subplots to prevent labels from overlapping
+plt.subplots_adjust(hspace=0.5)
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: spines.py](https://matplotlib.org/_downloads/spines.py)
+- [Download Jupyter notebook: spines.ipynb](https://matplotlib.org/_downloads/spines.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/ticks_and_spines/spines_bounds.md b/Python/matplotlab/gallery/ticks_and_spines/spines_bounds.md
new file mode 100644
index 00000000..f895cf48
--- /dev/null
+++ b/Python/matplotlab/gallery/ticks_and_spines/spines_bounds.md
@@ -0,0 +1,44 @@
+# Custom spine bounds
+
+Demo of spines using custom bounds to limit the extent of the spine.
+
+![Custom spine bounds example](https://matplotlib.org/_images/sphx_glr_spines_bounds_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+# Fixing random state for reproducibility
+np.random.seed(19680801)
+
+x = np.linspace(0, 2*np.pi, 50)
+y = np.sin(x)
+y2 = y + 0.1 * np.random.normal(size=x.shape)
+
+fig, ax = plt.subplots()
+ax.plot(x, y, 'k--')
+ax.plot(x, y2, 'ro')
+
+# set ticks and tick labels
+ax.set_xlim((0, 2*np.pi))
+ax.set_xticks([0, np.pi, 2*np.pi])
+ax.set_xticklabels(['0', r'$\pi$', r'2$\pi$'])
+ax.set_ylim((-1.5, 1.5))
+ax.set_yticks([-1, 0, 1])
+
+# Only draw spine between the y-ticks
+ax.spines['left'].set_bounds(-1, 1)
+# Hide the right and top spines
+ax.spines['right'].set_visible(False)
+ax.spines['top'].set_visible(False)
+# Only show ticks on the left and bottom spines
+ax.yaxis.set_ticks_position('left')
+ax.xaxis.set_ticks_position('bottom')
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: spines_bounds.py](https://matplotlib.org/_downloads/spines_bounds.py)
- [Download Jupyter notebook: spines_bounds.ipynb](https://matplotlib.org/_downloads/spines_bounds.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/ticks_and_spines/spines_dropped.md b/Python/matplotlab/gallery/ticks_and_spines/spines_dropped.md
new file mode 100644
index 00000000..80276b24
--- /dev/null
+++ b/Python/matplotlab/gallery/ticks_and_spines/spines_dropped.md
@@ -0,0 +1,36 @@
+# Dropped spines
+
+Demo of spines offset from the axes (a.k.a. "dropped spines").
+
+![Dropped spines example](https://matplotlib.org/_images/sphx_glr_spines_dropped_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+# Fixing random state for reproducibility
+np.random.seed(19680801)
+
+fig, ax = plt.subplots()
+
+image = np.random.uniform(size=(10, 10))
+ax.imshow(image, cmap=plt.cm.gray, interpolation='nearest')
+ax.set_title('dropped spines')
+
+# Move left and bottom spines outward by 10 points
+ax.spines['left'].set_position(('outward', 10))
+ax.spines['bottom'].set_position(('outward', 10))
+# Hide the right and top spines
+ax.spines['right'].set_visible(False)
+ax.spines['top'].set_visible(False)
+# Only show ticks on the left and bottom spines
+ax.yaxis.set_ticks_position('left')
+ax.xaxis.set_ticks_position('bottom')
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: spines_dropped.py](https://matplotlib.org/_downloads/spines_dropped.py)
- [Download Jupyter notebook: spines_dropped.ipynb](https://matplotlib.org/_downloads/spines_dropped.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/ticks_and_spines/tick_formatters.md b/Python/matplotlab/gallery/ticks_and_spines/tick_formatters.md
new file mode 100644
index 00000000..b785c82b
--- /dev/null
+++ b/Python/matplotlab/gallery/ticks_and_spines/tick_formatters.md
@@ -0,0 +1,113 @@
+# Tick formatters
+
+Show the different tick formatters.
+
+![Tick formatters example](https://matplotlib.org/_images/sphx_glr_tick-formatters_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+import matplotlib.ticker as ticker
+
+
+# Setup a plot such that only the bottom spine is shown
+def setup(ax):
+    ax.spines['right'].set_color('none')
+    ax.spines['left'].set_color('none')
+    ax.yaxis.set_major_locator(ticker.NullLocator())
+    ax.spines['top'].set_color('none')
+    ax.xaxis.set_ticks_position('bottom')
+    ax.tick_params(which='major', width=1.00, length=5)
+    ax.tick_params(which='minor', width=0.75, length=2.5, labelsize=10)
ax.set_xlim(0, 5) + ax.set_ylim(0, 1) + ax.patch.set_alpha(0.0) + + +fig = plt.figure(figsize=(8, 6)) +n = 7 + +# Null formatter +ax = fig.add_subplot(n, 1, 1) +setup(ax) +ax.xaxis.set_major_locator(ticker.MultipleLocator(1.00)) +ax.xaxis.set_minor_locator(ticker.MultipleLocator(0.25)) +ax.xaxis.set_major_formatter(ticker.NullFormatter()) +ax.xaxis.set_minor_formatter(ticker.NullFormatter()) +ax.text(0.0, 0.1, "NullFormatter()", fontsize=16, transform=ax.transAxes) + +# Fixed formatter +ax = fig.add_subplot(n, 1, 2) +setup(ax) +ax.xaxis.set_major_locator(ticker.MultipleLocator(1.0)) +ax.xaxis.set_minor_locator(ticker.MultipleLocator(0.25)) +majors = ["", "0", "1", "2", "3", "4", "5"] +ax.xaxis.set_major_formatter(ticker.FixedFormatter(majors)) +minors = [""] + ["%.2f" % (x-int(x)) if (x-int(x)) + else "" for x in np.arange(0, 5, 0.25)] +ax.xaxis.set_minor_formatter(ticker.FixedFormatter(minors)) +ax.text(0.0, 0.1, "FixedFormatter(['', '0', '1', ...])", + fontsize=15, transform=ax.transAxes) + + +# FuncFormatter can be used as a decorator +@ticker.FuncFormatter +def major_formatter(x, pos): + return "[%.2f]" % x + + +ax = fig.add_subplot(n, 1, 3) +setup(ax) +ax.xaxis.set_major_locator(ticker.MultipleLocator(1.00)) +ax.xaxis.set_minor_locator(ticker.MultipleLocator(0.25)) +ax.xaxis.set_major_formatter(major_formatter) +ax.text(0.0, 0.1, 'FuncFormatter(lambda x, pos: "[%.2f]" % x)', + fontsize=15, transform=ax.transAxes) + + +# FormatStr formatter +ax = fig.add_subplot(n, 1, 4) +setup(ax) +ax.xaxis.set_major_locator(ticker.MultipleLocator(1.00)) +ax.xaxis.set_minor_locator(ticker.MultipleLocator(0.25)) +ax.xaxis.set_major_formatter(ticker.FormatStrFormatter(">%d<")) +ax.text(0.0, 0.1, "FormatStrFormatter('>%d<')", + fontsize=15, transform=ax.transAxes) + +# Scalar formatter +ax = fig.add_subplot(n, 1, 5) +setup(ax) +ax.xaxis.set_major_locator(ticker.AutoLocator()) +ax.xaxis.set_minor_locator(ticker.AutoMinorLocator()) 
+ax.xaxis.set_major_formatter(ticker.ScalarFormatter(useMathText=True))
+ax.text(0.0, 0.1, "ScalarFormatter()", fontsize=15, transform=ax.transAxes)
+
+# StrMethod formatter
+ax = fig.add_subplot(n, 1, 6)
+setup(ax)
+ax.xaxis.set_major_locator(ticker.MultipleLocator(1.00))
+ax.xaxis.set_minor_locator(ticker.MultipleLocator(0.25))
+ax.xaxis.set_major_formatter(ticker.StrMethodFormatter("{x}"))
+ax.text(0.0, 0.1, "StrMethodFormatter('{x}')",
+        fontsize=15, transform=ax.transAxes)
+
+# Percent formatter
+ax = fig.add_subplot(n, 1, 7)
+setup(ax)
+ax.xaxis.set_major_locator(ticker.MultipleLocator(1.00))
+ax.xaxis.set_minor_locator(ticker.MultipleLocator(0.25))
+ax.xaxis.set_major_formatter(ticker.PercentFormatter(xmax=5))
+ax.text(0.0, 0.1, "PercentFormatter(xmax=5)",
+        fontsize=15, transform=ax.transAxes)
+
+# Push the top of the top axes outside the figure because we only show the
+# bottom spine.
+fig.subplots_adjust(left=0.05, right=0.95, bottom=0.05, top=1.05)
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: tick-formatters.py](https://matplotlib.org/_downloads/tick-formatters.py)
- [Download Jupyter notebook: tick-formatters.ipynb](https://matplotlib.org/_downloads/tick-formatters.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/ticks_and_spines/tick_label_right.md b/Python/matplotlab/gallery/ticks_and_spines/tick_label_right.md
new file mode 100644
index 00000000..580a4ea2
--- /dev/null
+++ b/Python/matplotlab/gallery/ticks_and_spines/tick_label_right.md
@@ -0,0 +1,30 @@
+# Set default y-axis tick labels on the right
+
+We can use [rcParams["ytick.labelright"]](https://matplotlib.org/tutorials/introductory/customizing.html#matplotlib-rcparams) (default False), [rcParams["ytick.right"]](https://matplotlib.org/tutorials/introductory/customizing.html#matplotlib-rcparams) (default False), [rcParams["ytick.labelleft"]](https://matplotlib.org/tutorials/introductory/customizing.html#matplotlib-rcparams) (default True), and [rcParams["ytick.left"]](https://matplotlib.org/tutorials/introductory/customizing.html#matplotlib-rcparams) (default True) to control where on the axes ticks and their labels appear. These properties can also be set in .matplotlib/matplotlibrc.
+
+![Default y-axis tick labels on the right example](https://matplotlib.org/_images/sphx_glr_tick_label_right_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+
+plt.rcParams['ytick.right'] = plt.rcParams['ytick.labelright'] = True
+plt.rcParams['ytick.left'] = plt.rcParams['ytick.labelleft'] = False
+
+x = np.arange(10)
+
+fig, (ax0, ax1) = plt.subplots(2, 1, sharex=True, figsize=(6, 6))
+
+ax0.plot(x)
+ax0.yaxis.tick_left()
+
+# use default parameter in rcParams, not calling tick_right()
+ax1.plot(x)
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: tick_label_right.py](https://matplotlib.org/_downloads/tick_label_right.py)
- [Download Jupyter notebook: tick_label_right.ipynb](https://matplotlib.org/_downloads/tick_label_right.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/ticks_and_spines/tick_labels_from_values.md b/Python/matplotlab/gallery/ticks_and_spines/tick_labels_from_values.md
new file mode 100644
index 00000000..89982716
--- /dev/null
+++ b/Python/matplotlab/gallery/ticks_and_spines/tick_labels_from_values.md
@@ -0,0 +1,36 @@
+# Setting tick labels from a list of values
+
+Using ax.set_xticks causes the tick labels to be set on the currently chosen ticks. However, you may want to allow matplotlib to dynamically choose the number of ticks and their spacing.
+
+In this case it may be better to determine the tick labels from the values at the tick locations. The following example shows how to do this.
+
+Note: MaxNLocator is used here to ensure that the tick values take integer values.
+
+![Setting tick labels from a list of values example](https://matplotlib.org/_images/sphx_glr_tick_labels_from_values_001.png)
+
+```python
+import matplotlib.pyplot as plt
+from matplotlib.ticker import FuncFormatter, MaxNLocator
+fig, ax = plt.subplots()
+xs = range(26)
+ys = range(26)
+labels = list('abcdefghijklmnopqrstuvwxyz')
+
+
+def format_fn(tick_val, tick_pos):
+    if int(tick_val) in xs:
+        return labels[int(tick_val)]
+    else:
+        return ''
+
+
+ax.xaxis.set_major_formatter(FuncFormatter(format_fn))
+ax.xaxis.set_major_locator(MaxNLocator(integer=True))
+ax.plot(xs, ys)
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: tick_labels_from_values.py](https://matplotlib.org/_downloads/tick_labels_from_values.py)
- [Download Jupyter notebook: tick_labels_from_values.ipynb](https://matplotlib.org/_downloads/tick_labels_from_values.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/ticks_and_spines/tick_locators.md b/Python/matplotlab/gallery/ticks_and_spines/tick_locators.md
new file mode 100644
index 00000000..9fab2d9c
--- /dev/null
+++ b/Python/matplotlab/gallery/ticks_and_spines/tick_locators.md
@@ -0,0 +1,106 @@
+# Tick locators
+
+Show the different tick locators.
+
+![Tick locators example](https://matplotlib.org/_images/sphx_glr_tick-locators_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+import matplotlib.ticker as ticker
+
+
+# Setup a plot such that only the bottom spine is shown
+def setup(ax):
+    ax.spines['right'].set_color('none')
+    ax.spines['left'].set_color('none')
+    ax.yaxis.set_major_locator(ticker.NullLocator())
+    ax.spines['top'].set_color('none')
+    ax.xaxis.set_ticks_position('bottom')
+    ax.tick_params(which='major', width=1.00)
+    ax.tick_params(which='major', length=5)
+    ax.tick_params(which='minor', width=0.75)
+    ax.tick_params(which='minor', length=2.5)
+    ax.set_xlim(0, 5)
+    ax.set_ylim(0, 1)
+    ax.patch.set_alpha(0.0)
+
+
+plt.figure(figsize=(8, 6))
+n = 8
+
+# Null Locator
+ax = plt.subplot(n, 1, 1)
+setup(ax)
+ax.xaxis.set_major_locator(ticker.NullLocator())
+ax.xaxis.set_minor_locator(ticker.NullLocator())
+ax.text(0.0, 0.1, "NullLocator()", fontsize=14, transform=ax.transAxes)
+
+# Multiple Locator
+ax = plt.subplot(n, 1, 2)
+setup(ax)
+ax.xaxis.set_major_locator(ticker.MultipleLocator(0.5))
+ax.xaxis.set_minor_locator(ticker.MultipleLocator(0.1))
+ax.text(0.0, 0.1, "MultipleLocator(0.5)", fontsize=14,
+        transform=ax.transAxes)
+
+# Fixed Locator
+ax = plt.subplot(n, 1, 3)
+setup(ax)
+majors = [0, 1, 5]
+ax.xaxis.set_major_locator(ticker.FixedLocator(majors))
+minors = np.linspace(0, 1, 11)[1:-1]
+ax.xaxis.set_minor_locator(ticker.FixedLocator(minors)) +ax.text(0.0, 0.1, "FixedLocator([0, 1, 5])", fontsize=14, + transform=ax.transAxes) + +# Linear Locator +ax = plt.subplot(n, 1, 4) +setup(ax) +ax.xaxis.set_major_locator(ticker.LinearLocator(3)) +ax.xaxis.set_minor_locator(ticker.LinearLocator(31)) +ax.text(0.0, 0.1, "LinearLocator(numticks=3)", + fontsize=14, transform=ax.transAxes) + +# Index Locator +ax = plt.subplot(n, 1, 5) +setup(ax) +ax.plot(range(0, 5), [0]*5, color='White') +ax.xaxis.set_major_locator(ticker.IndexLocator(base=.5, offset=.25)) +ax.text(0.0, 0.1, "IndexLocator(base=0.5, offset=0.25)", + fontsize=14, transform=ax.transAxes) + +# Auto Locator +ax = plt.subplot(n, 1, 6) +setup(ax) +ax.xaxis.set_major_locator(ticker.AutoLocator()) +ax.xaxis.set_minor_locator(ticker.AutoMinorLocator()) +ax.text(0.0, 0.1, "AutoLocator()", fontsize=14, transform=ax.transAxes) + +# MaxN Locator +ax = plt.subplot(n, 1, 7) +setup(ax) +ax.xaxis.set_major_locator(ticker.MaxNLocator(4)) +ax.xaxis.set_minor_locator(ticker.MaxNLocator(40)) +ax.text(0.0, 0.1, "MaxNLocator(n=4)", fontsize=14, transform=ax.transAxes) + +# Log Locator +ax = plt.subplot(n, 1, 8) +setup(ax) +ax.set_xlim(10**3, 10**10) +ax.set_xscale('log') +ax.xaxis.set_major_locator(ticker.LogLocator(base=10.0, numticks=15)) +ax.text(0.0, 0.1, "LogLocator(base=10, numticks=15)", + fontsize=15, transform=ax.transAxes) + +# Push the top of the top axes outside the figure because we only show the +# bottom spine. 
+plt.subplots_adjust(left=0.05, right=0.95, bottom=0.05, top=1.05)
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: tick-locators.py](https://matplotlib.org/_downloads/tick-locators.py)
- [Download Jupyter notebook: tick-locators.ipynb](https://matplotlib.org/_downloads/tick-locators.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/ticks_and_spines/tick_xlabel_top.md b/Python/matplotlab/gallery/ticks_and_spines/tick_xlabel_top.md
new file mode 100644
index 00000000..edb6083f
--- /dev/null
+++ b/Python/matplotlab/gallery/ticks_and_spines/tick_xlabel_top.md
@@ -0,0 +1,30 @@
+# Set default x-axis tick labels on the top
+
+We can use [rcParams["xtick.labeltop"]](https://matplotlib.org/tutorials/introductory/customizing.html#matplotlib-rcparams) (default False), [rcParams["xtick.top"]](https://matplotlib.org/tutorials/introductory/customizing.html#matplotlib-rcparams) (default False), [rcParams["xtick.labelbottom"]](https://matplotlib.org/tutorials/introductory/customizing.html#matplotlib-rcparams) (default True), and [rcParams["xtick.bottom"]](https://matplotlib.org/tutorials/introductory/customizing.html#matplotlib-rcparams) (default True) to control where on the axes ticks and their labels appear.
+
+These properties can also be set in .matplotlib/matplotlibrc.
+
+![Default x-axis tick labels on the top example](https://matplotlib.org/_images/sphx_glr_tick_xlabel_top_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+
+
+plt.rcParams['xtick.bottom'] = plt.rcParams['xtick.labelbottom'] = False
+plt.rcParams['xtick.top'] = plt.rcParams['xtick.labeltop'] = True
+
+x = np.arange(10)
+
+fig, ax = plt.subplots()
+
+ax.plot(x)
+ax.set_title('xlabel top')  # Note title moves to make room for ticks
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: tick_xlabel_top.py](https://matplotlib.org/_downloads/tick_xlabel_top.py)
- [Download Jupyter notebook: tick_xlabel_top.ipynb](https://matplotlib.org/_downloads/tick_xlabel_top.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/ticks_and_spines/ticklabels_rotation.md b/Python/matplotlab/gallery/ticks_and_spines/ticklabels_rotation.md
new file mode 100644
index 00000000..02ffc298
--- /dev/null
+++ b/Python/matplotlab/gallery/ticks_and_spines/ticklabels_rotation.md
@@ -0,0 +1,28 @@
+# Rotating custom tick labels
+
+Demo of custom tick labels with user-defined rotation.
+
+![Rotating custom tick labels example](https://matplotlib.org/_images/sphx_glr_ticklabels_rotation_001.png)
+
+```python
+import matplotlib.pyplot as plt
+
+
+x = [1, 2, 3, 4]
+y = [1, 4, 9, 6]
+labels = ['Frogs', 'Hogs', 'Bogs', 'Slogs']
+
+plt.plot(x, y, 'ro')
+# You can specify a rotation for the tick labels in degrees or with keywords.
+plt.xticks(x, labels, rotation='vertical')
+# Pad margins so that markers don't get clipped by the axes
+plt.margins(0.2)
+# Tweak spacing to prevent clipping of tick-labels
+plt.subplots_adjust(bottom=0.15)
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: ticklabels_rotation.py](https://matplotlib.org/_downloads/ticklabels_rotation.py)
- [Download Jupyter notebook: ticklabels_rotation.ipynb](https://matplotlib.org/_downloads/ticklabels_rotation.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/units/annotate_with_units.md b/Python/matplotlab/gallery/units/annotate_with_units.md
new file mode 100644
index 00000000..c5d7c8c3
--- /dev/null
+++ b/Python/matplotlab/gallery/units/annotate_with_units.md
@@ -0,0 +1,37 @@
+# Annotation with units
+
+The example illustrates how to create text and arrow annotations using a centimeter-scale plot.
+
+This example requires [basic_units.py](https://matplotlib.org/_downloads/3a73b4cd6e12aa53ff277b1b80d631c1/basic_units.py).
+
+![Annotation with units example](https://matplotlib.org/_images/sphx_glr_annotate_with_units_001.png)
+
+```python
+import matplotlib.pyplot as plt
+from basic_units import cm
+
+fig, ax = plt.subplots()
+
+ax.annotate("Note 01", [0.5*cm, 0.5*cm])
+
+# xy and text both unitized
+ax.annotate('local max', xy=(3*cm, 1*cm), xycoords='data',
+            xytext=(0.8*cm, 0.95*cm), textcoords='data',
+            arrowprops=dict(facecolor='black', shrink=0.05),
+            horizontalalignment='right', verticalalignment='top')
+
+# mixing units w/ nonunits
+ax.annotate('local max',
            xy=(3*cm, 1*cm), xycoords='data',
+            xytext=(0.8, 0.95), textcoords='axes fraction',
+            arrowprops=dict(facecolor='black', shrink=0.05),
+            horizontalalignment='right', verticalalignment='top')
+
+
+ax.set_xlim(0*cm, 4*cm)
+ax.set_ylim(0*cm, 4*cm)
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: annotate_with_units.py](https://matplotlib.org/_downloads/annotate_with_units.py)
- [Download Jupyter notebook: annotate_with_units.ipynb](https://matplotlib.org/_downloads/annotate_with_units.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/units/artist_tests.md b/Python/matplotlab/gallery/units/artist_tests.md
new file mode 100644
index 00000000..af221652
--- /dev/null
+++ b/Python/matplotlab/gallery/units/artist_tests.md
@@ -0,0 +1,66 @@
+# Artist tests
+
+Test unit support with each of the Matplotlib primitive artist types.
+
+The axes handle unit conversions and the artists keep a pointer to their axes parent. You must initialize the artists with the axes instance if you want to use them with unit data, or else they will not know how to convert the units to scalars.
+
+This example requires [basic_units.py](https://matplotlib.org/_downloads/3a73b4cd6e12aa53ff277b1b80d631c1/basic_units.py).
+
+![Artist tests example](https://matplotlib.org/_images/sphx_glr_artist_tests_001.png)
+
+```python
+import random
+import matplotlib.lines as lines
+import matplotlib.patches as patches
+import matplotlib.text as text
+import matplotlib.collections as collections
+
+from basic_units import cm, inch
+import numpy as np
+import matplotlib.pyplot as plt
+
+fig, ax = plt.subplots()
+ax.xaxis.set_units(cm)
+ax.yaxis.set_units(cm)
+
+# Fixing random state for reproducibility
+np.random.seed(19680801)
+
+if 0:
+    # test a line collection
+    # Not supported at present.
+    verts = []
+    for i in range(10):
+        # a random line segment in inches
+        verts.append(zip(*inch*10*np.random.rand(2, random.randint(2, 15))))
+    lc = collections.LineCollection(verts, axes=ax)
+    ax.add_collection(lc)
+
+# test a plain-ol-line
+line = lines.Line2D([0*cm, 1.5*cm], [0*cm, 2.5*cm],
+                    lw=2, color='black', axes=ax)
+ax.add_line(line)
+
+if 0:
+    # test a patch
+    # Not supported at present.
+    rect = patches.Rectangle((1*cm, 1*cm), width=5*cm, height=2*cm,
+                             alpha=0.2, axes=ax)
+    ax.add_patch(rect)
+
+
+t = text.Text(3*cm, 2.5*cm, 'text label', ha='left', va='bottom', axes=ax)
+ax.add_artist(t)
+
+ax.set_xlim(-1*cm, 10*cm)
+ax.set_ylim(-1*cm, 10*cm)
+# ax.xaxis.set_units(inch)
+ax.grid(True)
+ax.set_title("Artists with units")
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: artist_tests.py](https://matplotlib.org/_downloads/artist_tests.py)
- [Download Jupyter notebook: artist_tests.ipynb](https://matplotlib.org/_downloads/artist_tests.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/units/bar_demo2.md b/Python/matplotlab/gallery/units/bar_demo2.md
new file mode 100644
index 00000000..ed5723cf
--- /dev/null
+++ b/Python/matplotlab/gallery/units/bar_demo2.md
@@ -0,0 +1,38 @@
+# Bar demo with units
+
+A plot using a variety of centimeter and inch conversions. This example shows how default unit introspection works (ax1), how various keywords can be used to set the x and y units to override the defaults (ax2, ax3, ax4), and how one can set the xlimits using scalars (ax3, current units assumed) or units (conversions are applied to get the numbers into current units).
+
+This example requires [basic_units.py](https://matplotlib.org/_downloads/3a73b4cd6e12aa53ff277b1b80d631c1/basic_units.py).
+
+![Bar demo with units example](https://matplotlib.org/_images/sphx_glr_bar_demo2_001.png)
+
+```python
+import numpy as np
+from basic_units import cm, inch
+import matplotlib.pyplot as plt
+
+cms = cm * np.arange(0, 10, 2)
+bottom = 0 * cm
+width = 0.8 * cm
+
+fig, axs = plt.subplots(2, 2)
+
+axs[0, 0].bar(cms, cms, bottom=bottom)
+
+axs[0, 1].bar(cms, cms, bottom=bottom, width=width, xunits=cm, yunits=inch)
+
+axs[1, 0].bar(cms, cms, bottom=bottom, width=width, xunits=inch, yunits=cm)
+axs[1, 0].set_xlim(2, 6)  # scalars are interpreted in current units
+
+axs[1, 1].bar(cms, cms, bottom=bottom, width=width, xunits=inch, yunits=inch)
+axs[1, 1].set_xlim(2 * cm, 6 * cm)  # cm are converted to inches
+
+fig.tight_layout()
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source code: bar_demo2.py](https://matplotlib.org/_downloads/bar_demo2.py)
- [Download Jupyter notebook: bar_demo2.ipynb](https://matplotlib.org/_downloads/bar_demo2.ipynb)
\ No newline at end
of file diff --git a/Python/matplotlab/gallery/units/bar_unit_demo.md b/Python/matplotlab/gallery/units/bar_unit_demo.md new file mode 100644 index 00000000..dc549593 --- /dev/null +++ b/Python/matplotlab/gallery/units/bar_unit_demo.md @@ -0,0 +1,45 @@ +# 与单位组合的条形图 + +此示例与以厘米为单位的[条形图演示](https://matplotlib.org/gallery/lines_bars_and_markers/barchart.html)相同。 + +此示例需要 [basic_units.py](https://matplotlib.org/_downloads/3a73b4cd6e12aa53ff277b1b80d631c1/basic_units.py) + +![与单位组合的条形图示例](https://matplotlib.org/_images/sphx_glr_bar_unit_demo_001.png) + +```python +import numpy as np +from basic_units import cm, inch +import matplotlib.pyplot as plt + + +N = 5 +menMeans = (150*cm, 160*cm, 146*cm, 172*cm, 155*cm) +menStd = (20*cm, 30*cm, 32*cm, 10*cm, 20*cm) + +fig, ax = plt.subplots() + +ind = np.arange(N) # the x locations for the groups +width = 0.35 # the width of the bars +p1 = ax.bar(ind, menMeans, width, color='r', bottom=0*cm, yerr=menStd) + + +womenMeans = (145*cm, 149*cm, 172*cm, 165*cm, 200*cm) +womenStd = (30*cm, 25*cm, 20*cm, 31*cm, 22*cm) +p2 = ax.bar(ind + width, womenMeans, width, + color='y', bottom=0*cm, yerr=womenStd) + +ax.set_title('Scores by group and gender') +ax.set_xticks(ind + width / 2) +ax.set_xticklabels(('G1', 'G2', 'G3', 'G4', 'G5')) + +ax.legend((p1[0], p2[0]), ('Men', 'Women')) +ax.yaxis.set_units(inch) +ax.autoscale_view() + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: bar_unit_demo.py](https://matplotlib.org/_downloads/bar_unit_demo.py) +- [下载Jupyter notebook: bar_unit_demo.ipynb](https://matplotlib.org/_downloads/bar_unit_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/units/basic_units.md b/Python/matplotlab/gallery/units/basic_units.md new file mode 100644 index 00000000..f5d21fe1 --- /dev/null +++ b/Python/matplotlab/gallery/units/basic_units.md @@ -0,0 +1,373 @@ +# 基本单位 + +```python +import math + +import numpy as np + +import matplotlib.units as units +import matplotlib.ticker as ticker +from 
matplotlib.cbook import iterable + + +class ProxyDelegate(object): + def __init__(self, fn_name, proxy_type): + self.proxy_type = proxy_type + self.fn_name = fn_name + + def __get__(self, obj, objtype=None): + return self.proxy_type(self.fn_name, obj) + + +class TaggedValueMeta(type): + def __init__(self, name, bases, dict): + for fn_name in self._proxies: + try: + dummy = getattr(self, fn_name) + except AttributeError: + setattr(self, fn_name, + ProxyDelegate(fn_name, self._proxies[fn_name])) + + +class PassThroughProxy(object): + def __init__(self, fn_name, obj): + self.fn_name = fn_name + self.target = obj.proxy_target + + def __call__(self, *args): + fn = getattr(self.target, self.fn_name) + ret = fn(*args) + return ret + + +class ConvertArgsProxy(PassThroughProxy): + def __init__(self, fn_name, obj): + PassThroughProxy.__init__(self, fn_name, obj) + self.unit = obj.unit + + def __call__(self, *args): + converted_args = [] + for a in args: + try: + converted_args.append(a.convert_to(self.unit)) + except AttributeError: + converted_args.append(TaggedValue(a, self.unit)) + converted_args = tuple([c.get_value() for c in converted_args]) + return PassThroughProxy.__call__(self, *converted_args) + + +class ConvertReturnProxy(PassThroughProxy): + def __init__(self, fn_name, obj): + PassThroughProxy.__init__(self, fn_name, obj) + self.unit = obj.unit + + def __call__(self, *args): + ret = PassThroughProxy.__call__(self, *args) + return (NotImplemented if ret is NotImplemented + else TaggedValue(ret, self.unit)) + + +class ConvertAllProxy(PassThroughProxy): + def __init__(self, fn_name, obj): + PassThroughProxy.__init__(self, fn_name, obj) + self.unit = obj.unit + + def __call__(self, *args): + converted_args = [] + arg_units = [self.unit] + for a in args: + if hasattr(a, 'get_unit') and not hasattr(a, 'convert_to'): + # if this arg has a unit type but no conversion ability, + # this operation is prohibited + return NotImplemented + + if hasattr(a, 'convert_to'): + 
try: + a = a.convert_to(self.unit) + except: + pass + arg_units.append(a.get_unit()) + converted_args.append(a.get_value()) + else: + converted_args.append(a) + if hasattr(a, 'get_unit'): + arg_units.append(a.get_unit()) + else: + arg_units.append(None) + converted_args = tuple(converted_args) + ret = PassThroughProxy.__call__(self, *converted_args) + if ret is NotImplemented: + return NotImplemented + ret_unit = unit_resolver(self.fn_name, arg_units) + if ret_unit is NotImplemented: + return NotImplemented + return TaggedValue(ret, ret_unit) + + +class TaggedValue(metaclass=TaggedValueMeta): + + _proxies = {'__add__': ConvertAllProxy, + '__sub__': ConvertAllProxy, + '__mul__': ConvertAllProxy, + '__rmul__': ConvertAllProxy, + '__cmp__': ConvertAllProxy, + '__lt__': ConvertAllProxy, + '__gt__': ConvertAllProxy, + '__len__': PassThroughProxy} + + def __new__(cls, value, unit): + # generate a new subclass for value + value_class = type(value) + try: + subcls = type('TaggedValue_of_%s' % (value_class.__name__), + tuple([cls, value_class]), + {}) + if subcls not in units.registry: + units.registry[subcls] = basicConverter + return object.__new__(subcls) + except TypeError: + if cls not in units.registry: + units.registry[cls] = basicConverter + return object.__new__(cls) + + def __init__(self, value, unit): + self.value = value + self.unit = unit + self.proxy_target = self.value + + def __getattribute__(self, name): + if name.startswith('__'): + return object.__getattribute__(self, name) + variable = object.__getattribute__(self, 'value') + if hasattr(variable, name) and name not in self.__class__.__dict__: + return getattr(variable, name) + return object.__getattribute__(self, name) + + def __array__(self, dtype=object): + return np.asarray(self.value).astype(dtype) + + def __array_wrap__(self, array, context): + return TaggedValue(array, self.unit) + + def __repr__(self): + return 'TaggedValue({!r}, {!r})'.format(self.value, self.unit) + + def __str__(self): + return 
str(self.value) + ' in ' + str(self.unit) + + def __len__(self): + return len(self.value) + + def __iter__(self): + # Return a generator expression rather than use `yield`, so that + # TypeError is raised by iter(self) if appropriate when checking for + # iterability. + return (TaggedValue(inner, self.unit) for inner in self.value) + + def get_compressed_copy(self, mask): + new_value = np.ma.masked_array(self.value, mask=mask).compressed() + return TaggedValue(new_value, self.unit) + + def convert_to(self, unit): + if unit == self.unit or not unit: + return self + new_value = self.unit.convert_value_to(self.value, unit) + return TaggedValue(new_value, unit) + + def get_value(self): + return self.value + + def get_unit(self): + return self.unit + + +class BasicUnit(object): + def __init__(self, name, fullname=None): + self.name = name + if fullname is None: + fullname = name + self.fullname = fullname + self.conversions = dict() + + def __repr__(self): + return 'BasicUnit(%s)' % self.name + + def __str__(self): + return self.fullname + + def __call__(self, value): + return TaggedValue(value, self) + + def __mul__(self, rhs): + value = rhs + unit = self + if hasattr(rhs, 'get_unit'): + value = rhs.get_value() + unit = rhs.get_unit() + unit = unit_resolver('__mul__', (self, unit)) + if unit is NotImplemented: + return NotImplemented + return TaggedValue(value, unit) + + def __rmul__(self, lhs): + return self*lhs + + def __array_wrap__(self, array, context): + return TaggedValue(array, self) + + def __array__(self, t=None, context=None): + ret = np.array([1]) + if t is not None: + return ret.astype(t) + else: + return ret + + def add_conversion_factor(self, unit, factor): + def convert(x): + return x*factor + self.conversions[unit] = convert + + def add_conversion_fn(self, unit, fn): + self.conversions[unit] = fn + + def get_conversion_fn(self, unit): + return self.conversions[unit] + + def convert_value_to(self, value, unit): + conversion_fn = self.conversions[unit] + 
ret = conversion_fn(value) + return ret + + def get_unit(self): + return self + + +class UnitResolver(object): + def addition_rule(self, units): + for unit_1, unit_2 in zip(units[:-1], units[1:]): + if unit_1 != unit_2: + return NotImplemented + return units[0] + + def multiplication_rule(self, units): + non_null = [u for u in units if u] + if len(non_null) > 1: + return NotImplemented + return non_null[0] + + op_dict = { + '__mul__': multiplication_rule, + '__rmul__': multiplication_rule, + '__add__': addition_rule, + '__radd__': addition_rule, + '__sub__': addition_rule, + '__rsub__': addition_rule} + + def __call__(self, operation, units): + if operation not in self.op_dict: + return NotImplemented + + return self.op_dict[operation](self, units) + + +unit_resolver = UnitResolver() + +cm = BasicUnit('cm', 'centimeters') +inch = BasicUnit('inch', 'inches') +inch.add_conversion_factor(cm, 2.54) +cm.add_conversion_factor(inch, 1/2.54) + +radians = BasicUnit('rad', 'radians') +degrees = BasicUnit('deg', 'degrees') +radians.add_conversion_factor(degrees, 180.0/np.pi) +degrees.add_conversion_factor(radians, np.pi/180.0) + +secs = BasicUnit('s', 'seconds') +hertz = BasicUnit('Hz', 'Hertz') +minutes = BasicUnit('min', 'minutes') + +secs.add_conversion_fn(hertz, lambda x: 1./x) +secs.add_conversion_factor(minutes, 1/60.0) + + +# radians formatting +def rad_fn(x, pos=None): + if x >= 0: + n = int((x / np.pi) * 2.0 + 0.25) + else: + n = int((x / np.pi) * 2.0 - 0.25) + + if n == 0: + return '0' + elif n == 1: + return r'$\pi/2$' + elif n == 2: + return r'$\pi$' + elif n == -1: + return r'$-\pi/2$' + elif n == -2: + return r'$-\pi$' + elif n % 2 == 0: + return r'$%s\pi$' % (n//2,) + else: + return r'$%s\pi/2$' % (n,) + + +class BasicUnitConverter(units.ConversionInterface): + @staticmethod + def axisinfo(unit, axis): + 'return AxisInfo instance for x and unit' + + if unit == radians: + return units.AxisInfo( + majloc=ticker.MultipleLocator(base=np.pi/2), + 
majfmt=ticker.FuncFormatter(rad_fn), + label=unit.fullname, + ) + elif unit == degrees: + return units.AxisInfo( + majloc=ticker.AutoLocator(), + majfmt=ticker.FormatStrFormatter(r'$%i^\circ$'), + label=unit.fullname, + ) + elif unit is not None: + if hasattr(unit, 'fullname'): + return units.AxisInfo(label=unit.fullname) + elif hasattr(unit, 'unit'): + return units.AxisInfo(label=unit.unit.fullname) + return None + + @staticmethod + def convert(val, unit, axis): + if units.ConversionInterface.is_numlike(val): + return val + if iterable(val): + return [thisval.convert_to(unit).get_value() for thisval in val] + else: + return val.convert_to(unit).get_value() + + @staticmethod + def default_units(x, axis): + 'return the default unit for x or None' + if iterable(x): + for thisx in x: + return thisx.unit + return x.unit + + +def cos(x): + if iterable(x): + return [math.cos(val.convert_to(radians).get_value()) for val in x] + else: + return math.cos(x.convert_to(radians).get_value()) + + +basicConverter = BasicUnitConverter() +units.registry[BasicUnit] = basicConverter +units.registry[TaggedValue] = basicConverter +``` + +## 下载这个示例 + +- [下载python源码: basic_units.py](https://matplotlib.org/_downloads/basic_units.py) +- [下载Jupyter notebook: basic_units.ipynb](https://matplotlib.org/_downloads/basic_units.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/units/ellipse_with_units.md b/Python/matplotlab/gallery/units/ellipse_with_units.md new file mode 100644 index 00000000..7dbb7458 --- /dev/null +++ b/Python/matplotlab/gallery/units/ellipse_with_units.md @@ -0,0 +1,85 @@ +# 椭圆与单位 + +比较用弧形生成的椭圆与多边形近似 + +此示例需要 [basic_units.py](https://matplotlib.org/_downloads/3a73b4cd6e12aa53ff277b1b80d631c1/basic_units.py) + +```python +from basic_units import cm +import numpy as np +from matplotlib import patches +import matplotlib.pyplot as plt + + +xcenter, ycenter = 0.38*cm, 0.52*cm +width, height = 1e-1*cm, 3e-1*cm +angle = -30 + +theta = 
np.deg2rad(np.arange(0.0, 360.0, 1.0)) +x = 0.5 * width * np.cos(theta) +y = 0.5 * height * np.sin(theta) + +rtheta = np.radians(angle) +R = np.array([ + [np.cos(rtheta), -np.sin(rtheta)], + [np.sin(rtheta), np.cos(rtheta)], + ]) + + +x, y = np.dot(R, np.array([x, y])) +x += xcenter +y += ycenter +``` + +```python +fig = plt.figure() +ax = fig.add_subplot(211, aspect='auto') +ax.fill(x, y, alpha=0.2, facecolor='yellow', + edgecolor='yellow', linewidth=1, zorder=1) + +e1 = patches.Ellipse((xcenter, ycenter), width, height, + angle=angle, linewidth=2, fill=False, zorder=2) + +ax.add_patch(e1) + +ax = fig.add_subplot(212, aspect='equal') +ax.fill(x, y, alpha=0.2, facecolor='green', edgecolor='green', zorder=1) +e2 = patches.Ellipse((xcenter, ycenter), width, height, + angle=angle, linewidth=2, fill=False, zorder=2) + + +ax.add_patch(e2) +fig.savefig('ellipse_compare') +``` + +![椭圆与单位示例](https://matplotlib.org/_images/sphx_glr_ellipse_with_units_001.png) + +```python +fig = plt.figure() +ax = fig.add_subplot(211, aspect='auto') +ax.fill(x, y, alpha=0.2, facecolor='yellow', + edgecolor='yellow', linewidth=1, zorder=1) + +e1 = patches.Arc((xcenter, ycenter), width, height, + angle=angle, linewidth=2, fill=False, zorder=2) + +ax.add_patch(e1) + +ax = fig.add_subplot(212, aspect='equal') +ax.fill(x, y, alpha=0.2, facecolor='green', edgecolor='green', zorder=1) +e2 = patches.Arc((xcenter, ycenter), width, height, + angle=angle, linewidth=2, fill=False, zorder=2) + + +ax.add_patch(e2) +fig.savefig('arc_compare') + +plt.show() +``` + +![椭圆与单位示例2](https://matplotlib.org/_images/sphx_glr_ellipse_with_units_002.png) + +## 下载这个示例 + +- [下载python源码: ellipse_with_units.py](https://matplotlib.org/_downloads/ellipse_with_units.py) +- [下载Jupyter notebook: ellipse_with_units.ipynb](https://matplotlib.org/_downloads/ellipse_with_units.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/units/evans_test.md b/Python/matplotlab/gallery/units/evans_test.md new file mode 
100644 index 00000000..6fb9e151 --- /dev/null +++ b/Python/matplotlab/gallery/units/evans_test.md @@ -0,0 +1,94 @@ +# Evans测试 + +一个模型“Foo”单元类,它根据“单元”支持转换和不同的刻度格式。 这里的“单位”只是一个标量转换因子,但是这个例子表明Matplotlib完全不知道客户端软件包使用哪种单位。 + +![Evans测试示例](https://matplotlib.org/_images/sphx_glr_evans_test_001.png) + +```python +from matplotlib.cbook import iterable +import matplotlib.units as units +import matplotlib.ticker as ticker +import matplotlib.pyplot as plt + + +class Foo(object): + def __init__(self, val, unit=1.0): + self.unit = unit + self._val = val * unit + + def value(self, unit): + if unit is None: + unit = self.unit + return self._val / unit + + +class FooConverter(object): + @staticmethod + def axisinfo(unit, axis): + 'return the Foo AxisInfo' + if unit == 1.0 or unit == 2.0: + return units.AxisInfo( + majloc=ticker.IndexLocator(8, 0), + majfmt=ticker.FormatStrFormatter("VAL: %s"), + label='foo', + ) + + else: + return None + + @staticmethod + def convert(obj, unit, axis): + """ + convert obj using unit. 
If obj is a sequence, return the + converted sequence + """ + if units.ConversionInterface.is_numlike(obj): + return obj + + if iterable(obj): + return [o.value(unit) for o in obj] + else: + return obj.value(unit) + + @staticmethod + def default_units(x, axis): + 'return the default unit for x or None' + if iterable(x): + for thisx in x: + return thisx.unit + else: + return x.unit + + +units.registry[Foo] = FooConverter() + +# create some Foos +x = [] +for val in range(0, 50, 2): + x.append(Foo(val, 1.0)) + +# and some arbitrary y data +y = [i for i in range(len(x))] + + +fig, (ax1, ax2) = plt.subplots(1, 2) +fig.suptitle("Custom units") +fig.subplots_adjust(bottom=0.2) + +# plot specifying units +ax2.plot(x, y, 'o', xunits=2.0) +ax2.set_title("xunits = 2.0") +plt.setp(ax2.get_xticklabels(), rotation=30, ha='right') + +# plot without specifying units; will use the None branch for axisinfo +ax1.plot(x, y) # uses default units +ax1.set_title('default units') +plt.setp(ax1.get_xticklabels(), rotation=30, ha='right') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: evans_test.py](https://matplotlib.org/_downloads/evans_test.py) +- [下载Jupyter notebook: evans_test.ipynb](https://matplotlib.org/_downloads/evans_test.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/units/index.md b/Python/matplotlab/gallery/units/index.md new file mode 100644 index 00000000..6459857f --- /dev/null +++ b/Python/matplotlab/gallery/units/index.md @@ -0,0 +1,3 @@ +# 单位 + +这些示例涵盖了Matplotlib中单元的许多表示形式。 \ No newline at end of file diff --git a/Python/matplotlab/gallery/units/radian_demo.md b/Python/matplotlab/gallery/units/radian_demo.md new file mode 100644 index 00000000..a1274f54 --- /dev/null +++ b/Python/matplotlab/gallery/units/radian_demo.md @@ -0,0 +1,31 @@ +# 弧度刻度 + +使用basic_units模型示例包中的弧度绘图。 + +此示例显示单元类如何确定刻度定位,格式设置和轴标记。 + +此示例需要[basic_units.py](https://matplotlib.org/_downloads/3a73b4cd6e12aa53ff277b1b80d631c1/basic_units.py) + 
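basic_units 中 radians 与 degrees 之间的转换本质上只是一个标量因子(`radians.add_conversion_factor(degrees, 180.0/np.pi)`)。下面用纯 NumPy 草拟这一换算的核心逻辑,不依赖 basic_units.py,仅作示意:

```python
import numpy as np

# basic_units.py 中注册的换算因子:rad -> deg 乘以 180/pi,deg -> rad 乘以 pi/180
def rad_to_deg(x):
    return np.asarray(x) * 180.0 / np.pi

def deg_to_rad(x):
    return np.asarray(x) * np.pi / 180.0

x = np.arange(0, 15, 0.01)  # 与本例相同的取样点(弧度)
print(np.isclose(rad_to_deg(np.pi), 180.0))        # True:pi 弧度即 180 度
print(np.isclose(deg_to_rad(rad_to_deg(2.5)), 2.5))  # True:往返换算回到原值
```

正因为换算只是乘一个因子,坐标轴在 `xunits=degrees` 下只需对已有数据做一次标量变换即可重新标注刻度。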
+![弧度刻度示例](https://matplotlib.org/_images/sphx_glr_radian_demo_001.png) + +```python +import matplotlib.pyplot as plt +import numpy as np + +from basic_units import radians, degrees, cos + +x = [val*radians for val in np.arange(0, 15, 0.01)] + +fig, axs = plt.subplots(2) + +axs[0].plot(x, cos(x), xunits=radians) +axs[1].plot(x, cos(x), xunits=degrees) + +fig.tight_layout() +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: radian_demo.py](https://matplotlib.org/_downloads/radian_demo.py) +- [下载Jupyter notebook: radian_demo.ipynb](https://matplotlib.org/_downloads/radian_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/units/units_sample.md b/Python/matplotlab/gallery/units/units_sample.md new file mode 100644 index 00000000..93061890 --- /dev/null +++ b/Python/matplotlab/gallery/units/units_sample.md @@ -0,0 +1,34 @@ +# 英寸和厘米 + +该示例说明了使用绘图函数的xunits和yunits参数将默认x和y单位(ax1)覆盖为英寸和厘米的功能。 请注意,应用转换以获取正确单位的数字。 + +此示例需要[basic_units.py](https://matplotlib.org/_downloads/3a73b4cd6e12aa53ff277b1b80d631c1/basic_units.py) + +![英寸和厘米示例](https://matplotlib.org/_images/sphx_glr_units_sample_001.png) + +```python +from basic_units import cm, inch +import matplotlib.pyplot as plt +import numpy as np + +cms = cm * np.arange(0, 10, 2) + +fig, axs = plt.subplots(2, 2) + +axs[0, 0].plot(cms, cms) + +axs[0, 1].plot(cms, cms, xunits=cm, yunits=inch) + +axs[1, 0].plot(cms, cms, xunits=inch, yunits=cm) +axs[1, 0].set_xlim(3, 6) # scalars are interpreted in current units + +axs[1, 1].plot(cms, cms, xunits=inch, yunits=inch) +axs[1, 1].set_xlim(3*cm, 6*cm) # cm are converted to inches + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: units_sample.py](https://matplotlib.org/_downloads/units_sample.py) +- [下载Jupyter notebook: units_sample.ipynb](https://matplotlib.org/_downloads/units_sample.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/units/units_scatter.md b/Python/matplotlab/gallery/units/units_scatter.md new file mode 100644 index 
00000000..32620784 --- /dev/null +++ b/Python/matplotlab/gallery/units/units_scatter.md @@ -0,0 +1,38 @@ +# 单位处理 + +下面的示例显示了对掩码数组的单位转换的支持。 + +此示例需要[basic_units.py](https://matplotlib.org/_downloads/3a73b4cd6e12aa53ff277b1b80d631c1/basic_units.py) + +![单位处理示例](https://matplotlib.org/_images/sphx_glr_units_scatter_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from basic_units import secs, hertz, minutes + +# create masked array +data = (1, 2, 3, 4, 5, 6, 7, 8) +mask = (1, 0, 1, 0, 0, 0, 1, 0) +xsecs = secs * np.ma.MaskedArray(data, mask, float) + +fig, (ax1, ax2, ax3) = plt.subplots(nrows=3, sharex=True) +ax1.scatter(xsecs, xsecs) +ax1.yaxis.set_units(secs) +ax1.axis([0, 10, 0, 10]) + +ax2.scatter(xsecs, xsecs, yunits=hertz) +ax2.axis([0, 10, 0, 1]) + +ax3.scatter(xsecs, xsecs, yunits=hertz) +ax3.yaxis.set_units(minutes) +ax3.axis([0, 10, 0, 1]) + +fig.tight_layout() +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: units_scatter.py](https://matplotlib.org/_downloads/units_scatter.py) +- [下载Jupyter notebook: units_scatter.ipynb](https://matplotlib.org/_downloads/units_scatter.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/user_interfaces/canvasagg.md b/Python/matplotlab/gallery/user_interfaces/canvasagg.md new file mode 100644 index 00000000..5315817a --- /dev/null +++ b/Python/matplotlab/gallery/user_interfaces/canvasagg.md @@ -0,0 +1,59 @@ +# CanvasAgg演示 + +此示例展示了如何直接使用Agg后端创建图像。对于希望完全控制代码、不依赖pyplot界面来管理图形创建与关闭等事务的Web应用程序开发人员来说,这可能很有用。 + +**注意:**没有必要完全避免使用图形前端 - 只需将后端设置为“Agg”就足够了。 + +在这个例子中,我们展示了如何将画布的内容保存到文件,以及如何将其提取为一个字符串,该字符串可以传递给PIL或放入一个numpy数组中。后一种功能使得可以在CGI脚本中直接使用Matplotlib,而无需把图形写入磁盘。 + +```python +from matplotlib.backends.backend_agg import FigureCanvasAgg +from matplotlib.figure import Figure +import numpy as np + +fig = Figure(figsize=(5, 4), dpi=100) +# A canvas must be manually attached to the figure (pyplot would automatically +# do it). This is done by instantiating the canvas with the figure as +# argument. 
+canvas = FigureCanvasAgg(fig) + +# Do some plotting. +ax = fig.add_subplot(111) +ax.plot([1, 2, 3]) + +# Option 1: Save the figure to a file; can also be a file-like object (BytesIO, +# etc.). +fig.savefig("test.png") + +# Option 2: Save the figure to a string. +canvas.draw() +s, (width, height) = canvas.print_to_buffer() + +# Option 2a: Convert to a NumPy array. +X = np.fromstring(s, np.uint8).reshape((height, width, 4)) + +# Option 2b: Pass off to PIL. +from PIL import Image +im = Image.frombytes("RGBA", (width, height), s) + +# Uncomment this line to display the image using ImageMagick's `display` tool. +# im.show() +``` + +## 参考 + +此示例中显示了以下函数,方法,类和模块的使用: + +```python +import matplotlib +matplotlib.backends.backend_agg.FigureCanvasAgg +matplotlib.figure.Figure +matplotlib.figure.Figure.add_subplot +matplotlib.figure.Figure.savefig +matplotlib.axes.Axes.plot +``` + +## 下载这个示例 + +- [下载python源码: canvasagg.py](https://matplotlib.org/_downloads/canvasagg.py) +- [下载Jupyter notebook: canvasagg.ipynb](https://matplotlib.org/_downloads/canvasagg.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/user_interfaces/embedding_in_gtk3_panzoom_sgskip.md b/Python/matplotlab/gallery/user_interfaces/embedding_in_gtk3_panzoom_sgskip.md new file mode 100644 index 00000000..a087c0dd --- /dev/null +++ b/Python/matplotlab/gallery/user_interfaces/embedding_in_gtk3_panzoom_sgskip.md @@ -0,0 +1,46 @@ +# 嵌入GTK3 Panzoom + +演示通过pygobject访问GTK3的NavigationToolbar。 + +```python +import gi +gi.require_version('Gtk', '3.0') +from gi.repository import Gtk + +from matplotlib.backends.backend_gtk3 import ( + NavigationToolbar2GTK3 as NavigationToolbar) +from matplotlib.backends.backend_gtk3agg import ( + FigureCanvasGTK3Agg as FigureCanvas) +from matplotlib.figure import Figure +import numpy as np + +win = Gtk.Window() +win.connect("delete-event", Gtk.main_quit) +win.set_default_size(400, 300) +win.set_title("Embedding in GTK") + +f = Figure(figsize=(5, 4), dpi=100) +a = 
f.add_subplot(1, 1, 1) +t = np.arange(0.0, 3.0, 0.01) +s = np.sin(2*np.pi*t) +a.plot(t, s) + +vbox = Gtk.VBox() +win.add(vbox) + +# Add canvas to vbox +canvas = FigureCanvas(f) # a Gtk.DrawingArea +vbox.pack_start(canvas, True, True, 0) + +# Create toolbar +toolbar = NavigationToolbar(canvas, win) +vbox.pack_start(toolbar, False, False, 0) + +win.show_all() +Gtk.main() +``` + +## 下载这个示例 + +- [下载python源码: embedding_in_gtk3_panzoom_sgskip.py](https://matplotlib.org/_downloads/embedding_in_gtk3_panzoom_sgskip.py) +- [下载Jupyter notebook: embedding_in_gtk3_panzoom_sgskip.ipynb](https://matplotlib.org/_downloads/embedding_in_gtk3_panzoom_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/user_interfaces/embedding_in_gtk3_sgskip.md b/Python/matplotlab/gallery/user_interfaces/embedding_in_gtk3_sgskip.md new file mode 100644 index 00000000..7220393e --- /dev/null +++ b/Python/matplotlab/gallery/user_interfaces/embedding_in_gtk3_sgskip.md @@ -0,0 +1,42 @@ +# 嵌入GTK3 + +演示使用通过pygobject访问的GTK3将FigureCanvasGTK3Agg小部件添加到Gtk.ScrolledWindow。 + +```python +import gi +gi.require_version('Gtk', '3.0') +from gi.repository import Gtk + +from matplotlib.backends.backend_gtk3agg import ( + FigureCanvasGTK3Agg as FigureCanvas) +from matplotlib.figure import Figure +import numpy as np + +win = Gtk.Window() +win.connect("delete-event", Gtk.main_quit) +win.set_default_size(400, 300) +win.set_title("Embedding in GTK") + +f = Figure(figsize=(5, 4), dpi=100) +a = f.add_subplot(111) +t = np.arange(0.0, 3.0, 0.01) +s = np.sin(2*np.pi*t) +a.plot(t, s) + +sw = Gtk.ScrolledWindow() +win.add(sw) +# A scrolled window border goes outside the scrollbars and viewport +sw.set_border_width(10) + +canvas = FigureCanvas(f) # a Gtk.DrawingArea +canvas.set_size_request(800, 600) +sw.add_with_viewport(canvas) + +win.show_all() +Gtk.main() +``` + +## 下载这个示例 + +- [下载python源码: embedding_in_gtk3_sgskip.py](https://matplotlib.org/_downloads/embedding_in_gtk3_sgskip.py) +- [下载Jupyter 
notebook: embedding_in_gtk3_sgskip.ipynb](https://matplotlib.org/_downloads/embedding_in_gtk3_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/user_interfaces/embedding_in_qt_sgskip.md b/Python/matplotlab/gallery/user_interfaces/embedding_in_qt_sgskip.md new file mode 100644 index 00000000..9eec1b4a --- /dev/null +++ b/Python/matplotlab/gallery/user_interfaces/embedding_in_qt_sgskip.md @@ -0,0 +1,64 @@ +# 嵌入Qt + +简单的Qt应用程序嵌入Matplotlib画布。 该程序将使用Qt4和Qt5很好地工作。 通过将MPLBACKEND环境变量设置为“Qt4Agg”或“Qt5Agg”,或者首先导入所需的PyQt版本,可以选择任一版本的Qt(例如)。 + +```python +import sys +import time + +import numpy as np + +from matplotlib.backends.qt_compat import QtCore, QtWidgets, is_pyqt5 +if is_pyqt5(): + from matplotlib.backends.backend_qt5agg import ( + FigureCanvas, NavigationToolbar2QT as NavigationToolbar) +else: + from matplotlib.backends.backend_qt4agg import ( + FigureCanvas, NavigationToolbar2QT as NavigationToolbar) +from matplotlib.figure import Figure + + +class ApplicationWindow(QtWidgets.QMainWindow): + def __init__(self): + super().__init__() + self._main = QtWidgets.QWidget() + self.setCentralWidget(self._main) + layout = QtWidgets.QVBoxLayout(self._main) + + static_canvas = FigureCanvas(Figure(figsize=(5, 3))) + layout.addWidget(static_canvas) + self.addToolBar(NavigationToolbar(static_canvas, self)) + + dynamic_canvas = FigureCanvas(Figure(figsize=(5, 3))) + layout.addWidget(dynamic_canvas) + self.addToolBar(QtCore.Qt.BottomToolBarArea, + NavigationToolbar(dynamic_canvas, self)) + + self._static_ax = static_canvas.figure.subplots() + t = np.linspace(0, 10, 501) + self._static_ax.plot(t, np.tan(t), ".") + + self._dynamic_ax = dynamic_canvas.figure.subplots() + self._timer = dynamic_canvas.new_timer( + 100, [(self._update_canvas, (), {})]) + self._timer.start() + + def _update_canvas(self): + self._dynamic_ax.clear() + t = np.linspace(0, 10, 101) + # Shift the sinusoid as a function of time. 
+ self._dynamic_ax.plot(t, np.sin(t + time.time())) + self._dynamic_ax.figure.canvas.draw() + + +if __name__ == "__main__": + qapp = QtWidgets.QApplication(sys.argv) + app = ApplicationWindow() + app.show() + qapp.exec_() +``` + +## 下载这个示例 + +- [下载python源码: embedding_in_qt_sgskip.py](https://matplotlib.org/_downloads/embedding_in_qt_sgskip.py) +- [下载Jupyter notebook: embedding_in_qt_sgskip.ipynb](https://matplotlib.org/_downloads/embedding_in_qt_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/user_interfaces/embedding_in_tk_sgskip.md b/Python/matplotlab/gallery/user_interfaces/embedding_in_tk_sgskip.md new file mode 100644 index 00000000..ae971e0a --- /dev/null +++ b/Python/matplotlab/gallery/user_interfaces/embedding_in_tk_sgskip.md @@ -0,0 +1,56 @@ +# 嵌入Tk + +```python +import tkinter + +from matplotlib.backends.backend_tkagg import ( + FigureCanvasTkAgg, NavigationToolbar2Tk) +# Implement the default Matplotlib key bindings. +from matplotlib.backend_bases import key_press_handler +from matplotlib.figure import Figure + +import numpy as np + + +root = tkinter.Tk() +root.wm_title("Embedding in Tk") + +fig = Figure(figsize=(5, 4), dpi=100) +t = np.arange(0, 3, .01) +fig.add_subplot(111).plot(t, 2 * np.sin(2 * np.pi * t)) + +canvas = FigureCanvasTkAgg(fig, master=root) # A tk.DrawingArea. 
+canvas.draw() +canvas.get_tk_widget().pack(side=tkinter.TOP, fill=tkinter.BOTH, expand=1) + +toolbar = NavigationToolbar2Tk(canvas, root) +toolbar.update() +canvas.get_tk_widget().pack(side=tkinter.TOP, fill=tkinter.BOTH, expand=1) + + +def on_key_press(event): + print("you pressed {}".format(event.key)) + key_press_handler(event, canvas, toolbar) + + +canvas.mpl_connect("key_press_event", on_key_press) + + +def _quit(): + root.quit() # stops mainloop + root.destroy() # this is necessary on Windows to prevent + # Fatal Python Error: PyEval_RestoreThread: NULL tstate + + +button = tkinter.Button(master=root, text="Quit", command=_quit) +button.pack(side=tkinter.BOTTOM) + +tkinter.mainloop() +# If you put root.destroy() here, it will cause an error if the window is +# closed with the window manager. +``` + +## 下载这个示例 + +- [下载python源码: embedding_in_tk_sgskip.py](https://matplotlib.org/_downloads/embedding_in_tk_sgskip.py) +- [下载Jupyter notebook: embedding_in_tk_sgskip.ipynb](https://matplotlib.org/_downloads/embedding_in_tk_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/user_interfaces/embedding_in_wx2_sgskip.md b/Python/matplotlab/gallery/user_interfaces/embedding_in_wx2_sgskip.md new file mode 100644 index 00000000..2054a9a7 --- /dev/null +++ b/Python/matplotlab/gallery/user_interfaces/embedding_in_wx2_sgskip.md @@ -0,0 +1,64 @@ +# 嵌入Wx2 + +如何在具有新工具栏的应用程序中使用wxagg的示例 + +```python +from matplotlib.backends.backend_wxagg import FigureCanvasWxAgg as FigureCanvas +from matplotlib.backends.backend_wx import NavigationToolbar2Wx as NavigationToolbar +from matplotlib.figure import Figure + +import numpy as np + +import wx +import wx.lib.mixins.inspection as WIT + + +class CanvasFrame(wx.Frame): + def __init__(self): + wx.Frame.__init__(self, None, -1, + 'CanvasFrame', size=(550, 350)) + + self.figure = Figure() + self.axes = self.figure.add_subplot(111) + t = np.arange(0.0, 3.0, 0.01) + s = np.sin(2 * np.pi * t) + + self.axes.plot(t, s) + 
self.canvas = FigureCanvas(self, -1, self.figure) + + self.sizer = wx.BoxSizer(wx.VERTICAL) + self.sizer.Add(self.canvas, 1, wx.LEFT | wx.TOP | wx.EXPAND) + self.SetSizer(self.sizer) + self.Fit() + + self.add_toolbar() # comment this out for no toolbar + + def add_toolbar(self): + self.toolbar = NavigationToolbar(self.canvas) + self.toolbar.Realize() + # By adding toolbar in sizer, we are able to put it at the bottom + # of the frame - so appearance is closer to GTK version. + self.sizer.Add(self.toolbar, 0, wx.LEFT | wx.EXPAND) + # update the axes menu on the toolbar + self.toolbar.update() + + +# alternatively you could use +#class App(wx.App): +class App(WIT.InspectableApp): + def OnInit(self): + 'Create the main window and insert the custom frame' + self.Init() + frame = CanvasFrame() + frame.Show(True) + + return True + +app = App(0) +app.MainLoop() +``` + +## 下载这个示例 + +- [下载python源码: embedding_in_wx2_sgskip.py](https://matplotlib.org/_downloads/embedding_in_wx2_sgskip.py) +- [下载Jupyter notebook: embedding_in_wx2_sgskip.ipynb](https://matplotlib.org/_downloads/embedding_in_wx2_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/user_interfaces/embedding_in_wx3_sgskip.md b/Python/matplotlab/gallery/user_interfaces/embedding_in_wx3_sgskip.md new file mode 100644 index 00000000..e27fba00 --- /dev/null +++ b/Python/matplotlab/gallery/user_interfaces/embedding_in_wx3_sgskip.md @@ -0,0 +1,149 @@ +# 嵌入Wx3 + +版权所有(C)2003-2004 Andrew Straw和Jeremy O'Donoghue等人 + +许可证:此作品根据PSF许可。该文档也应该在 https://docs.python.org/3/license.html 上提供 + +这是使用matplotlib和wx的另一个例子。 希望这是功能齐全的: + +- matplotlib工具栏和WX按钮 +- 完整的wxApp框架,包括小部件交互 +- XRC(XML wxWidgets资源)文件创建GUI(用XRCed制作) + +这是从embedding_in_wx和dynamic_image_wxagg派生的。 + +感谢matplotlib和wx团队创建这样出色的软件! 
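下面的示例在 `init_plot_data` 和 `OnWhiz` 中用 `z >= np.max(z) - ERR_TOL` 来定位峰值:对浮点数组直接做 `z == z.max()` 的相等比较容易因舍入误差漏检,因此引入一个小容差。这一技巧与 GUI 无关,可以单独演示(示意代码,数组构造方式与示例一致):

```python
import numpy as np

ERR_TOL = 1e-5  # 峰值检测的浮点容差

# 以与示例相同的方式构造二维波形 z = sin(x) + cos(y)
x = np.arange(120.0) * 2 * np.pi / 60.0
y = np.arange(100.0) * 2 * np.pi / 50.0
X, Y = np.meshgrid(x, y)
z = np.sin(X) + np.cos(Y)

# 找出所有"足够接近"最大值的网格点的(行, 列)下标
zmax = np.max(z) - ERR_TOL
ymax_i, xmax_i = np.nonzero(z >= zmax)

# 命中的每个点与真实最大值的差都不超过 ERR_TOL
assert np.all(z.max() - z[ymax_i, xmax_i] <= ERR_TOL)
```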
+ +```python +import matplotlib +import matplotlib.cm as cm +import matplotlib.cbook as cbook +from matplotlib.backends.backend_wxagg import FigureCanvasWxAgg as FigureCanvas +from matplotlib.backends.backend_wxagg import NavigationToolbar2WxAgg as NavigationToolbar +from matplotlib.figure import Figure +import numpy as np + +import wx +import wx.xrc as xrc + +ERR_TOL = 1e-5 # floating point slop for peak-detection + + +matplotlib.rc('image', origin='lower') + + +class PlotPanel(wx.Panel): + def __init__(self, parent): + wx.Panel.__init__(self, parent, -1) + + self.fig = Figure((5, 4), 75) + self.canvas = FigureCanvas(self, -1, self.fig) + self.toolbar = NavigationToolbar(self.canvas) # matplotlib toolbar + self.toolbar.Realize() + # self.toolbar.set_active([0,1]) + + # Now put all into a sizer + sizer = wx.BoxSizer(wx.VERTICAL) + # This way of adding to sizer allows resizing + sizer.Add(self.canvas, 1, wx.LEFT | wx.TOP | wx.GROW) + # Best to allow the toolbar to resize! + sizer.Add(self.toolbar, 0, wx.GROW) + self.SetSizer(sizer) + self.Fit() + + def init_plot_data(self): + a = self.fig.add_subplot(111) + + x = np.arange(120.0) * 2 * np.pi / 60.0 + y = np.arange(100.0) * 2 * np.pi / 50.0 + self.x, self.y = np.meshgrid(x, y) + z = np.sin(self.x) + np.cos(self.y) + self.im = a.imshow(z, cmap=cm.RdBu) # , interpolation='nearest') + + zmax = np.max(z) - ERR_TOL + ymax_i, xmax_i = np.nonzero(z >= zmax) + if self.im.origin == 'upper': + ymax_i = z.shape[0] - ymax_i + self.lines = a.plot(xmax_i, ymax_i, 'ko') + + self.toolbar.update() # Not sure why this is needed - ADS + + def GetToolBar(self): + # You will need to override GetToolBar if you are using an + # unmanaged toolbar in your frame + return self.toolbar + + def OnWhiz(self, evt): + self.x += np.pi / 15 + self.y += np.pi / 20 + z = np.sin(self.x) + np.cos(self.y) + self.im.set_array(z) + + zmax = np.max(z) - ERR_TOL + ymax_i, xmax_i = np.nonzero(z >= zmax) + if self.im.origin == 'upper': + ymax_i = z.shape[0] - 
ymax_i + self.lines[0].set_data(xmax_i, ymax_i) + + self.canvas.draw() + + +class MyApp(wx.App): + def OnInit(self): + xrcfile = cbook.get_sample_data('embedding_in_wx3.xrc', + asfileobj=False) + print('loading', xrcfile) + + self.res = xrc.XmlResource(xrcfile) + + # main frame and panel --------- + + self.frame = self.res.LoadFrame(None, "MainFrame") + self.panel = xrc.XRCCTRL(self.frame, "MainPanel") + + # matplotlib panel ------------- + + # container for matplotlib panel (I like to make a container + # panel for our panel so I know where it'll go when in XRCed.) + plot_container = xrc.XRCCTRL(self.frame, "plot_container_panel") + sizer = wx.BoxSizer(wx.VERTICAL) + + # matplotlib panel itself + self.plotpanel = PlotPanel(plot_container) + self.plotpanel.init_plot_data() + + # wx boilerplate + sizer.Add(self.plotpanel, 1, wx.EXPAND) + plot_container.SetSizer(sizer) + + # whiz button ------------------ + whiz_button = xrc.XRCCTRL(self.frame, "whiz_button") + whiz_button.Bind(wx.EVT_BUTTON, self.plotpanel.OnWhiz) + + # bang button ------------------ + bang_button = xrc.XRCCTRL(self.frame, "bang_button") + bang_button.Bind(wx.EVT_BUTTON, self.OnBang) + + # final setup ------------------ + sizer = self.panel.GetSizer() + self.frame.Show(1) + + self.SetTopWindow(self.frame) + + return True + + def OnBang(self, event): + bang_count = xrc.XRCCTRL(self.frame, "bang_count") + bangs = bang_count.GetValue() + bangs = int(bangs) + 1 + bang_count.SetValue(str(bangs)) + +if __name__ == '__main__': + app = MyApp(0) + app.MainLoop() +``` + +## 下载这个示例 + +- [下载python源码: embedding_in_wx3_sgskip.py](https://matplotlib.org/_downloads/embedding_in_wx3_sgskip.py) +- [下载Jupyter notebook: embedding_in_wx3_sgskip.ipynb](https://matplotlib.org/_downloads/embedding_in_wx3_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/user_interfaces/embedding_in_wx4_sgskip.md b/Python/matplotlab/gallery/user_interfaces/embedding_in_wx4_sgskip.md new file mode 100644 index 
00000000..1658d6b0 --- /dev/null +++ b/Python/matplotlab/gallery/user_interfaces/embedding_in_wx4_sgskip.md @@ -0,0 +1,95 @@
+# 嵌入Wx4
+
+如何在具有自定义工具栏的应用程序中使用wxagg的示例。
+
+```python
+from matplotlib.backends.backend_wxagg import FigureCanvasWxAgg as FigureCanvas
+from matplotlib.backends.backend_wxagg import NavigationToolbar2WxAgg as NavigationToolbar
+from matplotlib.backends.backend_wx import _load_bitmap
+from matplotlib.figure import Figure
+
+import numpy as np
+
+import wx
+
+
+class MyNavigationToolbar(NavigationToolbar):
+    """
+    Extend the default wx toolbar with your own event handlers
+    """
+    ON_CUSTOM = wx.NewId()
+
+    def __init__(self, canvas, cankill):
+        NavigationToolbar.__init__(self, canvas)
+
+        # for simplicity I'm going to reuse a bitmap from wx, you'll
+        # probably want to add your own.
+        self.AddTool(self.ON_CUSTOM, 'Click me', _load_bitmap('back.png'),
+                     'Activate custom control')
+        self.Bind(wx.EVT_TOOL, self._on_custom, id=self.ON_CUSTOM)
+
+    def _on_custom(self, evt):
+        # add some text to the axes in a random location in axes (0, 1)
+        # coords with a random color
+
+        # get the axes
+        ax = self.canvas.figure.axes[0]
+
+        # generate a random location and color
+        x, y = np.random.rand(2)
+        rgb = np.random.rand(3)
+
+        # add the text and draw
+        ax.text(x, y, 'You clicked me',
+                transform=ax.transAxes,
+                color=rgb)
+        self.canvas.draw()
+        evt.Skip()
+
+
+class CanvasFrame(wx.Frame):
+    def __init__(self):
+        wx.Frame.__init__(self, None, -1,
+                          'CanvasFrame', size=(550, 350))
+
+        self.figure = Figure(figsize=(5, 4), dpi=100)
+        self.axes = self.figure.add_subplot(111)
+        t = np.arange(0.0, 3.0, 0.01)
+        s = np.sin(2 * np.pi * t)
+
+        self.axes.plot(t, s)
+
+        self.canvas = FigureCanvas(self, -1, self.figure)
+
+        self.sizer = wx.BoxSizer(wx.VERTICAL)
+        self.sizer.Add(self.canvas, 1, wx.TOP | wx.LEFT | wx.EXPAND)
+
+        self.toolbar = MyNavigationToolbar(self.canvas, True)
+        self.toolbar.Realize()
+        # By adding toolbar in sizer, we are able to put it at the bottom
+ # of the frame - so appearance is closer to GTK version. + self.sizer.Add(self.toolbar, 0, wx.LEFT | wx.EXPAND) + + # update the axes menu on the toolbar + self.toolbar.update() + self.SetSizer(self.sizer) + self.Fit() + + +class App(wx.App): + def OnInit(self): + 'Create the main window and insert the custom frame' + frame = CanvasFrame() + frame.Show(True) + + return True + +app = App(0) +app.MainLoop() +``` + +## 下载这个示例 + +- [下载python源码: embedding_in_wx4_sgskip.py](https://matplotlib.org/_downloads/embedding_in_wx4_sgskip.py) +- [下载Jupyter notebook: embedding_in_wx4_sgskip.ipynb](https://matplotlib.org/_downloads/embedding_in_wx4_sgskip.ipynb) + diff --git a/Python/matplotlab/gallery/user_interfaces/embedding_in_wx5_sgskip.md b/Python/matplotlab/gallery/user_interfaces/embedding_in_wx5_sgskip.md new file mode 100644 index 00000000..105ab0ed --- /dev/null +++ b/Python/matplotlab/gallery/user_interfaces/embedding_in_wx5_sgskip.md @@ -0,0 +1,63 @@ +# 嵌入Wx5 + +```python +import wx +import wx.lib.agw.aui as aui +import wx.lib.mixins.inspection as wit + +import matplotlib as mpl +from matplotlib.backends.backend_wxagg import FigureCanvasWxAgg as FigureCanvas +from matplotlib.backends.backend_wxagg import NavigationToolbar2WxAgg as NavigationToolbar + + +class Plot(wx.Panel): + def __init__(self, parent, id=-1, dpi=None, **kwargs): + wx.Panel.__init__(self, parent, id=id, **kwargs) + self.figure = mpl.figure.Figure(dpi=dpi, figsize=(2, 2)) + self.canvas = FigureCanvas(self, -1, self.figure) + self.toolbar = NavigationToolbar(self.canvas) + self.toolbar.Realize() + + sizer = wx.BoxSizer(wx.VERTICAL) + sizer.Add(self.canvas, 1, wx.EXPAND) + sizer.Add(self.toolbar, 0, wx.LEFT | wx.EXPAND) + self.SetSizer(sizer) + + +class PlotNotebook(wx.Panel): + def __init__(self, parent, id=-1): + wx.Panel.__init__(self, parent, id=id) + self.nb = aui.AuiNotebook(self) + sizer = wx.BoxSizer() + sizer.Add(self.nb, 1, wx.EXPAND) + self.SetSizer(sizer) + + def add(self, name="plot"): + 
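+        # 新建一个 Plot 面板并作为一页加入 AuiNotebook;
+        # 返回其中的 Figure,调用方可以直接在上面取 Axes 并绘图。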
page = Plot(self.nb)
+        self.nb.AddPage(page, name)
+        return page.figure
+
+
+def demo():
+    # alternatively you could use
+    #app = wx.App()
+    # InspectableApp is a great debug tool, see:
+    # http://wiki.wxpython.org/Widget%20Inspection%20Tool
+    app = wit.InspectableApp()
+    frame = wx.Frame(None, -1, 'Plotter')
+    plotter = PlotNotebook(frame)
+    axes1 = plotter.add('figure 1').gca()
+    axes1.plot([1, 2, 3], [2, 1, 4])
+    axes2 = plotter.add('figure 2').gca()
+    axes2.plot([1, 2, 3, 4, 5], [2, 1, 4, 2, 3])
+    frame.Show()
+    app.MainLoop()
+
+if __name__ == "__main__":
+    demo()
+```
+
+## 下载这个示例
+
+- [下载python源码: embedding_in_wx5_sgskip.py](https://matplotlib.org/_downloads/embedding_in_wx5_sgskip.py)
+- [下载Jupyter notebook: embedding_in_wx5_sgskip.ipynb](https://matplotlib.org/_downloads/embedding_in_wx5_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/user_interfaces/embedding_webagg_sgskip.md b/Python/matplotlab/gallery/user_interfaces/embedding_webagg_sgskip.md new file mode 100644 index 00000000..a6297117 --- /dev/null +++ b/Python/matplotlab/gallery/user_interfaces/embedding_webagg_sgskip.md @@ -0,0 +1,249 @@
+# 嵌入WebAgg
+
+此示例演示如何在您自己的Web应用程序和框架中嵌入matplotlib WebAgg交互式绘图。
+绘图的服务器端使用 Tornado 实现。
+
+所使用的Web框架必须支持 WebSocket(Web套接字)。
+
+``` python
+import io
+
+try:
+    import tornado
+except ImportError:
+    raise RuntimeError("This example requires tornado.")
+import tornado.web
+import tornado.httpserver
+import tornado.ioloop
+import tornado.websocket
+
+
+from matplotlib.backends.backend_webagg_core import (
+    FigureManagerWebAgg, new_figure_manager_given_figure)
+from matplotlib.figure import Figure
+
+import numpy as np
+
+import json
+
+
+def create_figure():
+    """
+    Creates a simple example figure.
+    """
+    fig = Figure()
+    a = fig.add_subplot(111)
+    t = np.arange(0.0, 3.0, 0.01)
+    s = np.sin(2 * np.pi * t)
+    a.plot(t, s)
+    return fig
+
+
+# The following is the content of the web page.
You would normally
+# generate this using some sort of template facility in your web
+# framework, but here we just use Python string formatting.
+# 注:原页面的完整HTML(样式表与各脚本引用等)在文档转换时丢失,
+# 下面给出一个简化的页面骨架;完整版本请参见文末可下载的源码。
+# 其中 %(ws_uri)s 与 %(fig_id)s 两个占位符由下文的 MainPage 处理器填充。
+html_content = """
+<html>
+  <head>
+    <script src="mpl.js"></script>
+    <script>
+      /* 用户保存(下载)文件时的回调 */
+      function ondownload(figure, format) {
+        window.open('download.' + format, '_blank');
+      };
+
+      window.addEventListener("load", function() {
+        /* 由应用程序提供 figure 与服务器通信所用的 websocket */
+        var websocket_type = mpl.get_websocket_type();
+        var websocket = new websocket_type("%(ws_uri)sws");
+
+        /* mpl.figure 在页面上创建一个新的交互式图形 */
+        var fig = new mpl.figure(
+            %(fig_id)s, websocket, ondownload,
+            document.getElementById("figure"));
+      });
+    </script>
+    <title>matplotlib</title>
+  </head>
+  <body>
+    <div id="figure"></div>
+  </body>
+</html>
+ + +""" + + +class MyApplication(tornado.web.Application): + class MainPage(tornado.web.RequestHandler): + """ + Serves the main HTML page. + """ + + def get(self): + manager = self.application.manager + ws_uri = "ws://{req.host}/".format(req=self.request) + content = html_content % { + "ws_uri": ws_uri, "fig_id": manager.num} + self.write(content) + + class MplJs(tornado.web.RequestHandler): + """ + Serves the generated matplotlib javascript file. The content + is dynamically generated based on which toolbar functions the + user has defined. Call `FigureManagerWebAgg` to get its + content. + """ + + def get(self): + self.set_header('Content-Type', 'application/javascript') + js_content = FigureManagerWebAgg.get_javascript() + + self.write(js_content) + + class Download(tornado.web.RequestHandler): + """ + Handles downloading of the figure in various file formats. + """ + + def get(self, fmt): + manager = self.application.manager + + mimetypes = { + 'ps': 'application/postscript', + 'eps': 'application/postscript', + 'pdf': 'application/pdf', + 'svg': 'image/svg+xml', + 'png': 'image/png', + 'jpeg': 'image/jpeg', + 'tif': 'image/tiff', + 'emf': 'application/emf' + } + + self.set_header('Content-Type', mimetypes.get(fmt, 'binary')) + + buff = io.BytesIO() + manager.canvas.figure.savefig(buff, format=fmt) + self.write(buff.getvalue()) + + class WebSocket(tornado.websocket.WebSocketHandler): + """ + A websocket for interactive communication between the plot in + the browser and the server. + + In addition to the methods required by tornado, it is required to + have two callback methods: + + - ``send_json(json_content)`` is called by matplotlib when + it needs to send json to the browser. `json_content` is + a JSON tree (Python dictionary), and it is the responsibility + of this implementation to encode it as a string to send over + the socket. + + - ``send_binary(blob)`` is called to send binary image data + to the browser. 
+ """ + supports_binary = True + + def open(self): + # Register the websocket with the FigureManager. + manager = self.application.manager + manager.add_web_socket(self) + if hasattr(self, 'set_nodelay'): + self.set_nodelay(True) + + def on_close(self): + # When the socket is closed, deregister the websocket with + # the FigureManager. + manager = self.application.manager + manager.remove_web_socket(self) + + def on_message(self, message): + # The 'supports_binary' message is relevant to the + # websocket itself. The other messages get passed along + # to matplotlib as-is. + + # Every message has a "type" and a "figure_id". + message = json.loads(message) + if message['type'] == 'supports_binary': + self.supports_binary = message['value'] + else: + manager = self.application.manager + manager.handle_json(message) + + def send_json(self, content): + self.write_message(json.dumps(content)) + + def send_binary(self, blob): + if self.supports_binary: + self.write_message(blob, binary=True) + else: + data_uri = "data:image/png;base64,{0}".format( + blob.encode('base64').replace('\n', '')) + self.write_message(data_uri) + + def __init__(self, figure): + self.figure = figure + self.manager = new_figure_manager_given_figure(id(figure), figure) + + super().__init__([ + # Static files for the CSS and JS + (r'/_static/(.*)', + tornado.web.StaticFileHandler, + {'path': FigureManagerWebAgg.get_static_file_path()}), + + # The page that contains all of the pieces + ('/', self.MainPage), + + ('/mpl.js', self.MplJs), + + # Sends images and events to the browser, and receives + # events from the browser + ('/ws', self.WebSocket), + + # Handles the downloading (i.e., saving) of static images + (r'/download.([a-z0-9.]+)', self.Download), + ]) + + +if __name__ == "__main__": + figure = create_figure() + application = MyApplication(figure) + + http_server = tornado.httpserver.HTTPServer(application) + http_server.listen(8080) + + print("http://127.0.0.1:8080/") + print("Press Ctrl+C to 
quit") + + tornado.ioloop.IOLoop.instance().start() +``` + +## 下载这个示例 + +- [下载python源码: embedding_webagg_sgskip.py](https://matplotlib.org/_downloads/embedding_webagg_sgskip.py) +- [下载Jupyter notebook: embedding_webagg_sgskip.ipynb](https://matplotlib.org/_downloads/embedding_webagg_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/user_interfaces/fourier_demo_wx_sgskip.md b/Python/matplotlab/gallery/user_interfaces/fourier_demo_wx_sgskip.md new file mode 100644 index 00000000..50707f84 --- /dev/null +++ b/Python/matplotlab/gallery/user_interfaces/fourier_demo_wx_sgskip.md @@ -0,0 +1,237 @@ +# 傅立叶演示WX + +```python +import numpy as np + +import wx +from matplotlib.backends.backend_wxagg import FigureCanvasWxAgg as FigureCanvas +from matplotlib.figure import Figure + + +class Knob(object): + """ + Knob - simple class with a "setKnob" method. + A Knob instance is attached to a Param instance, e.g., param.attach(knob) + Base class is for documentation purposes. + """ + + def setKnob(self, value): + pass + + +class Param(object): + """ + The idea of the "Param" class is that some parameter in the GUI may have + several knobs that both control it and reflect the parameter's state, e.g. + a slider, text, and dragging can all change the value of the frequency in + the waveform of this example. + The class allows a cleaner way to update/"feedback" to the other knobs when + one is being changed. Also, this class handles min/max constraints for all + the knobs. + Idea - knob list - in "set" method, knob object is passed as well + - the other knobs in the knob list have a "set" method which gets + called for the others. 
+ """ + + def __init__(self, initialValue=None, minimum=0., maximum=1.): + self.minimum = minimum + self.maximum = maximum + if initialValue != self.constrain(initialValue): + raise ValueError('illegal initial value') + self.value = initialValue + self.knobs = [] + + def attach(self, knob): + self.knobs += [knob] + + def set(self, value, knob=None): + self.value = value + self.value = self.constrain(value) + for feedbackKnob in self.knobs: + if feedbackKnob != knob: + feedbackKnob.setKnob(self.value) + return self.value + + def constrain(self, value): + if value <= self.minimum: + value = self.minimum + if value >= self.maximum: + value = self.maximum + return value + + +class SliderGroup(Knob): + def __init__(self, parent, label, param): + self.sliderLabel = wx.StaticText(parent, label=label) + self.sliderText = wx.TextCtrl(parent, -1, style=wx.TE_PROCESS_ENTER) + self.slider = wx.Slider(parent, -1) + # self.slider.SetMax(param.maximum*1000) + self.slider.SetRange(0, param.maximum * 1000) + self.setKnob(param.value) + + sizer = wx.BoxSizer(wx.HORIZONTAL) + sizer.Add(self.sliderLabel, 0, + wx.EXPAND | wx.ALIGN_CENTER | wx.ALL, + border=2) + sizer.Add(self.sliderText, 0, + wx.EXPAND | wx.ALIGN_CENTER | wx.ALL, + border=2) + sizer.Add(self.slider, 1, wx.EXPAND) + self.sizer = sizer + + self.slider.Bind(wx.EVT_SLIDER, self.sliderHandler) + self.sliderText.Bind(wx.EVT_TEXT_ENTER, self.sliderTextHandler) + + self.param = param + self.param.attach(self) + + def sliderHandler(self, evt): + value = evt.GetInt() / 1000. 
+ self.param.set(value) + + def sliderTextHandler(self, evt): + value = float(self.sliderText.GetValue()) + self.param.set(value) + + def setKnob(self, value): + self.sliderText.SetValue('%g' % value) + self.slider.SetValue(value * 1000) + + +class FourierDemoFrame(wx.Frame): + def __init__(self, *args, **kwargs): + wx.Frame.__init__(self, *args, **kwargs) + panel = wx.Panel(self) + + # create the GUI elements + self.createCanvas(panel) + self.createSliders(panel) + + # place them in a sizer for the Layout + sizer = wx.BoxSizer(wx.VERTICAL) + sizer.Add(self.canvas, 1, wx.EXPAND) + sizer.Add(self.frequencySliderGroup.sizer, 0, + wx.EXPAND | wx.ALIGN_CENTER | wx.ALL, border=5) + sizer.Add(self.amplitudeSliderGroup.sizer, 0, + wx.EXPAND | wx.ALIGN_CENTER | wx.ALL, border=5) + panel.SetSizer(sizer) + + def createCanvas(self, parent): + self.lines = [] + self.figure = Figure() + self.canvas = FigureCanvas(parent, -1, self.figure) + self.canvas.callbacks.connect('button_press_event', self.mouseDown) + self.canvas.callbacks.connect('motion_notify_event', self.mouseMotion) + self.canvas.callbacks.connect('button_release_event', self.mouseUp) + self.state = '' + self.mouseInfo = (None, None, None, None) + self.f0 = Param(2., minimum=0., maximum=6.) + self.A = Param(1., minimum=0.01, maximum=2.) + self.createPlots() + + # Not sure I like having two params attached to the same Knob, + # but that is what we have here... 
it works but feels kludgy - + # although maybe it's not too bad since the knob changes both params + # at the same time (both f0 and A are affected during a drag) + self.f0.attach(self) + self.A.attach(self) + + def createSliders(self, panel): + self.frequencySliderGroup = SliderGroup( + panel, + label='Frequency f0:', + param=self.f0) + self.amplitudeSliderGroup = SliderGroup(panel, label=' Amplitude a:', + param=self.A) + + def mouseDown(self, evt): + if self.lines[0].contains(evt)[0]: + self.state = 'frequency' + elif self.lines[1].contains(evt)[0]: + self.state = 'time' + else: + self.state = '' + self.mouseInfo = (evt.xdata, evt.ydata, + max(self.f0.value, .1), + self.A.value) + + def mouseMotion(self, evt): + if self.state == '': + return + x, y = evt.xdata, evt.ydata + if x is None: # outside the axes + return + x0, y0, f0Init, AInit = self.mouseInfo + self.A.set(AInit + (AInit * (y - y0) / y0), self) + if self.state == 'frequency': + self.f0.set(f0Init + (f0Init * (x - x0) / x0)) + elif self.state == 'time': + if (x - x0) / x0 != -1.: + self.f0.set(1. / (1. / f0Init + (1. / f0Init * (x - x0) / x0))) + + def mouseUp(self, evt): + self.state = '' + + def createPlots(self): + # This method creates the subplots, waveforms and labels. + # Later, when the waveforms or sliders are dragged, only the + # waveform data will be updated (not here, but below in setKnob). + self.subplot1, self.subplot2 = self.figure.subplots(2) + x1, y1, x2, y2 = self.compute(self.f0.value, self.A.value) + color = (1., 0., 0.) 
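+        # 两条波形的 Line2D 依次存入 self.lines;之后拖动或移动滑块时,
+        # setKnob 只更新它们的数据而不重建子图,交互因此更流畅。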
+ self.lines += self.subplot1.plot(x1, y1, color=color, linewidth=2) + self.lines += self.subplot2.plot(x2, y2, color=color, linewidth=2) + # Set some plot attributes + self.subplot1.set_title( + "Click and drag waveforms to change frequency and amplitude", + fontsize=12) + self.subplot1.set_ylabel("Frequency Domain Waveform X(f)", fontsize=8) + self.subplot1.set_xlabel("frequency f", fontsize=8) + self.subplot2.set_ylabel("Time Domain Waveform x(t)", fontsize=8) + self.subplot2.set_xlabel("time t", fontsize=8) + self.subplot1.set_xlim([-6, 6]) + self.subplot1.set_ylim([0, 1]) + self.subplot2.set_xlim([-2, 2]) + self.subplot2.set_ylim([-2, 2]) + self.subplot1.text(0.05, .95, + r'$X(f) = \mathcal{F}\{x(t)\}$', + verticalalignment='top', + transform=self.subplot1.transAxes) + self.subplot2.text(0.05, .95, + r'$x(t) = a \cdot \cos(2\pi f_0 t) e^{-\pi t^2}$', + verticalalignment='top', + transform=self.subplot2.transAxes) + + def compute(self, f0, A): + f = np.arange(-6., 6., 0.02) + t = np.arange(-2., 2., 0.01) + x = A * np.cos(2 * np.pi * f0 * t) * np.exp(-np.pi * t ** 2) + X = A / 2 * \ + (np.exp(-np.pi * (f - f0) ** 2) + np.exp(-np.pi * (f + f0) ** 2)) + return f, X, t, x + + def setKnob(self, value): + # Note, we ignore value arg here and just go by state of the params + x1, y1, x2, y2 = self.compute(self.f0.value, self.A.value) + # update the data of the two waveforms + self.lines[0].set(xdata=x1, ydata=y1) + self.lines[1].set(xdata=x2, ydata=y2) + # make the canvas draw its contents again with the new data + self.canvas.draw() + + +class App(wx.App): + def OnInit(self): + self.frame1 = FourierDemoFrame(parent=None, title="Fourier Demo", + size=(640, 480)) + self.frame1.Show() + return True + +app = App() +app.MainLoop() +``` + +## 下载这个示例 + +- [下载python源码: fourier_demo_wx_sgskip.py](https://matplotlib.org/_downloads/fourier_demo_wx_sgskip.py) +- [下载Jupyter notebook: fourier_demo_wx_sgskip.ipynb](https://matplotlib.org/_downloads/fourier_demo_wx_sgskip.ipynb) \ No 
newline at end of file diff --git a/Python/matplotlab/gallery/user_interfaces/gtk_spreadsheet_sgskip.md b/Python/matplotlab/gallery/user_interfaces/gtk_spreadsheet_sgskip.md new file mode 100644 index 00000000..d76ad862 --- /dev/null +++ b/Python/matplotlab/gallery/user_interfaces/gtk_spreadsheet_sgskip.md @@ -0,0 +1,94 @@ +# GTK电子表格 + +在应用程序中嵌入matplotlib并与treeview交互以存储数据的示例。双击条目以更新绘图数据。 + +```python +import gi +gi.require_version('Gtk', '3.0') +gi.require_version('Gdk', '3.0') +from gi.repository import Gtk, Gdk + +from matplotlib.backends.backend_gtk3agg import FigureCanvas +# from matplotlib.backends.backend_gtk3cairo import FigureCanvas + +from numpy.random import random +from matplotlib.figure import Figure + + +class DataManager(Gtk.Window): + numRows, numCols = 20, 10 + + data = random((numRows, numCols)) + + def __init__(self): + Gtk.Window.__init__(self) + self.set_default_size(600, 600) + self.connect('destroy', lambda win: Gtk.main_quit()) + + self.set_title('GtkListStore demo') + self.set_border_width(8) + + vbox = Gtk.VBox(False, 8) + self.add(vbox) + + label = Gtk.Label('Double click a row to plot the data') + + vbox.pack_start(label, False, False, 0) + + sw = Gtk.ScrolledWindow() + sw.set_shadow_type(Gtk.ShadowType.ETCHED_IN) + sw.set_policy(Gtk.PolicyType.NEVER, + Gtk.PolicyType.AUTOMATIC) + vbox.pack_start(sw, True, True, 0) + + model = self.create_model() + + self.treeview = Gtk.TreeView(model) + self.treeview.set_rules_hint(True) + + # matplotlib stuff + fig = Figure(figsize=(6, 4)) + + self.canvas = FigureCanvas(fig) # a Gtk.DrawingArea + vbox.pack_start(self.canvas, True, True, 0) + ax = fig.add_subplot(111) + self.line, = ax.plot(self.data[0, :], 'go') # plot the first row + + self.treeview.connect('row-activated', self.plot_row) + sw.add(self.treeview) + + self.add_columns() + + self.add_events(Gdk.EventMask.BUTTON_PRESS_MASK | + Gdk.EventMask.KEY_PRESS_MASK | + Gdk.EventMask.KEY_RELEASE_MASK) + + def plot_row(self, treeview, path, 
view_column):
+        ind, = path  # get the index into data
+        points = self.data[ind, :]
+        self.line.set_ydata(points)
+        self.canvas.draw()
+
+    def add_columns(self):
+        for i in range(self.numCols):
+            column = Gtk.TreeViewColumn(str(i), Gtk.CellRendererText(), text=i)
+            self.treeview.append_column(column)
+
+    def create_model(self):
+        types = [float]*self.numCols
+        store = Gtk.ListStore(*types)
+
+        for row in self.data:
+            store.append(tuple(row))
+        return store
+
+
+manager = DataManager()
+manager.show_all()
+Gtk.main()
+```
+
+## 下载这个示例
+
+- [下载python源码: gtk_spreadsheet_sgskip.py](https://matplotlib.org/_downloads/gtk_spreadsheet_sgskip.py)
+- [下载Jupyter notebook: gtk_spreadsheet_sgskip.ipynb](https://matplotlib.org/_downloads/gtk_spreadsheet_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/user_interfaces/index.md b/Python/matplotlab/gallery/user_interfaces/index.md new file mode 100644 index 00000000..34357c94 --- /dev/null +++ b/Python/matplotlab/gallery/user_interfaces/index.md @@ -0,0 +1,5 @@
+# 将Matplotlib嵌入图形用户界面中
+
+您可以参照此处的 embedding_in_SOMEGUI.py 系列示例,将Matplotlib直接嵌入到用户界面中。目前matplotlib支持wxpython、pygtk、tkinter 和 pyqt4/5。
+
+在将Matplotlib嵌入GUI时,您应该直接使用Matplotlib的面向对象API,而不是 pylab/pyplot 的过程式接口;请查看 examples/api 目录,了解使用该API的一些示例代码。 \ No newline at end of file diff --git a/Python/matplotlab/gallery/user_interfaces/mathtext_wx_sgskip.md b/Python/matplotlab/gallery/user_interfaces/mathtext_wx_sgskip.md new file mode 100644 index 00000000..2a749145 --- /dev/null +++ b/Python/matplotlab/gallery/user_interfaces/mathtext_wx_sgskip.md @@ -0,0 +1,133 @@
+# WX中的数学文本
+
+演示如何将数学文本转换为wx.Bitmap,以便在wxPython的各种控件中显示。
+
+```python
+import matplotlib
+matplotlib.use("WxAgg")
+from matplotlib.backends.backend_wxagg import FigureCanvasWxAgg as FigureCanvas
+from matplotlib.backends.backend_wx import NavigationToolbar2Wx
+from matplotlib.figure import Figure
+import numpy as np
+
+import wx
+
+IS_GTK = 'wxGTK' in wx.PlatformInfo
+IS_WIN = 'wxMSW' in
wx.PlatformInfo +``` + +This is where the "magic" happens. + +```python +from matplotlib.mathtext import MathTextParser +mathtext_parser = MathTextParser("Bitmap") + + +def mathtext_to_wxbitmap(s): + ftimage, depth = mathtext_parser.parse(s, 150) + return wx.Bitmap.FromBufferRGBA( + ftimage.get_width(), ftimage.get_height(), + ftimage.as_rgba_str()) +``` + +```python +functions = [ + (r'$\sin(2 \pi x)$', lambda x: np.sin(2*np.pi*x)), + (r'$\frac{4}{3}\pi x^3$', lambda x: (4.0/3.0)*np.pi*x**3), + (r'$\cos(2 \pi x)$', lambda x: np.cos(2*np.pi*x)), + (r'$\log(x)$', lambda x: np.log(x)) +] + + +class CanvasFrame(wx.Frame): + def __init__(self, parent, title): + wx.Frame.__init__(self, parent, -1, title, size=(550, 350)) + + self.figure = Figure() + self.axes = self.figure.add_subplot(111) + + self.canvas = FigureCanvas(self, -1, self.figure) + + self.change_plot(0) + + self.sizer = wx.BoxSizer(wx.VERTICAL) + self.add_buttonbar() + self.sizer.Add(self.canvas, 1, wx.LEFT | wx.TOP | wx.GROW) + self.add_toolbar() # comment this out for no toolbar + + menuBar = wx.MenuBar() + + # File Menu + menu = wx.Menu() + m_exit = menu.Append(wx.ID_EXIT, "E&xit\tAlt-X", "Exit this simple sample") + menuBar.Append(menu, "&File") + self.Bind(wx.EVT_MENU, self.OnClose, m_exit) + + if IS_GTK or IS_WIN: + # Equation Menu + menu = wx.Menu() + for i, (mt, func) in enumerate(functions): + bm = mathtext_to_wxbitmap(mt) + item = wx.MenuItem(menu, 1000 + i, " ") + item.SetBitmap(bm) + menu.Append(item) + self.Bind(wx.EVT_MENU, self.OnChangePlot, item) + menuBar.Append(menu, "&Functions") + + self.SetMenuBar(menuBar) + + self.SetSizer(self.sizer) + self.Fit() + + def add_buttonbar(self): + self.button_bar = wx.Panel(self) + self.button_bar_sizer = wx.BoxSizer(wx.HORIZONTAL) + self.sizer.Add(self.button_bar, 0, wx.LEFT | wx.TOP | wx.GROW) + + for i, (mt, func) in enumerate(functions): + bm = mathtext_to_wxbitmap(mt) + button = wx.BitmapButton(self.button_bar, 1000 + i, bm) + 
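+            # 按钮 id 取 1000 + i;OnChangePlot 里用 event.GetId() - 1000
+            # 还原出 functions 列表的下标,从而切换到对应的函数曲线。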
self.button_bar_sizer.Add(button, 1, wx.GROW) + self.Bind(wx.EVT_BUTTON, self.OnChangePlot, button) + + self.button_bar.SetSizer(self.button_bar_sizer) + + def add_toolbar(self): + """Copied verbatim from embedding_wx2.py""" + self.toolbar = NavigationToolbar2Wx(self.canvas) + self.toolbar.Realize() + # By adding toolbar in sizer, we are able to put it at the bottom + # of the frame - so appearance is closer to GTK version. + self.sizer.Add(self.toolbar, 0, wx.LEFT | wx.EXPAND) + # update the axes menu on the toolbar + self.toolbar.update() + + def OnChangePlot(self, event): + self.change_plot(event.GetId() - 1000) + + def change_plot(self, plot_number): + t = np.arange(1.0, 3.0, 0.01) + s = functions[plot_number][1](t) + self.axes.clear() + self.axes.plot(t, s) + self.canvas.draw() + + def OnClose(self, event): + self.Destroy() + + +class MyApp(wx.App): + def OnInit(self): + frame = CanvasFrame(None, "wxPython mathtext demo app") + self.SetTopWindow(frame) + frame.Show(True) + return True + +app = MyApp() +app.MainLoop() +``` + +## 下载这个示例 + +- [下载python源码: mathtext_wx_sgskip.py](https://matplotlib.org/_downloads/mathtext_wx_sgskip.py) +- [下载Jupyter notebook: mathtext_wx_sgskip.ipynb](https://matplotlib.org/_downloads/mathtext_wx_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/user_interfaces/mpl_with_glade3_sgskip.md b/Python/matplotlab/gallery/user_interfaces/mpl_with_glade3_sgskip.md new file mode 100644 index 00000000..03d01514 --- /dev/null +++ b/Python/matplotlab/gallery/user_interfaces/mpl_with_glade3_sgskip.md @@ -0,0 +1,55 @@ +# Matplotlib与Glade 3 + +```python +import os + +import gi +gi.require_version('Gtk', '3.0') +from gi.repository import Gtk + +from matplotlib.figure import Figure +from matplotlib.backends.backend_gtk3agg import ( + FigureCanvasGTK3Agg as FigureCanvas) +import numpy as np + + +class Window1Signals(object): + def on_window1_destroy(self, widget): + Gtk.main_quit() + + +def main(): + builder = 
Gtk.Builder() + builder.add_objects_from_file(os.path.join(os.path.dirname(__file__), + "mpl_with_glade3.glade"), + ("window1", "")) + builder.connect_signals(Window1Signals()) + window = builder.get_object("window1") + sw = builder.get_object("scrolledwindow1") + + # Start of Matplotlib specific code + figure = Figure(figsize=(8, 6), dpi=71) + axis = figure.add_subplot(111) + t = np.arange(0.0, 3.0, 0.01) + s = np.sin(2*np.pi*t) + axis.plot(t, s) + + axis.set_xlabel('time [s]') + axis.set_ylabel('voltage [V]') + + canvas = FigureCanvas(figure) # a Gtk.DrawingArea + canvas.set_size_request(800, 600) + sw.add_with_viewport(canvas) + # End of Matplotlib specific code + + window.show_all() + Gtk.main() + +if __name__ == "__main__": + main() +``` + +## 下载这个示例 + +- [下载python源码: mpl_with_glade3_sgskip.py](https://matplotlib.org/_downloads/mpl_with_glade3_sgskip.py) +- [下载Jupyter notebook: mpl_with_glade3_sgskip.ipynb](https://matplotlib.org/_downloads/mpl_with_glade3_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/user_interfaces/pylab_with_gtk_sgskip.md b/Python/matplotlab/gallery/user_interfaces/pylab_with_gtk_sgskip.md new file mode 100644 index 00000000..8a008456 --- /dev/null +++ b/Python/matplotlab/gallery/user_interfaces/pylab_with_gtk_sgskip.md @@ -0,0 +1,55 @@ +# Pyplot与GTK + +```python +import os + +import gi +gi.require_version('Gtk', '3.0') +from gi.repository import Gtk + +from matplotlib.figure import Figure +from matplotlib.backends.backend_gtk3agg import ( + FigureCanvasGTK3Agg as FigureCanvas) +import numpy as np + + +class Window1Signals(object): + def on_window1_destroy(self, widget): + Gtk.main_quit() + + +def main(): + builder = Gtk.Builder() + builder.add_objects_from_file(os.path.join(os.path.dirname(__file__), + "mpl_with_glade3.glade"), + ("window1", "")) + builder.connect_signals(Window1Signals()) + window = builder.get_object("window1") + sw = builder.get_object("scrolledwindow1") + + # Start of Matplotlib 
specific code + figure = Figure(figsize=(8, 6), dpi=71) + axis = figure.add_subplot(111) + t = np.arange(0.0, 3.0, 0.01) + s = np.sin(2*np.pi*t) + axis.plot(t, s) + + axis.set_xlabel('time [s]') + axis.set_ylabel('voltage [V]') + + canvas = FigureCanvas(figure) # a Gtk.DrawingArea + canvas.set_size_request(800, 600) + sw.add_with_viewport(canvas) + # End of Matplotlib specific code + + window.show_all() + Gtk.main() + +if __name__ == "__main__": + main() +``` + +## 下载这个示例 + +- [下载python源码: mpl_with_glade3_sgskip.py](https://matplotlib.org/_downloads/mpl_with_glade3_sgskip.py) +- [下载Jupyter notebook: mpl_with_glade3_sgskip.ipynb](https://matplotlib.org/_downloads/mpl_with_glade3_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/user_interfaces/svg_histogram_sgskip.md b/Python/matplotlab/gallery/user_interfaces/svg_histogram_sgskip.md new file mode 100644 index 00000000..626b83a6 --- /dev/null +++ b/Python/matplotlab/gallery/user_interfaces/svg_histogram_sgskip.md @@ -0,0 +1,150 @@ +# SVG直方图 + +演示如何创建交互式直方图,通过单击图例标记隐藏或显示条形图。 + +交互性以ecmascript(javascript)编码,并在后处理步骤中插入SVG代码中。 要渲染图像,请在Web浏览器中打开它。 大多数Linux Web浏览器和OSX用户都支持SVG。 Windows IE9支持SVG,但早期版本不支持。 + +## 注意 + +matplotlib后端允许我们为每个对象分配id。 这是用于描述在python中创建的matplotlib对象的机制以及在第二步中解析的相应SVG构造。 虽然灵活,但它们很难用于大量物体的收集。 可以使用两种机制来简化事情: + +- 系统地将对象分组为SVG \标签, +- 根据每个SVG对象的来源为每个SVG对象分配类。 + +例如,不是修改每个单独栏的属性,而是可以将列分组到PatchCollection中,或者将列分配给class =“hist _ ##”属性。 + +CSS也可以广泛用于替换整个SVG生成中的重复标记。 + +作者:david.huard@gmail.com + +```python +import numpy as np +import matplotlib.pyplot as plt +import xml.etree.ElementTree as ET +from io import BytesIO +import json + + +plt.rcParams['svg.fonttype'] = 'none' + +# Apparently, this `register_namespace` method works only with +# python 2.7 and up and is necessary to avoid garbling the XML name +# space with ns0. 
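# Why the default-namespace registration below matters: without it,
# ElementTree serializes every SVG tag with an "ns0:" prefix
# (e.g. <ns0:svg xmlns:ns0="…">), garbling the markup; registering the
# empty prefix for the SVG namespace keeps plain <svg> tags in the output.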
+ET.register_namespace("", "http://www.w3.org/2000/svg") + +# Fixing random state for reproducibility +np.random.seed(19680801) + +# --- Create histogram, legend and title --- +plt.figure() +r = np.random.randn(100) +r1 = r + 1 +labels = ['Rabbits', 'Frogs'] +H = plt.hist([r, r1], label=labels) +containers = H[-1] +leg = plt.legend(frameon=False) +plt.title("From a web browser, click on the legend\n" + "marker to toggle the corresponding histogram.") + + +# --- Add ids to the svg objects we'll modify + +hist_patches = {} +for ic, c in enumerate(containers): + hist_patches['hist_%d' % ic] = [] + for il, element in enumerate(c): + element.set_gid('hist_%d_patch_%d' % (ic, il)) + hist_patches['hist_%d' % ic].append('hist_%d_patch_%d' % (ic, il)) + +# Set ids for the legend patches +for i, t in enumerate(leg.get_patches()): + t.set_gid('leg_patch_%d' % i) + +# Set ids for the text patches +for i, t in enumerate(leg.get_texts()): + t.set_gid('leg_text_%d' % i) + +# Save SVG in a fake file object. +f = BytesIO() +plt.savefig(f, format="svg") + +# Create XML tree from the SVG file. +tree, xmlid = ET.XMLID(f.getvalue()) + + +# --- Add interactivity --- + +# Add attributes to the patch objects. +for i, t in enumerate(leg.get_patches()): + el = xmlid['leg_patch_%d' % i] + el.set('cursor', 'pointer') + el.set('onclick', "toggle_hist(this)") + +# Add attributes to the text objects. +for i, t in enumerate(leg.get_texts()): + el = xmlid['leg_text_%d' % i] + el.set('cursor', 'pointer') + el.set('onclick', "toggle_hist(this)") + +# Create script defining the function `toggle_hist`. +# We create a global variable `container` that stores the patches id +# belonging to each histogram. Then a function "toggle_element" sets the +# visibility attribute of all patches of each histogram and the opacity +# of the marker itself. 

# The ECMAScript below is a sketch (an assumed reconstruction, not the
# verbatim upstream script): it defines `toggle_hist`, which the onclick
# callbacks registered above invoke, and `container` maps each histogram
# id to the ids of its bar patches.
script = """<script type="text/ecmascript">
<![CDATA[
var container = %s

function toggle(oid, attribute, values) {
    /* Toggle the given attribute of the object `oid` between two values. */
    var obj = document.getElementById(oid);
    var a = obj.getAttribute(attribute);
    value = (a == values[0]) ? values[1] : values[0];
    obj.setAttribute(attribute, value);
}

function toggle_hist(obj) {
    /* Toggle the visibility of the patches belonging to the histogram
       whose legend marker or label was clicked, and dim the marker. */
    var num = obj.id.slice(-1);
    toggle('leg_patch_' + num, 'opacity', [1, 0.3]);
    toggle('leg_text_' + num, 'opacity', [1, 0.5]);
    var names = container['hist_' + num];
    for (var i = 0; i < names.length; i++) {
        toggle(names[i], 'opacity', [1, 0]);
    }
}
]]>
</script>
""" % json.dumps(hist_patches)

# Add a transition effect
css = tree[0][0]  # Element.getchildren() was removed in Python 3.9;
                  # index the element directly instead
css.text = css.text + "g {-webkit-transition:opacity 0.4s ease-out;" + \
    "-moz-transition:opacity 0.4s ease-out;}"

# Insert the script and save to file.
tree.insert(0, ET.XML(script))

ET.ElementTree(tree).write("svg_histogram.svg")
```

## Download this example

- [Download Python source code: svg_histogram_sgskip.py](https://matplotlib.org/_downloads/svg_histogram_sgskip.py)
- [Download Jupyter notebook: svg_histogram_sgskip.ipynb](https://matplotlib.org/_downloads/svg_histogram_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/user_interfaces/svg_tooltip_sgskip.md b/Python/matplotlab/gallery/user_interfaces/svg_tooltip_sgskip.md new file mode 100644 index 00000000..67617b93 --- /dev/null +++ b/Python/matplotlab/gallery/user_interfaces/svg_tooltip_sgskip.md @@ -0,0 +1,100 @@ +# SVG Tooltip

This example shows how to create a tooltip that is displayed when hovering over a matplotlib patch.

Although tooltips can be created with CSS or JavaScript, here we create them in matplotlib and simply toggle their visibility when the mouse hovers over the patch. This approach gives full control over the position and appearance of the tooltip, at the cost of more code up front.

An alternative approach is to put the tooltip content in the ``title`` attribute of the SVG objects. A browser-side js/css library can then build the tooltip: the content comes from the ``title`` attribute and the appearance is controlled by CSS.

Author: David Huard

```python
import matplotlib.pyplot as plt
import xml.etree.ElementTree as ET
from io import BytesIO

ET.register_namespace("", "http://www.w3.org/2000/svg")

fig, ax = plt.subplots()

# Create patches to which tooltips will be assigned.
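# Naming scheme used in this example (set via set_gid further down):
#   patch i   -> gid 'mypatch_{:03d}'   e.g. 'mypatch_000'
#   tooltip i -> gid 'mytooltip_{:03d}' e.g. 'mytooltip_000'
# The ECMAScript callbacks match a patch to its tooltip by this shared index.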
+rect1 = plt.Rectangle((10, -20), 10, 5, fc='blue') +rect2 = plt.Rectangle((-20, 15), 10, 5, fc='green') + +shapes = [rect1, rect2] +labels = ['This is a blue rectangle.', 'This is a green rectangle'] + +for i, (item, label) in enumerate(zip(shapes, labels)): + patch = ax.add_patch(item) + annotate = ax.annotate(labels[i], xy=item.get_xy(), xytext=(0, 0), + textcoords='offset points', color='w', ha='center', + fontsize=8, bbox=dict(boxstyle='round, pad=.5', + fc=(.1, .1, .1, .92), + ec=(1., 1., 1.), lw=1, + zorder=1)) + + ax.add_patch(patch) + patch.set_gid('mypatch_{:03d}'.format(i)) + annotate.set_gid('mytooltip_{:03d}'.format(i)) + +# Save the figure in a fake file object +ax.set_xlim(-30, 30) +ax.set_ylim(-30, 30) +ax.set_aspect('equal') + +f = BytesIO() +plt.savefig(f, format="svg") + +# --- Add interactivity --- + +# Create XML tree from the SVG file. +tree, xmlid = ET.XMLID(f.getvalue()) +tree.set('onload', 'init(evt)') + +for i in shapes: + # Get the index of the shape + index = shapes.index(i) + # Hide the tooltips + tooltip = xmlid['mytooltip_{:03d}'.format(index)] + tooltip.set('visibility', 'hidden') + # Assign onmouseover and onmouseout callbacks to patches. + mypatch = xmlid['mypatch_{:03d}'.format(index)] + mypatch.set('onmouseover', "ShowTooltip(this)") + mypatch.set('onmouseout', "HideTooltip(this)") + +# This is the script defining the ShowTooltip and HideTooltip functions. +script = """ + + """ + +# Insert the script at the top of the file and save it. 
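# A minimal ECMAScript payload for `script` (a sketch under the assumption
# that it must define init, ShowTooltip and HideTooltip, which the onload,
# onmouseover and onmouseout attributes set above expect; it is not the
# verbatim upstream script):
script = """<script type="text/ecmascript">
<![CDATA[
function init(event) {
    // Nothing to initialize in this sketch.
}

function ShowTooltip(obj) {
    // Derive the tooltip id from the patch id ('mypatch_000' -> '000').
    var cur = obj.id.split("_")[1];
    var tip = document.getElementById('mytooltip_' + cur);
    tip.setAttribute('visibility', "visible");
}

function HideTooltip(obj) {
    var cur = obj.id.split("_")[1];
    var tip = document.getElementById('mytooltip_' + cur);
    tip.setAttribute('visibility', "hidden");
}
]]>
</script>
"""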
+tree.insert(0, ET.XML(script)) +ET.ElementTree(tree).write('svg_tooltip.svg') +``` + +## 下载这个示例 + +- [下载python源码: svg_tooltip_sgskip.py](https://matplotlib.org/_downloads/svg_tooltip_sgskip.py) +- [下载Jupyter notebook: svg_tooltip_sgskip.ipynb](https://matplotlib.org/_downloads/svg_tooltip_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/user_interfaces/toolmanager_sgskip.md b/Python/matplotlab/gallery/user_interfaces/toolmanager_sgskip.md new file mode 100644 index 00000000..1bbbf8af --- /dev/null +++ b/Python/matplotlab/gallery/user_interfaces/toolmanager_sgskip.md @@ -0,0 +1,96 @@ +# 工具管理 + +此示例演示如何: + +- 修改工具栏。 +- 创建工具。 +- 添加工具。 +- 删除工具 + +可以使用 matplotlib.backend_managers.ToolManager + +```python +import matplotlib.pyplot as plt +plt.rcParams['toolbar'] = 'toolmanager' +from matplotlib.backend_tools import ToolBase, ToolToggleBase + + +class ListTools(ToolBase): + '''List all the tools controlled by the `ToolManager`''' + # keyboard shortcut + default_keymap = 'm' + description = 'List Tools' + + def trigger(self, *args, **kwargs): + print('_' * 80) + print("{0:12} {1:45} {2}".format( + 'Name (id)', 'Tool description', 'Keymap')) + print('-' * 80) + tools = self.toolmanager.tools + for name in sorted(tools): + if not tools[name].description: + continue + keys = ', '.join(sorted(self.toolmanager.get_tool_keymap(name))) + print("{0:12} {1:45} {2}".format( + name, tools[name].description, keys)) + print('_' * 80) + print("Active Toggle tools") + print("{0:12} {1:45}".format("Group", "Active")) + print('-' * 80) + for group, active in self.toolmanager.active_toggle.items(): + print("{0:12} {1:45}".format(str(group), str(active))) + + +class GroupHideTool(ToolToggleBase): + '''Show lines with a given gid''' + default_keymap = 'G' + description = 'Show by gid' + default_toggled = True + + def __init__(self, *args, gid, **kwargs): + self.gid = gid + super().__init__(*args, **kwargs) + + def enable(self, *args): + 
self.set_lines_visibility(True) + + def disable(self, *args): + self.set_lines_visibility(False) + + def set_lines_visibility(self, state): + gr_lines = [] + for ax in self.figure.get_axes(): + for line in ax.get_lines(): + if line.get_gid() == self.gid: + line.set_visible(state) + self.figure.canvas.draw() + + +fig = plt.figure() +plt.plot([1, 2, 3], gid='mygroup') +plt.plot([2, 3, 4], gid='unknown') +plt.plot([3, 2, 1], gid='mygroup') + +# Add the custom tools that we created +fig.canvas.manager.toolmanager.add_tool('List', ListTools) +fig.canvas.manager.toolmanager.add_tool('Show', GroupHideTool, gid='mygroup') + + +# Add an existing tool to new group `foo`. +# It can be added as many times as we want +fig.canvas.manager.toolbar.add_tool('zoom', 'foo') + +# Remove the forward button +fig.canvas.manager.toolmanager.remove_tool('forward') + +# To add a custom tool to the toolbar at specific location inside +# the navigation group +fig.canvas.manager.toolbar.add_tool('Show', 'navigation', 1) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: toolmanager_sgskip.py](https://matplotlib.org/_downloads/toolmanager_sgskip.py) +- [下载Jupyter notebook: toolmanager_sgskip.ipynb](https://matplotlib.org/_downloads/toolmanager_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/user_interfaces/wxcursor_demo_sgskip.md b/Python/matplotlab/gallery/user_interfaces/wxcursor_demo_sgskip.md new file mode 100644 index 00000000..f246d009 --- /dev/null +++ b/Python/matplotlab/gallery/user_interfaces/wxcursor_demo_sgskip.md @@ -0,0 +1,70 @@ +# WX光标演示 + +例如,绘制光标并报告WX中的数据坐标。 + +```python +from matplotlib.backends.backend_wxagg import FigureCanvasWxAgg as FigureCanvas +from matplotlib.backends.backend_wx import NavigationToolbar2Wx +from matplotlib.figure import Figure +import numpy as np + +import wx + + +class CanvasFrame(wx.Frame): + def __init__(self, ): + wx.Frame.__init__(self, None, -1, 'CanvasFrame', size=(550, 350)) + + self.figure = Figure() + self.axes 
= self.figure.add_subplot(111) + t = np.arange(0.0, 3.0, 0.01) + s = np.sin(2*np.pi*t) + + self.axes.plot(t, s) + self.axes.set_xlabel('t') + self.axes.set_ylabel('sin(t)') + self.figure_canvas = FigureCanvas(self, -1, self.figure) + + # Note that event is a MplEvent + self.figure_canvas.mpl_connect( + 'motion_notify_event', self.UpdateStatusBar) + self.figure_canvas.Bind(wx.EVT_ENTER_WINDOW, self.ChangeCursor) + + self.sizer = wx.BoxSizer(wx.VERTICAL) + self.sizer.Add(self.figure_canvas, 1, wx.LEFT | wx.TOP | wx.GROW) + self.SetSizer(self.sizer) + self.Fit() + + self.statusBar = wx.StatusBar(self, -1) + self.SetStatusBar(self.statusBar) + + self.toolbar = NavigationToolbar2Wx(self.figure_canvas) + self.sizer.Add(self.toolbar, 0, wx.LEFT | wx.EXPAND) + self.toolbar.Show() + + def ChangeCursor(self, event): + self.figure_canvas.SetCursor(wx.Cursor(wx.CURSOR_BULLSEYE)) + + def UpdateStatusBar(self, event): + if event.inaxes: + self.statusBar.SetStatusText( + "x={} y={}".format(event.xdata, event.ydata)) + + +class App(wx.App): + def OnInit(self): + 'Create the main window and insert the custom frame' + frame = CanvasFrame() + self.SetTopWindow(frame) + frame.Show(True) + return True + +if __name__ == '__main__': + app = App(0) + app.MainLoop() +``` + +## 下载这个示例 + +- [下载python源码: wxcursor_demo_sgskip.py](https://matplotlib.org/_downloads/wxcursor_demo_sgskip.py) +- [下载Jupyter notebook: wxcursor_demo_sgskip.ipynb](https://matplotlib.org/_downloads/wxcursor_demo_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/userdemo/anchored_box01.md b/Python/matplotlab/gallery/userdemo/anchored_box01.md new file mode 100644 index 00000000..b2b4f48d --- /dev/null +++ b/Python/matplotlab/gallery/userdemo/anchored_box01.md @@ -0,0 +1,23 @@ +# 锚定Box01 + +![锚定Box01示例](https://matplotlib.org/_images/sphx_glr_anchored_box01_001.png) + +```python +import matplotlib.pyplot as plt +from matplotlib.offsetbox import AnchoredText + + +fig, ax = 
plt.subplots(figsize=(3, 3)) + +at = AnchoredText("Figure 1a", + prop=dict(size=15), frameon=True, loc='upper left') +at.patch.set_boxstyle("round,pad=0.,rounding_size=0.2") +ax.add_artist(at) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: anchored_box01.py](https://matplotlib.org/_downloads/anchored_box01.py) +- [下载Jupyter notebook: anchored_box01.ipynb](https://matplotlib.org/_downloads/anchored_box01.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/userdemo/anchored_box02.md b/Python/matplotlab/gallery/userdemo/anchored_box02.md new file mode 100644 index 00000000..b66b0fa3 --- /dev/null +++ b/Python/matplotlab/gallery/userdemo/anchored_box02.md @@ -0,0 +1,28 @@ +# 锚定Box02 + +![锚定Box02示例](https://matplotlib.org/_images/sphx_glr_anchored_box02_001.png) + +```python +from matplotlib.patches import Circle +import matplotlib.pyplot as plt +from mpl_toolkits.axes_grid1.anchored_artists import AnchoredDrawingArea + + +fig, ax = plt.subplots(figsize=(3, 3)) + +ada = AnchoredDrawingArea(40, 20, 0, 0, + loc='upper right', pad=0., frameon=False) +p1 = Circle((10, 10), 10) +ada.drawing_area.add_artist(p1) +p2 = Circle((30, 10), 5, fc="r") +ada.drawing_area.add_artist(p2) + +ax.add_artist(ada) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: anchored_box02.py](https://matplotlib.org/_downloads/anchored_box02.py) +- [下载Jupyter notebook: anchored_box02.ipynb](https://matplotlib.org/_downloads/anchored_box02.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/userdemo/anchored_box03.md b/Python/matplotlab/gallery/userdemo/anchored_box03.md new file mode 100644 index 00000000..f3f6f394 --- /dev/null +++ b/Python/matplotlab/gallery/userdemo/anchored_box03.md @@ -0,0 +1,25 @@ +# 锚定Box03 + +![锚定Box03示例](https://matplotlib.org/_images/sphx_glr_anchored_box03_001.png) + +```python +from matplotlib.patches import Ellipse +import matplotlib.pyplot as plt +from mpl_toolkits.axes_grid1.anchored_artists import AnchoredAuxTransformBox + + 
+fig, ax = plt.subplots(figsize=(3, 3)) + +box = AnchoredAuxTransformBox(ax.transData, loc='upper left') +el = Ellipse((0, 0), width=0.1, height=0.4, angle=30) # in data coordinates! +box.drawing_area.add_artist(el) + +ax.add_artist(box) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: anchored_box03.py](https://matplotlib.org/_downloads/anchored_box03.py) +- [下载Jupyter notebook: anchored_box03.ipynb](https://matplotlib.org/_downloads/anchored_box03.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/userdemo/anchored_box04.md b/Python/matplotlab/gallery/userdemo/anchored_box04.md new file mode 100644 index 00000000..2e8fcbdb --- /dev/null +++ b/Python/matplotlab/gallery/userdemo/anchored_box04.md @@ -0,0 +1,45 @@ +# 锚定Box04 + +![锚定Box04示例](https://matplotlib.org/_images/sphx_glr_anchored_box04_001.png) + +```python +from matplotlib.patches import Ellipse +import matplotlib.pyplot as plt +from matplotlib.offsetbox import (AnchoredOffsetbox, DrawingArea, HPacker, + TextArea) + + +fig, ax = plt.subplots(figsize=(3, 3)) + +box1 = TextArea(" Test : ", textprops=dict(color="k")) + +box2 = DrawingArea(60, 20, 0, 0) +el1 = Ellipse((10, 10), width=16, height=5, angle=30, fc="r") +el2 = Ellipse((30, 10), width=16, height=5, angle=170, fc="g") +el3 = Ellipse((50, 10), width=16, height=5, angle=230, fc="b") +box2.add_artist(el1) +box2.add_artist(el2) +box2.add_artist(el3) + +box = HPacker(children=[box1, box2], + align="center", + pad=0, sep=5) + +anchored_box = AnchoredOffsetbox(loc='lower left', + child=box, pad=0., + frameon=True, + bbox_to_anchor=(0., 1.02), + bbox_transform=ax.transAxes, + borderpad=0., + ) + +ax.add_artist(anchored_box) + +fig.subplots_adjust(top=0.8) +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: anchored_box04.py](https://matplotlib.org/_downloads/anchored_box04.py) +- [下载Jupyter notebook: anchored_box04.ipynb](https://matplotlib.org/_downloads/anchored_box04.ipynb) \ No newline at end of file diff --git 
a/Python/matplotlab/gallery/userdemo/annotate_explain.md b/Python/matplotlab/gallery/userdemo/annotate_explain.md new file mode 100644 index 00000000..d1444b35 --- /dev/null +++ b/Python/matplotlab/gallery/userdemo/annotate_explain.md @@ -0,0 +1,87 @@ +# 注释说明 + +![注释说明示例](https://matplotlib.org/_images/sphx_glr_annotate_explain_001.png) + +```python +import matplotlib.pyplot as plt +import matplotlib.patches as mpatches + + +fig, axs = plt.subplots(2, 2) +x1, y1 = 0.3, 0.3 +x2, y2 = 0.7, 0.7 + +ax = axs.flat[0] +ax.plot([x1, x2], [y1, y2], ".") +el = mpatches.Ellipse((x1, y1), 0.3, 0.4, angle=30, alpha=0.2) +ax.add_artist(el) +ax.annotate("", + xy=(x1, y1), xycoords='data', + xytext=(x2, y2), textcoords='data', + arrowprops=dict(arrowstyle="-", + color="0.5", + patchB=None, + shrinkB=0, + connectionstyle="arc3,rad=0.3", + ), + ) +ax.text(.05, .95, "connect", transform=ax.transAxes, ha="left", va="top") + +ax = axs.flat[1] +ax.plot([x1, x2], [y1, y2], ".") +el = mpatches.Ellipse((x1, y1), 0.3, 0.4, angle=30, alpha=0.2) +ax.add_artist(el) +ax.annotate("", + xy=(x1, y1), xycoords='data', + xytext=(x2, y2), textcoords='data', + arrowprops=dict(arrowstyle="-", + color="0.5", + patchB=el, + shrinkB=0, + connectionstyle="arc3,rad=0.3", + ), + ) +ax.text(.05, .95, "clip", transform=ax.transAxes, ha="left", va="top") + +ax = axs.flat[2] +ax.plot([x1, x2], [y1, y2], ".") +el = mpatches.Ellipse((x1, y1), 0.3, 0.4, angle=30, alpha=0.2) +ax.add_artist(el) +ax.annotate("", + xy=(x1, y1), xycoords='data', + xytext=(x2, y2), textcoords='data', + arrowprops=dict(arrowstyle="-", + color="0.5", + patchB=el, + shrinkB=5, + connectionstyle="arc3,rad=0.3", + ), + ) +ax.text(.05, .95, "shrink", transform=ax.transAxes, ha="left", va="top") + +ax = axs.flat[3] +ax.plot([x1, x2], [y1, y2], ".") +el = mpatches.Ellipse((x1, y1), 0.3, 0.4, angle=30, alpha=0.2) +ax.add_artist(el) +ax.annotate("", + xy=(x1, y1), xycoords='data', + xytext=(x2, y2), textcoords='data', + 
arrowprops=dict(arrowstyle="fancy", + color="0.5", + patchB=el, + shrinkB=5, + connectionstyle="arc3,rad=0.3", + ), + ) +ax.text(.05, .95, "mutate", transform=ax.transAxes, ha="left", va="top") + +for ax in axs.flat: + ax.set(xlim=(0, 1), ylim=(0, 1), xticks=[], yticks=[], aspect=1) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: annotate_explain.py](https://matplotlib.org/_downloads/annotate_explain.py) +- [下载Jupyter notebook: annotate_explain.ipynb](https://matplotlib.org/_downloads/annotate_explain.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/userdemo/annotate_simple01.md b/Python/matplotlab/gallery/userdemo/annotate_simple01.md new file mode 100644 index 00000000..53fd121f --- /dev/null +++ b/Python/matplotlab/gallery/userdemo/annotate_simple01.md @@ -0,0 +1,24 @@ +# 注释Simple01 + +![注释Simple01示例](https://matplotlib.org/_images/sphx_glr_annotate_simple01_001.png) + +```python +import matplotlib.pyplot as plt + + +fig, ax = plt.subplots(figsize=(3, 3)) + +ax.annotate("", + xy=(0.2, 0.2), xycoords='data', + xytext=(0.8, 0.8), textcoords='data', + arrowprops=dict(arrowstyle="->", + connectionstyle="arc3"), + ) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: annotate_simple01.py](https://matplotlib.org/_downloads/annotate_simple01.py) +- [下载Jupyter notebook: annotate_simple01.ipynb](https://matplotlib.org/_downloads/annotate_simple01.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/userdemo/annotate_simple02.md b/Python/matplotlab/gallery/userdemo/annotate_simple02.md new file mode 100644 index 00000000..c9ec19ed --- /dev/null +++ b/Python/matplotlab/gallery/userdemo/annotate_simple02.md @@ -0,0 +1,25 @@ +# 注释Simple02 + +![注释Simple02示例](https://matplotlib.org/_images/sphx_glr_annotate_simple02_001.png) + +```python +import matplotlib.pyplot as plt + + +fig, ax = plt.subplots(figsize=(3, 3)) + +ax.annotate("Test", + xy=(0.2, 0.2), xycoords='data', + xytext=(0.8, 0.8), textcoords='data', + size=20, va="center", 
ha="center", + arrowprops=dict(arrowstyle="simple", + connectionstyle="arc3,rad=-0.2"), + ) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: annotate_simple02.py](https://matplotlib.org/_downloads/annotate_simple02.py) +- [下载Jupyter notebook: annotate_simple02.ipynb](https://matplotlib.org/_downloads/annotate_simple02.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/userdemo/annotate_simple03.md b/Python/matplotlab/gallery/userdemo/annotate_simple03.md new file mode 100644 index 00000000..71d6e093 --- /dev/null +++ b/Python/matplotlab/gallery/userdemo/annotate_simple03.md @@ -0,0 +1,27 @@ +# 注释Simple03 + +![注释Simple03示例](https://matplotlib.org/_images/sphx_glr_annotate_simple03_001.png) + +```python +import matplotlib.pyplot as plt + + +fig, ax = plt.subplots(figsize=(3, 3)) + +ann = ax.annotate("Test", + xy=(0.2, 0.2), xycoords='data', + xytext=(0.8, 0.8), textcoords='data', + size=20, va="center", ha="center", + bbox=dict(boxstyle="round4", fc="w"), + arrowprops=dict(arrowstyle="-|>", + connectionstyle="arc3,rad=-0.2", + fc="w"), + ) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: annotate_simple03.py](https://matplotlib.org/_downloads/annotate_simple03.py) +- [下载Jupyter notebook: annotate_simple03.ipynb](https://matplotlib.org/_downloads/annotate_simple03.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/userdemo/annotate_simple04.md b/Python/matplotlab/gallery/userdemo/annotate_simple04.md new file mode 100644 index 00000000..b7a63eff --- /dev/null +++ b/Python/matplotlab/gallery/userdemo/annotate_simple04.md @@ -0,0 +1,39 @@ +# 注释Simple04 + +![注释Simple04示例](https://matplotlib.org/_images/sphx_glr_annotate_simple04_001.png) + +```python +import matplotlib.pyplot as plt + + +fig, ax = plt.subplots(figsize=(3, 3)) + +ann = ax.annotate("Test", + xy=(0.2, 0.2), xycoords='data', + xytext=(0.8, 0.8), textcoords='data', + size=20, va="center", ha="center", + bbox=dict(boxstyle="round4", fc="w"), + 
arrowprops=dict(arrowstyle="-|>", + connectionstyle="arc3,rad=0.2", + relpos=(0., 0.), + fc="w"), + ) + +ann = ax.annotate("Test", + xy=(0.2, 0.2), xycoords='data', + xytext=(0.8, 0.8), textcoords='data', + size=20, va="center", ha="center", + bbox=dict(boxstyle="round4", fc="w"), + arrowprops=dict(arrowstyle="-|>", + connectionstyle="arc3,rad=-0.2", + relpos=(1., 0.), + fc="w"), + ) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: annotate_simple04.py](https://matplotlib.org/_downloads/annotate_simple04.py) +- [下载Jupyter notebook: annotate_simple04.ipynb](https://matplotlib.org/_downloads/annotate_simple04.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/userdemo/annotate_simple_coord01.md b/Python/matplotlab/gallery/userdemo/annotate_simple_coord01.md new file mode 100644 index 00000000..1b1c9e19 --- /dev/null +++ b/Python/matplotlab/gallery/userdemo/annotate_simple_coord01.md @@ -0,0 +1,24 @@ +# 简单Coord01注释示例 + +![简单Coord01注释示例](https://matplotlib.org/_images/sphx_glr_annotate_simple_coord01_001.png) + +```python +import matplotlib.pyplot as plt + +fig, ax = plt.subplots(figsize=(3, 2)) +an1 = ax.annotate("Test 1", xy=(0.5, 0.5), xycoords="data", + va="center", ha="center", + bbox=dict(boxstyle="round", fc="w")) +an2 = ax.annotate("Test 2", xy=(1, 0.5), xycoords=an1, + xytext=(30, 0), textcoords="offset points", + va="center", ha="left", + bbox=dict(boxstyle="round", fc="w"), + arrowprops=dict(arrowstyle="->")) +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: annotate_simple_coord01.py](https://matplotlib.org/_downloads/annotate_simple_coord01.py) +- [下载Jupyter notebook: annotate_simple_coord01.ipynb](https://matplotlib.org/_downloads/annotate_simple_coord01.ipynb) + diff --git a/Python/matplotlab/gallery/userdemo/annotate_simple_coord02.md b/Python/matplotlab/gallery/userdemo/annotate_simple_coord02.md new file mode 100644 index 00000000..cd45e92f --- /dev/null +++ b/Python/matplotlab/gallery/userdemo/annotate_simple_coord02.md @@ -0,0 
+1,27 @@ +# 简单Coord02注释示例 + +![简单Coord02注释示例](https://matplotlib.org/_images/sphx_glr_annotate_simple_coord02_001.png) + +```python +import matplotlib.pyplot as plt + + +fig, ax = plt.subplots(figsize=(3, 2)) +an1 = ax.annotate("Test 1", xy=(0.5, 0.5), xycoords="data", + va="center", ha="center", + bbox=dict(boxstyle="round", fc="w")) + +an2 = ax.annotate("Test 2", xy=(0.5, 1.), xycoords=an1, + xytext=(0.5, 1.1), textcoords=(an1, "axes fraction"), + va="bottom", ha="center", + bbox=dict(boxstyle="round", fc="w"), + arrowprops=dict(arrowstyle="->")) + +fig.subplots_adjust(top=0.83) +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: annotate_simple_coord02.py](https://matplotlib.org/_downloads/annotate_simple_coord02.py) +- [下载Jupyter notebook: annotate_simple_coord02.ipynb](https://matplotlib.org/_downloads/annotate_simple_coord02.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/userdemo/annotate_simple_coord03.md b/Python/matplotlab/gallery/userdemo/annotate_simple_coord03.md new file mode 100644 index 00000000..17d6ff48 --- /dev/null +++ b/Python/matplotlab/gallery/userdemo/annotate_simple_coord03.md @@ -0,0 +1,28 @@ +# 简单Coord03注释示例 + +![简单Coord03注释示例](https://matplotlib.org/_images/sphx_glr_annotate_simple_coord03_001.png) + +```python +import matplotlib.pyplot as plt +from matplotlib.text import OffsetFrom + + +fig, ax = plt.subplots(figsize=(3, 2)) +an1 = ax.annotate("Test 1", xy=(0.5, 0.5), xycoords="data", + va="center", ha="center", + bbox=dict(boxstyle="round", fc="w")) + +offset_from = OffsetFrom(an1, (0.5, 0)) +an2 = ax.annotate("Test 2", xy=(0.1, 0.1), xycoords="data", + xytext=(0, -10), textcoords=offset_from, + # xytext is offset points from "xy=(0.5, 0), xycoords=an1" + va="top", ha="center", + bbox=dict(boxstyle="round", fc="w"), + arrowprops=dict(arrowstyle="->")) +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: annotate_simple_coord03.py](https://matplotlib.org/_downloads/annotate_simple_coord03.py) +- [下载Jupyter notebook: 
annotate_simple_coord03.ipynb](https://matplotlib.org/_downloads/annotate_simple_coord03.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/userdemo/annotate_text_arrow.md b/Python/matplotlab/gallery/userdemo/annotate_text_arrow.md new file mode 100644 index 00000000..c38e7fc8 --- /dev/null +++ b/Python/matplotlab/gallery/userdemo/annotate_text_arrow.md @@ -0,0 +1,42 @@ +# 注释文本箭头 + +```python +import numpy as np +import matplotlib.pyplot as plt + +fig, ax = plt.subplots(figsize=(5, 5)) +ax.set_aspect(1) + +x1 = -1 + np.random.randn(100) +y1 = -1 + np.random.randn(100) +x2 = 1. + np.random.randn(100) +y2 = 1. + np.random.randn(100) + +ax.scatter(x1, y1, color="r") +ax.scatter(x2, y2, color="g") + +bbox_props = dict(boxstyle="round", fc="w", ec="0.5", alpha=0.9) +ax.text(-2, -2, "Sample A", ha="center", va="center", size=20, + bbox=bbox_props) +ax.text(2, 2, "Sample B", ha="center", va="center", size=20, + bbox=bbox_props) + + +bbox_props = dict(boxstyle="rarrow", fc=(0.8, 0.9, 0.9), ec="b", lw=2) +t = ax.text(0, 0, "Direction", ha="center", va="center", rotation=45, + size=15, + bbox=bbox_props) + +bb = t.get_bbox_patch() +bb.set_boxstyle("rarrow", pad=0.6) + +ax.set_xlim(-4, 4) +ax.set_ylim(-4, 4) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: annotate_text_arrow.py](https://matplotlib.org/_downloads/annotate_text_arrow.py) +- [下载Jupyter notebook: annotate_text_arrow.ipynb](https://matplotlib.org/_downloads/annotate_text_arrow.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/userdemo/colormap_normalizations.md b/Python/matplotlab/gallery/userdemo/colormap_normalizations.md new file mode 100644 index 00000000..cde4c0ee --- /dev/null +++ b/Python/matplotlab/gallery/userdemo/colormap_normalizations.md @@ -0,0 +1,149 @@ +# 色彩映射规范化 + +演示使用norm以非线性方式映射颜色映射。 + +```python +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.colors as colors +``` + +Lognorm: Instead of pcolor log10(Z1) you can have 
colorbars that have +the exponential labels using a norm. + +```python +N = 100 +X, Y = np.mgrid[-3:3:complex(0, N), -2:2:complex(0, N)] + +# A low hump with a spike coming out of the top. Needs to have +# z/colour axis on a log scale so we see both hump and spike. linear +# scale only shows the spike. + +Z1 = np.exp(-(X)**2 - (Y)**2) +Z2 = np.exp(-(X * 10)**2 - (Y * 10)**2) +Z = Z1 + 50 * Z2 + +fig, ax = plt.subplots(2, 1) + +pcm = ax[0].pcolor(X, Y, Z, + norm=colors.LogNorm(vmin=Z.min(), vmax=Z.max()), + cmap='PuBu_r') +fig.colorbar(pcm, ax=ax[0], extend='max') + +pcm = ax[1].pcolor(X, Y, Z, cmap='PuBu_r') +fig.colorbar(pcm, ax=ax[1], extend='max') +``` + +![色彩映射规范化示例](https://matplotlib.org/_images/sphx_glr_colormap_normalizations_001.png) + +PowerNorm:这是X中的幂律趋势,部分遮蔽了Y中的整流正弦波。我们可以使用PowerNorm来消除幂律。 + +```python +X, Y = np.mgrid[0:3:complex(0, N), 0:2:complex(0, N)] +Z1 = (1 + np.sin(Y * 10.)) * X**(2.) + +fig, ax = plt.subplots(2, 1) + +pcm = ax[0].pcolormesh(X, Y, Z1, norm=colors.PowerNorm(gamma=1. 
/ 2.),
                       cmap='PuBu_r')
fig.colorbar(pcm, ax=ax[0], extend='max')

pcm = ax[1].pcolormesh(X, Y, Z1, cmap='PuBu_r')
fig.colorbar(pcm, ax=ax[1], extend='max')
```

![Colormap normalization example 2](https://matplotlib.org/_images/sphx_glr_colormap_normalizations_002.png)

SymLogNorm: two humps, one negative and one positive, the positive one with 5 times the amplitude. Linearly, you cannot see detail in the negative hump. Here we logarithmically scale the positive and negative data separately.

Note that the colorbar labels do not come out looking very good.

```python
X, Y = np.mgrid[-3:3:complex(0, N), -2:2:complex(0, N)]
Z1 = 5 * np.exp(-X**2 - Y**2)
Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2)
Z = (Z1 - Z2) * 2

fig, ax = plt.subplots(2, 1)

pcm = ax[0].pcolormesh(X, Y, Z,
                       norm=colors.SymLogNorm(linthresh=0.03, linscale=0.03,
                                              vmin=-1.0, vmax=1.0),
                       cmap='RdBu_r')
fig.colorbar(pcm, ax=ax[0], extend='both')

pcm = ax[1].pcolormesh(X, Y, Z, cmap='RdBu_r', vmin=-np.max(Z))
fig.colorbar(pcm, ax=ax[1], extend='both')
```

![Colormap normalization example 3](https://matplotlib.org/_images/sphx_glr_colormap_normalizations_003.png)

Custom norm: an example of a customized normalization. This one uses the example above, and normalizes the negative data differently from the positive.

```python
X, Y = np.mgrid[-3:3:complex(0, N), -2:2:complex(0, N)]
Z1 = np.exp(-X**2 - Y**2)
Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2)
Z = (Z1 - Z2) * 2

# Example of making your own norm.  Also see matplotlib.colors.
# From Joe Kington: This one gives two different linear ramps:


class MidpointNormalize(colors.Normalize):
    def __init__(self, vmin=None, vmax=None, midpoint=None, clip=False):
        self.midpoint = midpoint
        colors.Normalize.__init__(self, vmin, vmax, clip)

    def __call__(self, value, clip=None):
        # I'm ignoring masked values and all kinds of edge cases to make a
        # simple example...
+        x, y = [self.vmin, self.midpoint, self.vmax], [0, 0.5, 1]
+        return np.ma.masked_array(np.interp(value, x, y))
+
+
+#####
+fig, ax = plt.subplots(2, 1)
+
+pcm = ax[0].pcolormesh(X, Y, Z,
+                       norm=MidpointNormalize(midpoint=0.),
+                       cmap='RdBu_r')
+fig.colorbar(pcm, ax=ax[0], extend='both')
+
+pcm = ax[1].pcolormesh(X, Y, Z, cmap='RdBu_r', vmin=-np.max(Z))
+fig.colorbar(pcm, ax=ax[1], extend='both')
+```
+
+![Colormap normalization example 4](https://matplotlib.org/_images/sphx_glr_colormap_normalizations_004.png)
+
+BoundaryNorm: for this one you provide the boundaries for your colors, and the norm puts the first color between the first pair of boundaries, the second color between the second pair, and so on.
+
+```python
+fig, ax = plt.subplots(3, 1, figsize=(8, 8))
+ax = ax.flatten()
+# even bounds gives a contour-like effect
+bounds = np.linspace(-1, 1, 10)
+norm = colors.BoundaryNorm(boundaries=bounds, ncolors=256)
+pcm = ax[0].pcolormesh(X, Y, Z,
+                       norm=norm,
+                       cmap='RdBu_r')
+fig.colorbar(pcm, ax=ax[0], extend='both', orientation='vertical')
+
+# uneven bounds changes the colormapping:
+bounds = np.array([-0.25, -0.125, 0, 0.5, 1])
+norm = colors.BoundaryNorm(boundaries=bounds, ncolors=256)
+pcm = ax[1].pcolormesh(X, Y, Z, norm=norm, cmap='RdBu_r')
+fig.colorbar(pcm, ax=ax[1], extend='both', orientation='vertical')
+
+pcm = ax[2].pcolormesh(X, Y, Z, cmap='RdBu_r', vmin=-np.max(Z))
+fig.colorbar(pcm, ax=ax[2], extend='both', orientation='vertical')
+
+plt.show()
+```
+
+![Colormap normalization example 5](https://matplotlib.org/_images/sphx_glr_colormap_normalizations_005.png)
+
+## Download this example
+
+- [Download Python source: colormap_normalizations.py](https://matplotlib.org/_downloads/colormap_normalizations.py)
+- [Download Jupyter notebook: colormap_normalizations.ipynb](https://matplotlib.org/_downloads/colormap_normalizations.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/userdemo/colormap_normalizations_bounds.md b/Python/matplotlab/gallery/userdemo/colormap_normalizations_bounds.md
new file mode 100644
index 00000000..3d91107a
--- /dev/null
+++ b/Python/matplotlab/gallery/userdemo/colormap_normalizations_bounds.md
@@ -0,0 +1,49 
@@
+# Colormap Normalizations Bounds
+
+Demonstrates using a norm to map a colormap onto data in a non-linear way.
+
+![Colormap normalizations bounds example](https://matplotlib.org/_images/sphx_glr_colormap_normalizations_bounds_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+import matplotlib.colors as colors
+
+N = 100
+X, Y = np.mgrid[-3:3:complex(0, N), -2:2:complex(0, N)]
+Z1 = np.exp(-X**2 - Y**2)
+Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2)
+Z = (Z1 - Z2) * 2
+
+'''
+BoundaryNorm: For this one you provide the boundaries for your colors,
+and the Norm puts the first color in between the first pair, the
+second color between the second pair, etc.
+'''
+
+fig, ax = plt.subplots(3, 1, figsize=(8, 8))
+ax = ax.flatten()
+# even bounds gives a contour-like effect
+bounds = np.linspace(-1, 1, 10)
+norm = colors.BoundaryNorm(boundaries=bounds, ncolors=256)
+pcm = ax[0].pcolormesh(X, Y, Z,
+                       norm=norm,
+                       cmap='RdBu_r')
+fig.colorbar(pcm, ax=ax[0], extend='both', orientation='vertical')
+
+# uneven bounds changes the colormapping:
+bounds = np.array([-0.25, -0.125, 0, 0.5, 1])
+norm = colors.BoundaryNorm(boundaries=bounds, ncolors=256)
+pcm = ax[1].pcolormesh(X, Y, Z, norm=norm, cmap='RdBu_r')
+fig.colorbar(pcm, ax=ax[1], extend='both', orientation='vertical')
+
+pcm = ax[2].pcolormesh(X, Y, Z, cmap='RdBu_r', vmin=-np.max(Z))
+fig.colorbar(pcm, ax=ax[2], extend='both', orientation='vertical')
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source: colormap_normalizations_bounds.py](https://matplotlib.org/_downloads/colormap_normalizations_bounds.py)
+- [Download Jupyter notebook: colormap_normalizations_bounds.ipynb](https://matplotlib.org/_downloads/colormap_normalizations_bounds.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/userdemo/colormap_normalizations_custom.md b/Python/matplotlab/gallery/userdemo/colormap_normalizations_custom.md
new file mode 100644
index 00000000..0176d355
--- /dev/null
+++ b/Python/matplotlab/gallery/userdemo/colormap_normalizations_custom.md
@@ -0,0 +1,55 @@
+# Colormap Normalizations Custom
+ 
+Demonstrates using a norm to map a colormap onto data in a non-linear way.
+
+![Colormap normalizations custom example](https://matplotlib.org/_images/sphx_glr_colormap_normalizations_custom_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+import matplotlib.colors as colors
+
+N = 100
+'''
+Custom Norm: An example with a customized normalization.  This one
+uses the example above, and normalizes the negative data differently
+from the positive.
+'''
+X, Y = np.mgrid[-3:3:complex(0, N), -2:2:complex(0, N)]
+Z1 = np.exp(-X**2 - Y**2)
+Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2)
+Z = (Z1 - Z2) * 2
+
+
+# Example of making your own norm.  Also see matplotlib.colors.
+# From Joe Kington: This one gives two different linear ramps:
+class MidpointNormalize(colors.Normalize):
+    def __init__(self, vmin=None, vmax=None, midpoint=None, clip=False):
+        self.midpoint = midpoint
+        colors.Normalize.__init__(self, vmin, vmax, clip)
+
+    def __call__(self, value, clip=None):
+        # I'm ignoring masked values and all kinds of edge cases to make a
+        # simple example... 
+        x, y = [self.vmin, self.midpoint, self.vmax], [0, 0.5, 1]
+        return np.ma.masked_array(np.interp(value, x, y))
+
+
+#####
+fig, ax = plt.subplots(2, 1)
+
+pcm = ax[0].pcolormesh(X, Y, Z,
+                       norm=MidpointNormalize(midpoint=0.),
+                       cmap='RdBu_r')
+fig.colorbar(pcm, ax=ax[0], extend='both')
+
+pcm = ax[1].pcolormesh(X, Y, Z, cmap='RdBu_r', vmin=-np.max(Z))
+fig.colorbar(pcm, ax=ax[1], extend='both')
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source: colormap_normalizations_custom.py](https://matplotlib.org/_downloads/colormap_normalizations_custom.py)
+- [Download Jupyter notebook: colormap_normalizations_custom.ipynb](https://matplotlib.org/_downloads/colormap_normalizations_custom.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/userdemo/colormap_normalizations_lognorm.md b/Python/matplotlab/gallery/userdemo/colormap_normalizations_lognorm.md
new file mode 100644
index 00000000..b779ff3c
--- /dev/null
+++ b/Python/matplotlab/gallery/userdemo/colormap_normalizations_lognorm.md
@@ -0,0 +1,40 @@
+# Colormap Normalizations LogNorm
+
+Demonstrates using a norm to map a colormap onto data in a non-linear way.
+
+![Colormap normalizations LogNorm example](https://matplotlib.org/_images/sphx_glr_colormap_normalizations_lognorm_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+import matplotlib.colors as colors
+
+'''
+Lognorm: Instead of pcolor log10(Z1) you can have colorbars that have
+the exponential labels using a norm.
+'''
+N = 100
+X, Y = np.mgrid[-3:3:complex(0, N), -2:2:complex(0, N)]
+
+# A low hump with a spike coming out of the top right.  Needs to have the
+# z/colour axis on a log scale so we see both hump and spike.  A linear
+# scale only shows the spike. 
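+
+# Aside (added for illustration; not in the original example): LogNorm maps a
+# value v to (log(v) - log(vmin)) / (log(vmax) - log(vmin)), so with vmin=1
+# and vmax=100 the value 10 lands halfway up the colorbar:
+lognorm_midpoint = float(colors.LogNorm(vmin=1, vmax=100)(10))  # 0.5, up to rounding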
+Z1 = np.exp(-X**2 - Y**2)
+Z2 = np.exp(-(X * 10)**2 - (Y * 10)**2)
+Z = Z1 + 50 * Z2
+
+fig, ax = plt.subplots(2, 1)
+
+pcm = ax[0].pcolor(X, Y, Z,
+                   norm=colors.LogNorm(vmin=Z.min(), vmax=Z.max()),
+                   cmap='PuBu_r')
+fig.colorbar(pcm, ax=ax[0], extend='max')
+
+pcm = ax[1].pcolor(X, Y, Z, cmap='PuBu_r')
+fig.colorbar(pcm, ax=ax[1], extend='max')
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source: colormap_normalizations_lognorm.py](https://matplotlib.org/_downloads/colormap_normalizations_lognorm.py)
+- [Download Jupyter notebook: colormap_normalizations_lognorm.ipynb](https://matplotlib.org/_downloads/colormap_normalizations_lognorm.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/userdemo/colormap_normalizations_power.md b/Python/matplotlab/gallery/userdemo/colormap_normalizations_power.md
new file mode 100644
index 00000000..afd29a80
--- /dev/null
+++ b/Python/matplotlab/gallery/userdemo/colormap_normalizations_power.md
@@ -0,0 +1,37 @@
+# Colormap Normalizations Power
+
+Demonstrates using a norm to map a colormap onto data in a non-linear way.
+
+![Colormap normalizations power example](https://matplotlib.org/_images/sphx_glr_colormap_normalizations_power_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+import matplotlib.colors as colors
+
+N = 100
+
+'''
+PowerNorm: Here a power-law trend in X partially obscures a rectified
+sine wave in Y.  We can remove the power law using a PowerNorm.
+'''
+X, Y = np.mgrid[0:3:complex(0, N), 0:2:complex(0, N)]
+Z1 = (1 + np.sin(Y * 10.)) * X**(2.) 
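+
+# Aside (added for illustration; not in the original example): PowerNorm
+# rescales t = (v - vmin) / (vmax - vmin) to t ** gamma, so gamma = 1/2 is a
+# square-root stretch that expands the low end of the data range:
+powernorm_example = float(colors.PowerNorm(gamma=0.5, vmin=0., vmax=4.)(1.0))  # 0.25 ** 0.5 == 0.5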
+
+fig, ax = plt.subplots(2, 1)
+
+pcm = ax[0].pcolormesh(X, Y, Z1, norm=colors.PowerNorm(gamma=1./2.),
+                       cmap='PuBu_r')
+fig.colorbar(pcm, ax=ax[0], extend='max')
+
+pcm = ax[1].pcolormesh(X, Y, Z1, cmap='PuBu_r')
+fig.colorbar(pcm, ax=ax[1], extend='max')
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source: colormap_normalizations_power.py](https://matplotlib.org/_downloads/colormap_normalizations_power.py)
+- [Download Jupyter notebook: colormap_normalizations_power.ipynb](https://matplotlib.org/_downloads/colormap_normalizations_power.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/userdemo/colormap_normalizations_symlognorm.md b/Python/matplotlab/gallery/userdemo/colormap_normalizations_symlognorm.md
new file mode 100644
index 00000000..6d3e70a4
--- /dev/null
+++ b/Python/matplotlab/gallery/userdemo/colormap_normalizations_symlognorm.md
@@ -0,0 +1,44 @@
+# Colormap Normalizations SymLogNorm
+
+Demonstrates using a norm to map a colormap onto data in a non-linear way.
+
+![Colormap normalizations SymLogNorm example](https://matplotlib.org/_images/sphx_glr_colormap_normalizations_symlognorm_001.png)
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+import matplotlib.colors as colors
+
+"""
+SymLogNorm: two humps, one negative and one positive, the positive
+with 5 times the amplitude.  Linearly, you cannot see detail in the
+negative hump.  Here we logarithmically scale the positive and
+negative data separately.
+
+Note that colorbar labels do not come out looking very good. 
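+
+(Added explanatory note, not in the original source: ``linthresh`` sets the
+range around zero, [-linthresh, linthresh], that is mapped linearly rather
+than logarithmically, avoiding the singularity of log at zero; ``linscale``
+controls how much of the colormap that linear band occupies, measured in
+decades of the logarithmic range.)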
+"""
+
+N = 100
+X, Y = np.mgrid[-3:3:complex(0, N), -2:2:complex(0, N)]
+Z1 = np.exp(-X**2 - Y**2)
+Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2)
+Z = (Z1 - Z2) * 2
+
+fig, ax = plt.subplots(2, 1)
+
+pcm = ax[0].pcolormesh(X, Y, Z,
+                       norm=colors.SymLogNorm(linthresh=0.03, linscale=0.03,
+                                              vmin=-1.0, vmax=1.0),
+                       cmap='RdBu_r')
+fig.colorbar(pcm, ax=ax[0], extend='both')
+
+pcm = ax[1].pcolormesh(X, Y, Z, cmap='RdBu_r', vmin=-np.max(Z))
+fig.colorbar(pcm, ax=ax[1], extend='both')
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source: colormap_normalizations_symlognorm.py](https://matplotlib.org/_downloads/colormap_normalizations_symlognorm.py)
+- [Download Jupyter notebook: colormap_normalizations_symlognorm.ipynb](https://matplotlib.org/_downloads/colormap_normalizations_symlognorm.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/userdemo/connect_simple01.md b/Python/matplotlab/gallery/userdemo/connect_simple01.md
new file mode 100644
index 00000000..bf4ad81e
--- /dev/null
+++ b/Python/matplotlab/gallery/userdemo/connect_simple01.md
@@ -0,0 +1,40 @@
+# Connect Simple01
+
+![Connect Simple01 example](https://matplotlib.org/_images/sphx_glr_connect_simple01_001.png)
+
+```python
+from matplotlib.patches import ConnectionPatch
+import matplotlib.pyplot as plt
+
+fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(6, 3))
+
+xyA = (0.2, 0.2)
+xyB = (0.8, 0.8)
+coordsA = "data"
+coordsB = "data"
+con = ConnectionPatch(xyA, xyB, coordsA, coordsB,
+                      arrowstyle="-|>", shrinkA=5, shrinkB=5,
+                      mutation_scale=20, fc="w")
+ax1.plot([xyA[0], xyB[0]], [xyA[1], xyB[1]], "o")
+ax1.add_artist(con)
+
+xy = (0.3, 0.2)
+coordsA = "data"
+coordsB = "data"
+con = ConnectionPatch(xyA=xy, xyB=xy, coordsA=coordsA, coordsB=coordsB,
+                      axesA=ax2, axesB=ax1,
+                      arrowstyle="->", shrinkB=5)
+ax2.add_artist(con)
+
+ax1.set_xlim(0, 1)
+ax1.set_ylim(0, 1)
+ax2.set_xlim(0, .5)
+ax2.set_ylim(0, .5)
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source: connect_simple01.py](https://matplotlib.org/_downloads/connect_simple01.py) 
+- [Download Jupyter notebook: connect_simple01.ipynb](https://matplotlib.org/_downloads/connect_simple01.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/userdemo/connectionstyle_demo.md b/Python/matplotlab/gallery/userdemo/connectionstyle_demo.md
new file mode 100644
index 00000000..549ac5b7
--- /dev/null
+++ b/Python/matplotlab/gallery/userdemo/connectionstyle_demo.md
@@ -0,0 +1,60 @@
+# Connectionstyle Demo
+
+![Connectionstyle demo](https://matplotlib.org/_images/sphx_glr_connectionstyle_demo_001.png)
+
+```python
+import matplotlib.pyplot as plt
+
+
+fig, axs = plt.subplots(3, 5, figsize=(8, 4.8))
+x1, y1 = 0.3, 0.3
+x2, y2 = 0.7, 0.7
+
+
+def demo_con_style(ax, connectionstyle, label=None):
+    x1, y1 = 0.3, 0.2
+    x2, y2 = 0.8, 0.6
+
+    ax.plot([x1, x2], [y1, y2], ".")
+    ax.annotate("",
+                xy=(x1, y1), xycoords='data',
+                xytext=(x2, y2), textcoords='data',
+                arrowprops=dict(arrowstyle="->",
+                                color="0.5",
+                                shrinkA=5, shrinkB=5,
+                                patchA=None,
+                                patchB=None,
+                                connectionstyle=connectionstyle,
+                                ),
+                )
+
+    ax.text(.05, .95, connectionstyle.replace(",", ",\n"),
+            transform=ax.transAxes, ha="left", va="top")
+
+
+demo_con_style(axs[0, 0], "angle3,angleA=90,angleB=0")
+demo_con_style(axs[1, 0], "angle3,angleA=0,angleB=90")
+demo_con_style(axs[0, 1], "arc3,rad=0.")
+demo_con_style(axs[1, 1], "arc3,rad=0.3")
+demo_con_style(axs[2, 1], "arc3,rad=-0.3")
+demo_con_style(axs[0, 2], "angle,angleA=-90,angleB=180,rad=0")
+demo_con_style(axs[1, 2], "angle,angleA=-90,angleB=180,rad=5")
+demo_con_style(axs[2, 2], "angle,angleA=-90,angleB=10,rad=5")
+demo_con_style(axs[0, 3], "arc,angleA=-90,angleB=0,armA=30,armB=30,rad=0")
+demo_con_style(axs[1, 3], "arc,angleA=-90,angleB=0,armA=30,armB=30,rad=5")
+demo_con_style(axs[2, 3], "arc,angleA=-90,angleB=0,armA=0,armB=40,rad=0")
+demo_con_style(axs[0, 4], "bar,fraction=0.3")
+demo_con_style(axs[1, 4], "bar,fraction=-0.3")
+demo_con_style(axs[2, 4], "bar,angle=180,fraction=-0.2")
+
+for ax in axs.flat:
+    ax.set(xlim=(0, 1), ylim=(0, 1), 
xticks=[], yticks=[], aspect=1)
+fig.tight_layout(pad=0)
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source: connectionstyle_demo.py](https://matplotlib.org/_downloads/connectionstyle_demo.py)
+- [Download Jupyter notebook: connectionstyle_demo.ipynb](https://matplotlib.org/_downloads/connectionstyle_demo.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/userdemo/custom_boxstyle01.md b/Python/matplotlab/gallery/userdemo/custom_boxstyle01.md
new file mode 100644
index 00000000..4471851c
--- /dev/null
+++ b/Python/matplotlab/gallery/userdemo/custom_boxstyle01.md
@@ -0,0 +1,60 @@
+# Custom Boxstyle01
+
+![Custom Boxstyle01 example](https://matplotlib.org/_images/sphx_glr_custom_boxstyle01_001.png)
+
+```python
+from matplotlib.path import Path
+
+
+def custom_box_style(x0, y0, width, height, mutation_size, mutation_aspect=1):
+    """
+    Given the location and size of the box, return the path of
+    the box around it.
+
+    - *x0*, *y0*, *width*, *height* : location and size of the box
+    - *mutation_size* : a reference scale for the mutation.
+    - *mutation_aspect* : aspect ratio for the mutation.
+    """
+
+    # note that we are ignoring mutation_aspect.  This is okay in general.
+
+    # padding
+    mypad = 0.3
+    pad = mutation_size * mypad
+
+    # width and height with padding added. 
+    width = width + 2 * pad
+    height = height + 2 * pad
+
+    # boundary of the padded box
+    x0, y0 = x0 - pad, y0 - pad
+    x1, y1 = x0 + width, y0 + height
+
+    cp = [(x0, y0),
+          (x1, y0), (x1, y1), (x0, y1),
+          (x0-pad, (y0+y1)/2.), (x0, y0),
+          (x0, y0)]
+
+    com = [Path.MOVETO,
+           Path.LINETO, Path.LINETO, Path.LINETO,
+           Path.LINETO, Path.LINETO,
+           Path.CLOSEPOLY]
+
+    path = Path(cp, com)
+
+    return path
+
+
+import matplotlib.pyplot as plt
+
+fig, ax = plt.subplots(figsize=(3, 3))
+ax.text(0.5, 0.5, "Test", size=30, va="center", ha="center",
+        bbox=dict(boxstyle=custom_box_style, alpha=0.2))
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source: custom_boxstyle01.py](https://matplotlib.org/_downloads/custom_boxstyle01.py)
+- [Download Jupyter notebook: custom_boxstyle01.ipynb](https://matplotlib.org/_downloads/custom_boxstyle01.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/userdemo/custom_boxstyle02.md b/Python/matplotlab/gallery/userdemo/custom_boxstyle02.md
new file mode 100644
index 00000000..fb6d5fe7
--- /dev/null
+++ b/Python/matplotlab/gallery/userdemo/custom_boxstyle02.md
@@ -0,0 +1,85 @@
+# Custom Boxstyle02
+
+![Custom Boxstyle02 example](https://matplotlib.org/_images/sphx_glr_custom_boxstyle02_001.png)
+
+```python
+from matplotlib.path import Path
+from matplotlib.patches import BoxStyle
+import matplotlib.pyplot as plt
+
+
+# we may derive from the matplotlib.patches.BoxStyle._Base class.
+# You need to override the transmute method in this case.
+class MyStyle(BoxStyle._Base):
+    """
+    A simple box.
+    """
+
+    def __init__(self, pad=0.3):
+        """
+        The arguments must be floats and must have default values.
+
+        *pad*
+            amount of padding
+        """
+
+        self.pad = pad
+        super().__init__()
+
+    def transmute(self, x0, y0, width, height, mutation_size):
+        """
+        Given the location and size of the box, return the path of
+        the box around it. 
+
+        - *x0*, *y0*, *width*, *height* : location and size of the box
+        - *mutation_size* : a reference scale for the mutation.
+
+        Often, the *mutation_size* is the font size of the text.
+        You don't need to worry about the rotation as it is
+        automatically taken care of.
+        """
+
+        # padding
+        pad = mutation_size * self.pad
+
+        # width and height with padding added.
+        width = width + 2.*pad
+        height = height + 2.*pad
+
+        # boundary of the padded box
+        x0, y0 = x0 - pad, y0 - pad
+        x1, y1 = x0 + width, y0 + height
+
+        cp = [(x0, y0),
+              (x1, y0), (x1, y1), (x0, y1),
+              (x0-pad, (y0+y1)/2.), (x0, y0),
+              (x0, y0)]
+
+        com = [Path.MOVETO,
+               Path.LINETO, Path.LINETO, Path.LINETO,
+               Path.LINETO, Path.LINETO,
+               Path.CLOSEPOLY]
+
+        path = Path(cp, com)
+
+        return path
+
+
+# register the custom style
+BoxStyle._style_list["angled"] = MyStyle
+
+fig, ax = plt.subplots(figsize=(3, 3))
+ax.text(0.5, 0.5, "Test", size=30, va="center", ha="center", rotation=30,
+        bbox=dict(boxstyle="angled,pad=0.5", alpha=0.2))
+
+del BoxStyle._style_list["angled"]
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source: custom_boxstyle02.py](https://matplotlib.org/_downloads/custom_boxstyle02.py)
+- [Download Jupyter notebook: custom_boxstyle02.ipynb](https://matplotlib.org/_downloads/custom_boxstyle02.ipynb)
+
diff --git a/Python/matplotlab/gallery/userdemo/demo_gridspec01.md b/Python/matplotlab/gallery/userdemo/demo_gridspec01.md
new file mode 100644
index 00000000..83b98445
--- /dev/null
+++ b/Python/matplotlab/gallery/userdemo/demo_gridspec01.md
@@ -0,0 +1,31 @@
+# Gridspec Demo01
+
+![Gridspec Demo01](https://matplotlib.org/_images/sphx_glr_demo_gridspec01_000.png)
+
+```python
+import matplotlib.pyplot as plt
+
+
+def make_ticklabels_invisible(fig):
+    for i, ax in enumerate(fig.axes):
+        ax.text(0.5, 0.5, "ax%d" % (i+1), va="center", ha="center")
+        ax.tick_params(labelbottom=False, labelleft=False)
+
+
+fig = plt.figure(0)
+ax1 = plt.subplot2grid((3, 3), (0, 0), colspan=3)
+ax2 = plt.subplot2grid((3, 3), (1, 0), 
colspan=2)
+ax3 = plt.subplot2grid((3, 3), (1, 2), rowspan=2)
+ax4 = plt.subplot2grid((3, 3), (2, 0))
+ax5 = plt.subplot2grid((3, 3), (2, 1))
+
+fig.suptitle("subplot2grid")
+make_ticklabels_invisible(fig)
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source: demo_gridspec01.py](https://matplotlib.org/_downloads/demo_gridspec01.py)
+- [Download Jupyter notebook: demo_gridspec01.ipynb](https://matplotlib.org/_downloads/demo_gridspec01.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/userdemo/demo_gridspec03.md b/Python/matplotlab/gallery/userdemo/demo_gridspec03.md
new file mode 100644
index 00000000..71c20f4d
--- /dev/null
+++ b/Python/matplotlab/gallery/userdemo/demo_gridspec03.md
@@ -0,0 +1,42 @@
+# Gridspec Demo03
+
+![Gridspec Demo03](https://matplotlib.org/_images/sphx_glr_demo_gridspec03_001.png)
+
+```python
+import matplotlib.pyplot as plt
+from matplotlib.gridspec import GridSpec
+
+
+def make_ticklabels_invisible(fig):
+    for i, ax in enumerate(fig.axes):
+        ax.text(0.5, 0.5, "ax%d" % (i+1), va="center", ha="center")
+        ax.tick_params(labelbottom=False, labelleft=False)
+
+
+# demo 3: gridspec with subplotpars set. 
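+# (Added note, not in the original: GridSpec.update() sets the subplot
+# layout parameters for every axes created from that GridSpec;
+# left/right/bottom/top are figure-fraction coordinates, and wspace/hspace
+# set the padding between subplots. That is how the two 3x3 grids below
+# are placed side by side in one figure.)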
+
+fig = plt.figure()
+
+fig.suptitle("GridSpec w/ different subplotpars")
+
+gs1 = GridSpec(3, 3)
+gs1.update(left=0.05, right=0.48, wspace=0.05)
+ax1 = plt.subplot(gs1[:-1, :])
+ax2 = plt.subplot(gs1[-1, :-1])
+ax3 = plt.subplot(gs1[-1, -1])
+
+gs2 = GridSpec(3, 3)
+gs2.update(left=0.55, right=0.98, hspace=0.05)
+ax4 = plt.subplot(gs2[:, :-1])
+ax5 = plt.subplot(gs2[:-1, -1])
+ax6 = plt.subplot(gs2[-1, -1])
+
+make_ticklabels_invisible(fig)
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source: demo_gridspec03.py](https://matplotlib.org/_downloads/demo_gridspec03.py)
+- [Download Jupyter notebook: demo_gridspec03.ipynb](https://matplotlib.org/_downloads/demo_gridspec03.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/userdemo/demo_gridspec05.md b/Python/matplotlab/gallery/userdemo/demo_gridspec05.md
new file mode 100644
index 00000000..5aef712a
--- /dev/null
+++ b/Python/matplotlab/gallery/userdemo/demo_gridspec05.md
@@ -0,0 +1,33 @@
+# Gridspec Demo05
+
+![Gridspec Demo05](https://matplotlib.org/_images/sphx_glr_demo_gridspec05_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import matplotlib.gridspec as gridspec
+
+
+def make_ticklabels_invisible(fig):
+    for i, ax in enumerate(fig.axes):
+        ax.text(0.5, 0.5, "ax%d" % (i+1), va="center", ha="center")
+        ax.tick_params(labelbottom=False, labelleft=False)
+
+
+f = plt.figure()
+
+gs = gridspec.GridSpec(2, 2,
+                       width_ratios=[1, 2], height_ratios=[4, 1])
+
+ax1 = plt.subplot(gs[0])
+ax2 = plt.subplot(gs[1])
+ax3 = plt.subplot(gs[2])
+ax4 = plt.subplot(gs[3])
+
+make_ticklabels_invisible(f)
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source: demo_gridspec05.py](https://matplotlib.org/_downloads/demo_gridspec05.py)
+- [Download Jupyter notebook: demo_gridspec05.ipynb](https://matplotlib.org/_downloads/demo_gridspec05.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/userdemo/demo_gridspec06.md b/Python/matplotlab/gallery/userdemo/demo_gridspec06.md
new file mode 100644
index 
00000000..6f9bc801
--- /dev/null
+++ b/Python/matplotlab/gallery/userdemo/demo_gridspec06.md
@@ -0,0 +1,57 @@
+# Gridspec Demo06
+
+![Gridspec Demo06](https://matplotlib.org/_images/sphx_glr_demo_gridspec06_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import matplotlib.gridspec as gridspec
+import numpy as np
+from itertools import product
+
+
+def squiggle_xy(a, b, c, d):
+    i = np.arange(0.0, 2*np.pi, 0.05)
+    return np.sin(i*a)*np.cos(i*b), np.sin(i*c)*np.cos(i*d)
+
+
+fig = plt.figure(figsize=(8, 8))
+
+# gridspec inside gridspec
+outer_grid = gridspec.GridSpec(4, 4, wspace=0.0, hspace=0.0)
+
+for i in range(16):
+    inner_grid = gridspec.GridSpecFromSubplotSpec(3, 3,
+        subplot_spec=outer_grid[i], wspace=0.0, hspace=0.0)
+    a = i // 4 + 1
+    b = i % 4 + 1
+    for j, (c, d) in enumerate(product(range(1, 4), repeat=2)):
+        ax = plt.Subplot(fig, inner_grid[j])
+        ax.plot(*squiggle_xy(a, b, c, d))
+        ax.set_xticks([])
+        ax.set_yticks([])
+        fig.add_subplot(ax)
+
+all_axes = fig.get_axes()
+
+# show only the outside spines
+for ax in all_axes:
+    for sp in ax.spines.values():
+        sp.set_visible(False)
+    if ax.is_first_row():
+        ax.spines['top'].set_visible(True)
+    if ax.is_last_row():
+        ax.spines['bottom'].set_visible(True)
+    if ax.is_first_col():
+        ax.spines['left'].set_visible(True)
+    if ax.is_last_col():
+        ax.spines['right'].set_visible(True)
+
+plt.show()
+```
+
+Total running time of the script: (0 minutes 2.041 seconds)
+
+## Download this example
+
+- [Download Python source: demo_gridspec06.py](https://matplotlib.org/_downloads/demo_gridspec06.py)
+- [Download Jupyter notebook: demo_gridspec06.ipynb](https://matplotlib.org/_downloads/demo_gridspec06.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/userdemo/pgf_fonts.md b/Python/matplotlab/gallery/userdemo/pgf_fonts.md
new file mode 100644
index 00000000..feeee1f8
--- /dev/null
+++ b/Python/matplotlab/gallery/userdemo/pgf_fonts.md
@@ -0,0 +1,29 @@
+# PGF Fonts
+
+![PGF fonts](https://matplotlib.org/_images/sphx_glr_pgf_fonts_001.png)
+ 
+```python
+import matplotlib.pyplot as plt
+plt.rcParams.update({
+    "font.family": "serif",
+    "font.serif": [],                    # use latex default serif font
+    "font.sans-serif": ["DejaVu Sans"],  # use a specific sans-serif font
+})
+
+plt.figure(figsize=(4.5, 2.5))
+plt.plot(range(5))
+plt.text(0.5, 3., "serif")
+plt.text(0.5, 2., "monospace", family="monospace")
+plt.text(2.5, 2., "sans-serif", family="sans-serif")
+plt.text(2.5, 1., "comic sans", family="Comic Sans MS")
+plt.xlabel("µ is not $\\mu$")
+plt.tight_layout(pad=.5)
+
+plt.savefig("pgf_fonts.pdf")
+plt.savefig("pgf_fonts.png")
+```
+
+## Download this example
+
+- [Download Python source: pgf_fonts.py](https://matplotlib.org/_downloads/pgf_fonts.py)
+- [Download Jupyter notebook: pgf_fonts.ipynb](https://matplotlib.org/_downloads/pgf_fonts.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/userdemo/pgf_preamble_sgskip.md b/Python/matplotlab/gallery/userdemo/pgf_preamble_sgskip.md
new file mode 100644
index 00000000..3634f2f1
--- /dev/null
+++ b/Python/matplotlab/gallery/userdemo/pgf_preamble_sgskip.md
@@ -0,0 +1,34 @@
+# PGF Preamble
+
+```python
+import matplotlib as mpl
+mpl.use("pgf")
+import matplotlib.pyplot as plt
+plt.rcParams.update({
+    "font.family": "serif",  # use serif/main font for text elements
+    "text.usetex": True,     # use inline math for ticks
+    "pgf.rcfonts": False,    # don't setup fonts from rc parameters
+    "pgf.preamble": [
+        "\\usepackage{units}",          # load additional packages
+        "\\usepackage{metalogo}",
+        "\\usepackage{unicode-math}",   # unicode math setup
+        r"\setmathfont{xits-math.otf}",
+        r"\setmainfont{DejaVu Serif}",  # serif font via preamble
+    ]
+})
+
+plt.figure(figsize=(4.5, 2.5))
+plt.plot(range(5))
+plt.xlabel("unicode text: я, ψ, €, ü, \\unitfrac[10]{°}{µm}")
+plt.ylabel("\\XeLaTeX")
+plt.legend(["unicode math: $λ=∑_i^∞ μ_i^2$"])
+plt.tight_layout(pad=.5)
+
+plt.savefig("pgf_preamble.pdf")
+plt.savefig("pgf_preamble.png")
+```
+
+## Download this example
+
+- [Download Python source: 
pgf_preamble_sgskip.py](https://matplotlib.org/_downloads/pgf_preamble_sgskip.py)
+- [Download Jupyter notebook: pgf_preamble_sgskip.ipynb](https://matplotlib.org/_downloads/pgf_preamble_sgskip.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/userdemo/pgf_texsystem.md b/Python/matplotlab/gallery/userdemo/pgf_texsystem.md
new file mode 100644
index 00000000..9922e333
--- /dev/null
+++ b/Python/matplotlab/gallery/userdemo/pgf_texsystem.md
@@ -0,0 +1,31 @@
+# PGF texsystem
+
+![PGF texsystem example](https://matplotlib.org/_images/sphx_glr_pgf_texsystem_001.png)
+
+```python
+import matplotlib.pyplot as plt
+plt.rcParams.update({
+    "pgf.texsystem": "pdflatex",
+    "pgf.preamble": [
+        r"\usepackage[utf8x]{inputenc}",
+        r"\usepackage[T1]{fontenc}",
+        r"\usepackage{cmbright}",
+    ]
+})
+
+plt.figure(figsize=(4.5, 2.5))
+plt.plot(range(5))
+plt.text(0.5, 3., "serif", family="serif")
+plt.text(0.5, 2., "monospace", family="monospace")
+plt.text(2.5, 2., "sans-serif", family="sans-serif")
+plt.xlabel(r"µ is not $\mu$")
+plt.tight_layout(pad=.5)
+
+plt.savefig("pgf_texsystem.pdf")
+plt.savefig("pgf_texsystem.png")
+```
+
+## Download this example
+
+- [Download Python source: pgf_texsystem.py](https://matplotlib.org/_downloads/pgf_texsystem.py)
+- [Download Jupyter notebook: pgf_texsystem.ipynb](https://matplotlib.org/_downloads/pgf_texsystem.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/userdemo/simple_annotate01.md b/Python/matplotlab/gallery/userdemo/simple_annotate01.md
new file mode 100644
index 00000000..6e1a881d
--- /dev/null
+++ b/Python/matplotlab/gallery/userdemo/simple_annotate01.md
@@ -0,0 +1,93 @@
+# Simple Annotate01
+
+![Simple Annotate01 example](https://matplotlib.org/_images/sphx_glr_simple_annotate01_001.png)
+
+```python
+import matplotlib.pyplot as plt
+import matplotlib.patches as mpatches
+
+
+fig, axs = plt.subplots(2, 4)
+x1, y1 = 0.3, 0.3
+x2, y2 = 0.7, 0.7
+
+ax = axs.flat[0]
+ax.plot([x1, x2], [y1, y2], "o")
+ax.annotate("",
+            xy=(x1, y1), xycoords='data',
+            xytext=(x2, 
y2), textcoords='data', + arrowprops=dict(arrowstyle="->")) +ax.text(.05, .95, "A $->$ B", transform=ax.transAxes, ha="left", va="top") + +ax = axs.flat[2] +ax.plot([x1, x2], [y1, y2], "o") +ax.annotate("", + xy=(x1, y1), xycoords='data', + xytext=(x2, y2), textcoords='data', + arrowprops=dict(arrowstyle="->", connectionstyle="arc3,rad=0.3", + shrinkB=5) + ) +ax.text(.05, .95, "shrinkB=5", transform=ax.transAxes, ha="left", va="top") + +ax = axs.flat[3] +ax.plot([x1, x2], [y1, y2], "o") +ax.annotate("", + xy=(x1, y1), xycoords='data', + xytext=(x2, y2), textcoords='data', + arrowprops=dict(arrowstyle="->", connectionstyle="arc3,rad=0.3")) +ax.text(.05, .95, "connectionstyle=arc3", transform=ax.transAxes, ha="left", va="top") + +ax = axs.flat[4] +ax.plot([x1, x2], [y1, y2], "o") +el = mpatches.Ellipse((x1, y1), 0.3, 0.4, angle=30, alpha=0.5) +ax.add_artist(el) +ax.annotate("", + xy=(x1, y1), xycoords='data', + xytext=(x2, y2), textcoords='data', + arrowprops=dict(arrowstyle="->", connectionstyle="arc3,rad=0.2") + ) + +ax = axs.flat[5] +ax.plot([x1, x2], [y1, y2], "o") +el = mpatches.Ellipse((x1, y1), 0.3, 0.4, angle=30, alpha=0.5) +ax.add_artist(el) +ax.annotate("", + xy=(x1, y1), xycoords='data', + xytext=(x2, y2), textcoords='data', + arrowprops=dict(arrowstyle="->", connectionstyle="arc3,rad=0.2", + patchB=el) + ) +ax.text(.05, .95, "patchB", transform=ax.transAxes, ha="left", va="top") + +ax = axs.flat[6] +ax.plot([x1], [y1], "o") +ax.annotate("Test", + xy=(x1, y1), xycoords='data', + xytext=(x2, y2), textcoords='data', + ha="center", va="center", + bbox=dict(boxstyle="round", fc="w"), + arrowprops=dict(arrowstyle="->") + ) +ax.text(.05, .95, "annotate", transform=ax.transAxes, ha="left", va="top") + +ax = axs.flat[7] +ax.plot([x1], [y1], "o") +ax.annotate("Test", + xy=(x1, y1), xycoords='data', + xytext=(x2, y2), textcoords='data', + ha="center", va="center", + bbox=dict(boxstyle="round", fc="w", ), + arrowprops=dict(arrowstyle="->", relpos=(0., 0.)) + ) 
+ax.text(.05, .95, "relpos=(0,0)", transform=ax.transAxes, ha="left", va="top")
+
+for ax in axs.flat:
+    ax.set(xlim=(0, 1), ylim=(0, 1), xticks=[], yticks=[], aspect=1)
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source: simple_annotate01.py](https://matplotlib.org/_downloads/simple_annotate01.py)
+- [Download Jupyter notebook: simple_annotate01.ipynb](https://matplotlib.org/_downloads/simple_annotate01.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/userdemo/simple_legend01.md b/Python/matplotlab/gallery/userdemo/simple_legend01.md
new file mode 100644
index 00000000..593c0c28
--- /dev/null
+++ b/Python/matplotlab/gallery/userdemo/simple_legend01.md
@@ -0,0 +1,29 @@
+# Simple Legend01
+
+![Simple Legend01 example](https://matplotlib.org/_images/sphx_glr_simple_legend01_001.png)
+
+```python
+import matplotlib.pyplot as plt
+
+
+plt.subplot(211)
+plt.plot([1, 2, 3], label="test1")
+plt.plot([3, 2, 1], label="test2")
+# Place a legend above this subplot, expanding itself to
+# fully use the given bounding box.
+plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc='lower left',
+           ncol=2, mode="expand", borderaxespad=0.)
+
+plt.subplot(223)
+plt.plot([1, 2, 3], label="test1")
+plt.plot([3, 2, 1], label="test2")
+# Place a legend to the right of this smaller subplot.
+plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.) 
+
+plt.show()
+```
+
+## Download this example
+
+- [Download Python source: simple_legend01.py](https://matplotlib.org/_downloads/simple_legend01.py)
+- [Download Jupyter notebook: simple_legend01.ipynb](https://matplotlib.org/_downloads/simple_legend01.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/gallery/userdemo/simple_legend02.md b/Python/matplotlab/gallery/userdemo/simple_legend02.md
new file mode 100644
index 00000000..cda1ddf7
--- /dev/null
+++ b/Python/matplotlab/gallery/userdemo/simple_legend02.md
@@ -0,0 +1,28 @@
+# Simple Legend02
+
+![Simple Legend02 example](https://matplotlib.org/_images/sphx_glr_simple_legend02_001.png)
+
+```python
+import matplotlib.pyplot as plt
+
+fig, ax = plt.subplots()
+
+line1, = ax.plot([1, 2, 3], label="Line 1", linestyle='--')
+line2, = ax.plot([3, 2, 1], label="Line 2", linewidth=4)
+
+# Create a legend for the first line.
+first_legend = ax.legend(handles=[line1], loc='upper right')
+
+# Add the legend manually to the current Axes.
+ax.add_artist(first_legend)
+
+# Create another legend for the second line. 
+ax.legend(handles=[line2], loc='lower right') + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: simple_legend02.py](https://matplotlib.org/_downloads/simple_legend02.py) +- [下载Jupyter notebook: simple_legend02.ipynb](https://matplotlib.org/_downloads/simple_legend02.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/widgets/buttons.md b/Python/matplotlab/gallery/widgets/buttons.md new file mode 100644 index 00000000..848d25df --- /dev/null +++ b/Python/matplotlab/gallery/widgets/buttons.md @@ -0,0 +1,54 @@ +# 按钮 + +构建一个简单的按钮GUI来修改正弦波。 + +``下一个``和``上一个``按钮小部件有助于以新频率显示波形。 + +![按钮示例](https://matplotlib.org/_images/sphx_glr_buttons_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.widgets import Button + +freqs = np.arange(2, 20, 3) + +fig, ax = plt.subplots() +plt.subplots_adjust(bottom=0.2) +t = np.arange(0.0, 1.0, 0.001) +s = np.sin(2*np.pi*freqs[0]*t) +l, = plt.plot(t, s, lw=2) + + +class Index(object): + ind = 0 + + def next(self, event): + self.ind += 1 + i = self.ind % len(freqs) + ydata = np.sin(2*np.pi*freqs[i]*t) + l.set_ydata(ydata) + plt.draw() + + def prev(self, event): + self.ind -= 1 + i = self.ind % len(freqs) + ydata = np.sin(2*np.pi*freqs[i]*t) + l.set_ydata(ydata) + plt.draw() + +callback = Index() +axprev = plt.axes([0.7, 0.05, 0.1, 0.075]) +axnext = plt.axes([0.81, 0.05, 0.1, 0.075]) +bnext = Button(axnext, 'Next') +bnext.on_clicked(callback.next) +bprev = Button(axprev, 'Previous') +bprev.on_clicked(callback.prev) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: buttons.py](https://matplotlib.org/_downloads/buttons.py) +- [下载Jupyter notebook: buttons.ipynb](https://matplotlib.org/_downloads/buttons.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/widgets/check_buttons.md b/Python/matplotlab/gallery/widgets/check_buttons.md new file mode 100644 index 00000000..3ecc3ec5 --- /dev/null +++ b/Python/matplotlab/gallery/widgets/check_buttons.md @@ -0,0 +1,47 @@ +# 
复选按钮 + +使用复选按钮打开和关闭视觉元素。 + +该程序显示了“检查按钮”的使用,类似于复选框。 显示了3种不同的正弦波,我们可以选择使用复选按钮显示哪些波形。 + +![复选按钮示例](https://matplotlib.org/_images/sphx_glr_check_buttons_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.widgets import CheckButtons + +t = np.arange(0.0, 2.0, 0.01) +s0 = np.sin(2*np.pi*t) +s1 = np.sin(4*np.pi*t) +s2 = np.sin(6*np.pi*t) + +fig, ax = plt.subplots() +l0, = ax.plot(t, s0, visible=False, lw=2, color='k', label='2 Hz') +l1, = ax.plot(t, s1, lw=2, color='r', label='4 Hz') +l2, = ax.plot(t, s2, lw=2, color='g', label='6 Hz') +plt.subplots_adjust(left=0.2) + +lines = [l0, l1, l2] + +# Make checkbuttons with all plotted lines with correct visibility +rax = plt.axes([0.05, 0.4, 0.1, 0.15]) +labels = [str(line.get_label()) for line in lines] +visibility = [line.get_visible() for line in lines] +check = CheckButtons(rax, labels, visibility) + + +def func(label): + index = labels.index(label) + lines[index].set_visible(not lines[index].get_visible()) + plt.draw() + +check.on_clicked(func) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: check_buttons.py](https://matplotlib.org/_downloads/check_buttons.py) +- [下载Jupyter notebook: check_buttons.ipynb](https://matplotlib.org/_downloads/check_buttons.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/widgets/cursor.md b/Python/matplotlab/gallery/widgets/cursor.md new file mode 100644 index 00000000..4ebc85b9 --- /dev/null +++ b/Python/matplotlab/gallery/widgets/cursor.md @@ -0,0 +1,31 @@ +# 光标 + +![光标示例](https://matplotlib.org/_images/sphx_glr_cursor_001.png) + +```python +from matplotlib.widgets import Cursor +import numpy as np +import matplotlib.pyplot as plt + + +# Fixing random state for reproducibility +np.random.seed(19680801) + +fig = plt.figure(figsize=(8, 6)) +ax = fig.add_subplot(111, facecolor='#FFFFCC') + +x, y = 4*(np.random.rand(2, 100) - .5) +ax.plot(x, y, 'o') +ax.set_xlim(-2, 2) +ax.set_ylim(-2, 2) + +# Set useblit=True on most backends 
for enhanced performance. +cursor = Cursor(ax, useblit=True, color='red', linewidth=2) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: cursor.py](https://matplotlib.org/_downloads/cursor.py) +- [下载Jupyter notebook: cursor.ipynb](https://matplotlib.org/_downloads/cursor.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/widgets/index.md b/Python/matplotlab/gallery/widgets/index.md new file mode 100644 index 00000000..154d7f09 --- /dev/null +++ b/Python/matplotlab/gallery/widgets/index.md @@ -0,0 +1,3 @@ +# 小部件 + +如何在matplotlib中编写原始的、但与GUI无关的小部件的示例 \ No newline at end of file diff --git a/Python/matplotlab/gallery/widgets/lasso_selector_demo_sgskip.md b/Python/matplotlab/gallery/widgets/lasso_selector_demo_sgskip.md new file mode 100644 index 00000000..9008a9fc --- /dev/null +++ b/Python/matplotlab/gallery/widgets/lasso_selector_demo_sgskip.md @@ -0,0 +1,102 @@ +# 套索选择器演示 + +使用套索工具以交互方式选择数据点。 + +此示例绘制散点图。 然后,您可以通过在图表上的点周围绘制套索循环来选择几个点。 要绘制,只需单击图形,按住,然后将其拖动到需要选择的点周围。 + +```python +import numpy as np + +from matplotlib.widgets import LassoSelector +from matplotlib.path import Path + + +class SelectFromCollection(object): + """Select indices from a matplotlib collection using `LassoSelector`. + + Selected indices are saved in the `ind` attribute. This tool fades out the + points that are not part of the selection (i.e., reduces their alpha + values). If your collection has alpha < 1, this tool will permanently + alter the alpha values. + + Note that this tool selects collection objects based on their *origins* + (i.e., `offsets`). + + Parameters + ---------- + ax : :class:`~matplotlib.axes.Axes` + Axes to interact with. + + collection : :class:`matplotlib.collections.Collection` subclass + Collection you want to select from. + + alpha_other : 0 <= float <= 1 + To highlight a selection, this tool sets all selected points to an + alpha value of 1 and non-selected points to `alpha_other`. 
+ """ + + def __init__(self, ax, collection, alpha_other=0.3): + self.canvas = ax.figure.canvas + self.collection = collection + self.alpha_other = alpha_other + + self.xys = collection.get_offsets() + self.Npts = len(self.xys) + + # Ensure that we have separate colors for each object + self.fc = collection.get_facecolors() + if len(self.fc) == 0: + raise ValueError('Collection must have a facecolor') + elif len(self.fc) == 1: + self.fc = np.tile(self.fc, (self.Npts, 1)) + + self.lasso = LassoSelector(ax, onselect=self.onselect) + self.ind = [] + + def onselect(self, verts): + path = Path(verts) + self.ind = np.nonzero(path.contains_points(self.xys))[0] + self.fc[:, -1] = self.alpha_other + self.fc[self.ind, -1] = 1 + self.collection.set_facecolors(self.fc) + self.canvas.draw_idle() + + def disconnect(self): + self.lasso.disconnect_events() + self.fc[:, -1] = 1 + self.collection.set_facecolors(self.fc) + self.canvas.draw_idle() + + +if __name__ == '__main__': + import matplotlib.pyplot as plt + + # Fixing random state for reproducibility + np.random.seed(19680801) + + data = np.random.rand(100, 2) + + subplot_kw = dict(xlim=(0, 1), ylim=(0, 1), autoscale_on=False) + fig, ax = plt.subplots(subplot_kw=subplot_kw) + + pts = ax.scatter(data[:, 0], data[:, 1], s=80) + selector = SelectFromCollection(ax, pts) + + def accept(event): + if event.key == "enter": + print("Selected points:") + print(selector.xys[selector.ind]) + selector.disconnect() + ax.set_title("") + fig.canvas.draw() + + fig.canvas.mpl_connect("key_press_event", accept) + ax.set_title("Press enter to accept selected points.") + + plt.show() +``` + +## 下载这个示例 + +- [下载python源码: lasso_selector_demo_sgskip.py](https://matplotlib.org/_downloads/lasso_selector_demo_sgskip.py) +- [下载Jupyter notebook: lasso_selector_demo_sgskip.ipynb](https://matplotlib.org/_downloads/lasso_selector_demo_sgskip.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/widgets/menu.md 
b/Python/matplotlab/gallery/widgets/menu.md new file mode 100644 index 00000000..e78c7419 --- /dev/null +++ b/Python/matplotlab/gallery/widgets/menu.md @@ -0,0 +1,194 @@ +# 菜单 + +![菜单示例](https://matplotlib.org/_images/sphx_glr_menu_001.png) + +输出: + +```python +100 371 91 29 +100 342 91 29 +100 313 91 29 +100 284 91 29 +100 255 91 29 +``` + +```python +import numpy as np +import matplotlib.colors as colors +import matplotlib.patches as patches +import matplotlib.mathtext as mathtext +import matplotlib.pyplot as plt +import matplotlib.artist as artist +import matplotlib.image as image + + +class ItemProperties(object): + def __init__(self, fontsize=14, labelcolor='black', bgcolor='yellow', + alpha=1.0): + self.fontsize = fontsize + self.labelcolor = labelcolor + self.bgcolor = bgcolor + self.alpha = alpha + + self.labelcolor_rgb = colors.to_rgba(labelcolor)[:3] + self.bgcolor_rgb = colors.to_rgba(bgcolor)[:3] + + +class MenuItem(artist.Artist): + parser = mathtext.MathTextParser("Bitmap") + padx = 5 + pady = 5 + + def __init__(self, fig, labelstr, props=None, hoverprops=None, + on_select=None): + artist.Artist.__init__(self) + + self.set_figure(fig) + self.labelstr = labelstr + + if props is None: + props = ItemProperties() + + if hoverprops is None: + hoverprops = ItemProperties() + + self.props = props + self.hoverprops = hoverprops + + self.on_select = on_select + + x, self.depth = self.parser.to_mask( + labelstr, fontsize=props.fontsize, dpi=fig.dpi) + + if props.fontsize != hoverprops.fontsize: + raise NotImplementedError( + 'support for different font sizes not implemented') + + self.labelwidth = x.shape[1] + self.labelheight = x.shape[0] + + self.labelArray = np.zeros((x.shape[0], x.shape[1], 4)) + self.labelArray[:, :, -1] = x/255. 
+ + self.label = image.FigureImage(fig, origin='upper') + self.label.set_array(self.labelArray) + + # we'll update these later + self.rect = patches.Rectangle((0, 0), 1, 1) + + self.set_hover_props(False) + + fig.canvas.mpl_connect('button_release_event', self.check_select) + + def check_select(self, event): + over, junk = self.rect.contains(event) + if not over: + return + + if self.on_select is not None: + self.on_select(self) + + def set_extent(self, x, y, w, h): + print(x, y, w, h) + self.rect.set_x(x) + self.rect.set_y(y) + self.rect.set_width(w) + self.rect.set_height(h) + + self.label.ox = x + self.padx + self.label.oy = y - self.depth + self.pady/2. + + self.hover = False + + def draw(self, renderer): + self.rect.draw(renderer) + self.label.draw(renderer) + + def set_hover_props(self, b): + if b: + props = self.hoverprops + else: + props = self.props + + r, g, b = props.labelcolor_rgb + self.labelArray[:, :, 0] = r + self.labelArray[:, :, 1] = g + self.labelArray[:, :, 2] = b + self.label.set_array(self.labelArray) + self.rect.set(facecolor=props.bgcolor, alpha=props.alpha) + + def set_hover(self, event): + 'check the hover status of event and return true if status is changed' + b, junk = self.rect.contains(event) + + changed = (b != self.hover) + + if changed: + self.set_hover_props(b) + + self.hover = b + return changed + + +class Menu(object): + def __init__(self, fig, menuitems): + self.figure = fig + fig.suppressComposite = True + + self.menuitems = menuitems + self.numitems = len(menuitems) + + maxw = max(item.labelwidth for item in menuitems) + maxh = max(item.labelheight for item in menuitems) + + totalh = self.numitems*maxh + (self.numitems + 1)*2*MenuItem.pady + + x0 = 100 + y0 = 400 + + width = maxw + 2*MenuItem.padx + height = maxh + MenuItem.pady + + for item in menuitems: + left = x0 + bottom = y0 - maxh - MenuItem.pady + + item.set_extent(left, bottom, width, height) + + fig.artists.append(item) + y0 -= maxh + MenuItem.pady + + 
fig.canvas.mpl_connect('motion_notify_event', self.on_move) + + def on_move(self, event): + draw = False + for item in self.menuitems: + draw = item.set_hover(event) + if draw: + self.figure.canvas.draw() + break + + +fig = plt.figure() +fig.subplots_adjust(left=0.3) +props = ItemProperties(labelcolor='black', bgcolor='yellow', + fontsize=15, alpha=0.2) +hoverprops = ItemProperties(labelcolor='white', bgcolor='blue', + fontsize=15, alpha=0.2) + +menuitems = [] +for label in ('open', 'close', 'save', 'save as', 'quit'): + def on_select(item): + print('you selected %s' % item.labelstr) + item = MenuItem(fig, label, props=props, hoverprops=hoverprops, + on_select=on_select) + menuitems.append(item) + +menu = Menu(fig, menuitems) +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: menu.py](https://matplotlib.org/_downloads/menu.py) +- [下载Jupyter notebook: menu.ipynb](https://matplotlib.org/_downloads/menu.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/widgets/multicursor.md b/Python/matplotlab/gallery/widgets/multicursor.md new file mode 100644 index 00000000..45071e67 --- /dev/null +++ b/Python/matplotlab/gallery/widgets/multicursor.md @@ -0,0 +1,29 @@ +# 多光标 + +同时在多个图上显示光标。 + +此示例生成两个子图,并将光标悬停在一个子图中的数据上,该数据点的值分别显示在两个子图中。 + +![多光标示例](https://matplotlib.org/_images/sphx_glr_multicursor_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.widgets import MultiCursor + +t = np.arange(0.0, 2.0, 0.01) +s1 = np.sin(2*np.pi*t) +s2 = np.sin(4*np.pi*t) + +fig, (ax1, ax2) = plt.subplots(2, sharex=True) +ax1.plot(t, s1) +ax2.plot(t, s2) + +multi = MultiCursor(fig.canvas, (ax1, ax2), color='r', lw=1) +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: multicursor.py](https://matplotlib.org/_downloads/multicursor.py) +- [下载Jupyter notebook: multicursor.ipynb](https://matplotlib.org/_downloads/multicursor.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/widgets/polygon_selector_demo.md 
b/Python/matplotlab/gallery/widgets/polygon_selector_demo.md new file mode 100644 index 00000000..bd502a8c --- /dev/null +++ b/Python/matplotlab/gallery/widgets/polygon_selector_demo.md @@ -0,0 +1,111 @@ +# 多边形选择器演示 + +显示如何以交互方式选择多边形的索引。 + +![多边形选择器演示](https://matplotlib.org/_images/sphx_glr_polygon_selector_demo_001.png) + +输出: + +```python +Select points in the figure by enclosing them within a polygon. +Press the 'esc' key to start a new polygon. +Try holding the 'shift' key to move all of the vertices. +Try holding the 'ctrl' key to move a single vertex. + +Selected points: +[] +``` + +```python +import numpy as np + +from matplotlib.widgets import PolygonSelector +from matplotlib.path import Path + + +class SelectFromCollection(object): + """Select indices from a matplotlib collection using `PolygonSelector`. + + Selected indices are saved in the `ind` attribute. This tool fades out the + points that are not part of the selection (i.e., reduces their alpha + values). If your collection has alpha < 1, this tool will permanently + alter the alpha values. + + Note that this tool selects collection objects based on their *origins* + (i.e., `offsets`). + + Parameters + ---------- + ax : :class:`~matplotlib.axes.Axes` + Axes to interact with. + + collection : :class:`matplotlib.collections.Collection` subclass + Collection you want to select from. + + alpha_other : 0 <= float <= 1 + To highlight a selection, this tool sets all selected points to an + alpha value of 1 and non-selected points to `alpha_other`. 
+ """ + + def __init__(self, ax, collection, alpha_other=0.3): + self.canvas = ax.figure.canvas + self.collection = collection + self.alpha_other = alpha_other + + self.xys = collection.get_offsets() + self.Npts = len(self.xys) + + # Ensure that we have separate colors for each object + self.fc = collection.get_facecolors() + if len(self.fc) == 0: + raise ValueError('Collection must have a facecolor') + elif len(self.fc) == 1: + self.fc = np.tile(self.fc, (self.Npts, 1)) + + self.poly = PolygonSelector(ax, self.onselect) + self.ind = [] + + def onselect(self, verts): + path = Path(verts) + self.ind = np.nonzero(path.contains_points(self.xys))[0] + self.fc[:, -1] = self.alpha_other + self.fc[self.ind, -1] = 1 + self.collection.set_facecolors(self.fc) + self.canvas.draw_idle() + + def disconnect(self): + self.poly.disconnect_events() + self.fc[:, -1] = 1 + self.collection.set_facecolors(self.fc) + self.canvas.draw_idle() + + +if __name__ == '__main__': + import matplotlib.pyplot as plt + + fig, ax = plt.subplots() + grid_size = 5 + grid_x = np.tile(np.arange(grid_size), grid_size) + grid_y = np.repeat(np.arange(grid_size), grid_size) + pts = ax.scatter(grid_x, grid_y) + + selector = SelectFromCollection(ax, pts) + + print("Select points in the figure by enclosing them within a polygon.") + print("Press the 'esc' key to start a new polygon.") + print("Try holding the 'shift' key to move all of the vertices.") + print("Try holding the 'ctrl' key to move a single vertex.") + + plt.show() + + selector.disconnect() + + # After figure is closed print the coordinates of the selected points + print('\nSelected points:') + print(selector.xys[selector.ind]) +``` + +## 下载这个示例 + +- [下载python源码: polygon_selector_demo.py](https://matplotlib.org/_downloads/polygon_selector_demo.py) +- [下载Jupyter notebook: polygon_selector_demo.ipynb](https://matplotlib.org/_downloads/polygon_selector_demo.ipynb) \ No newline at end of file diff --git 
a/Python/matplotlab/gallery/widgets/radio_buttons.md b/Python/matplotlab/gallery/widgets/radio_buttons.md new file mode 100644 index 00000000..b65f99bf --- /dev/null +++ b/Python/matplotlab/gallery/widgets/radio_buttons.md @@ -0,0 +1,59 @@ +# 单选按钮 + +使用单选按钮选择绘图的属性。 + +单选按钮允许您在可视化中选择多个选项。在这种情况下,按钮允许用户选择要在图中显示的三种不同正弦波中的一种。 + +![单选按钮示例](https://matplotlib.org/_images/sphx_glr_radio_buttons_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.widgets import RadioButtons + +t = np.arange(0.0, 2.0, 0.01) +s0 = np.sin(2*np.pi*t) +s1 = np.sin(4*np.pi*t) +s2 = np.sin(8*np.pi*t) + +fig, ax = plt.subplots() +l, = ax.plot(t, s0, lw=2, color='red') +plt.subplots_adjust(left=0.3) + +axcolor = 'lightgoldenrodyellow' +rax = plt.axes([0.05, 0.7, 0.15, 0.15], facecolor=axcolor) +radio = RadioButtons(rax, ('2 Hz', '4 Hz', '8 Hz')) + + +def hzfunc(label): + hzdict = {'2 Hz': s0, '4 Hz': s1, '8 Hz': s2} + ydata = hzdict[label] + l.set_ydata(ydata) + plt.draw() +radio.on_clicked(hzfunc) + +rax = plt.axes([0.05, 0.4, 0.15, 0.15], facecolor=axcolor) +radio2 = RadioButtons(rax, ('red', 'blue', 'green')) + + +def colorfunc(label): + l.set_color(label) + plt.draw() +radio2.on_clicked(colorfunc) + +rax = plt.axes([0.05, 0.1, 0.15, 0.15], facecolor=axcolor) +radio3 = RadioButtons(rax, ('-', '--', '-.', 'steps', ':')) + + +def stylefunc(label): + l.set_linestyle(label) + plt.draw() +radio3.on_clicked(stylefunc) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: radio_buttons.py](https://matplotlib.org/_downloads/radio_buttons.py) +- [下载Jupyter notebook: radio_buttons.ipynb](https://matplotlib.org/_downloads/radio_buttons.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/widgets/rectangle_selector.md b/Python/matplotlab/gallery/widgets/rectangle_selector.md new file mode 100644 index 00000000..0c2f5406 --- /dev/null +++ b/Python/matplotlab/gallery/widgets/rectangle_selector.md @@ -0,0 +1,61 @@ +# 矩形选择器 + +在某处鼠标点击,将鼠标移动到某个目的地,然后释放按钮。 
此类提供单击和释放事件,并从点击点到实际鼠标位置(在相同轴内)绘制一条线或一个框,直到释放按钮。 在方法'self.ignore()'中,检查来自eventpress和eventrelease的按钮是否相同。
+
+![矩形选择器示例](https://matplotlib.org/_images/sphx_glr_rectangle_selector_001.png)
+
+输出:
+
+```python
+click --> release
+```
+
+```python
+from matplotlib.widgets import RectangleSelector
+import numpy as np
+import matplotlib.pyplot as plt
+
+
+def line_select_callback(eclick, erelease):
+    'eclick and erelease are the press and release events'
+    x1, y1 = eclick.xdata, eclick.ydata
+    x2, y2 = erelease.xdata, erelease.ydata
+    print("(%3.2f, %3.2f) --> (%3.2f, %3.2f)" % (x1, y1, x2, y2))
+    print(" The buttons you used were: %s %s" % (eclick.button, erelease.button))
+
+
+def toggle_selector(event):
+    print(' Key pressed.')
+    if event.key in ['Q', 'q'] and toggle_selector.RS.active:
+        print(' RectangleSelector deactivated.')
+        toggle_selector.RS.set_active(False)
+    if event.key in ['A', 'a'] and not toggle_selector.RS.active:
+        print(' RectangleSelector activated.')
+        toggle_selector.RS.set_active(True)
+
+
+fig, current_ax = plt.subplots()  # make a new plotting range
+N = 100000  # If N is large one can see
+x = np.linspace(0.0, 10.0, N)  # improvement by using blitting!
+ +plt.plot(x, +np.sin(.2*np.pi*x), lw=3.5, c='b', alpha=.7) # plot something +plt.plot(x, +np.cos(.2*np.pi*x), lw=3.5, c='r', alpha=.5) +plt.plot(x, -np.sin(.2*np.pi*x), lw=3.5, c='g', alpha=.3) + +print("\n click --> release") + +# drawtype is 'box' or 'line' or 'none' +toggle_selector.RS = RectangleSelector(current_ax, line_select_callback, + drawtype='box', useblit=True, + button=[1, 3], # don't use middle button + minspanx=5, minspany=5, + spancoords='pixels', + interactive=True) +plt.connect('key_press_event', toggle_selector) +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: rectangle_selector.py](https://matplotlib.org/_downloads/rectangle_selector.py) +- [下载Jupyter notebook: rectangle_selector.ipynb](https://matplotlib.org/_downloads/rectangle_selector.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/widgets/slider_demo.md b/Python/matplotlab/gallery/widgets/slider_demo.md new file mode 100644 index 00000000..a53680f4 --- /dev/null +++ b/Python/matplotlab/gallery/widgets/slider_demo.md @@ -0,0 +1,64 @@ +# 滑块演示 + +使用滑块小部件来控制绘图的可视属性。 + +在此示例中,滑块用于选择正弦波的频率。 您可以通过这种方式控制绘图的许多连续变化属性。 + +![滑块演示](https://matplotlib.org/_images/sphx_glr_slider_demo_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.widgets import Slider, Button, RadioButtons + +fig, ax = plt.subplots() +plt.subplots_adjust(left=0.25, bottom=0.25) +t = np.arange(0.0, 1.0, 0.001) +a0 = 5 +f0 = 3 +delta_f = 5.0 +s = a0*np.sin(2*np.pi*f0*t) +l, = plt.plot(t, s, lw=2, color='red') +plt.axis([0, 1, -10, 10]) + +axcolor = 'lightgoldenrodyellow' +axfreq = plt.axes([0.25, 0.1, 0.65, 0.03], facecolor=axcolor) +axamp = plt.axes([0.25, 0.15, 0.65, 0.03], facecolor=axcolor) + +sfreq = Slider(axfreq, 'Freq', 0.1, 30.0, valinit=f0, valstep=delta_f) +samp = Slider(axamp, 'Amp', 0.1, 10.0, valinit=a0) + + +def update(val): + amp = samp.val + freq = sfreq.val + l.set_ydata(amp*np.sin(2*np.pi*freq*t)) + fig.canvas.draw_idle() +sfreq.on_changed(update) 
+samp.on_changed(update) + +resetax = plt.axes([0.8, 0.025, 0.1, 0.04]) +button = Button(resetax, 'Reset', color=axcolor, hovercolor='0.975') + + +def reset(event): + sfreq.reset() + samp.reset() +button.on_clicked(reset) + +rax = plt.axes([0.025, 0.5, 0.15, 0.15], facecolor=axcolor) +radio = RadioButtons(rax, ('red', 'blue', 'green'), active=0) + + +def colorfunc(label): + l.set_color(label) + fig.canvas.draw_idle() +radio.on_clicked(colorfunc) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: slider_demo.py](https://matplotlib.org/_downloads/slider_demo.py) +- [下载Jupyter notebook: slider_demo.ipynb](https://matplotlib.org/_downloads/slider_demo.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/widgets/span_selector.md b/Python/matplotlab/gallery/widgets/span_selector.md new file mode 100644 index 00000000..eab6d029 --- /dev/null +++ b/Python/matplotlab/gallery/widgets/span_selector.md @@ -0,0 +1,51 @@ +# 跨度选择器 + +SpanSelector是一个鼠标小部件,用于选择xmin / xmax范围并绘制下轴中所选区域的详细视图 + +![跨度选择器示例](https://matplotlib.org/_images/sphx_glr_span_selector_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.widgets import SpanSelector + +# Fixing random state for reproducibility +np.random.seed(19680801) + +fig, (ax1, ax2) = plt.subplots(2, figsize=(8, 6)) +ax1.set(facecolor='#FFFFCC') + +x = np.arange(0.0, 5.0, 0.01) +y = np.sin(2*np.pi*x) + 0.5*np.random.randn(len(x)) + +ax1.plot(x, y, '-') +ax1.set_ylim(-2, 2) +ax1.set_title('Press left mouse button and drag to test') + +ax2.set(facecolor='#FFFFCC') +line2, = ax2.plot(x, y, '-') + + +def onselect(xmin, xmax): + indmin, indmax = np.searchsorted(x, (xmin, xmax)) + indmax = min(len(x) - 1, indmax) + + thisx = x[indmin:indmax] + thisy = y[indmin:indmax] + line2.set_data(thisx, thisy) + ax2.set_xlim(thisx[0], thisx[-1]) + ax2.set_ylim(thisy.min(), thisy.max()) + fig.canvas.draw() + +# Set useblit=True on most backends for enhanced performance. 
+span = SpanSelector(ax1, onselect, 'horizontal', useblit=True, + rectprops=dict(alpha=0.5, facecolor='red')) + + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: span_selector.py](https://matplotlib.org/_downloads/span_selector.py) +- [下载Jupyter notebook: span_selector.ipynb](https://matplotlib.org/_downloads/span_selector.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/gallery/widgets/textbox.md b/Python/matplotlab/gallery/widgets/textbox.md new file mode 100644 index 00000000..832aeffd --- /dev/null +++ b/Python/matplotlab/gallery/widgets/textbox.md @@ -0,0 +1,37 @@ +# 文本框 + +允许使用Textbox小部件输入文本。 + +您可以使用“文本框”小部件让用户提供需要显示的任何文本,包括公式。 您可以使用提交按钮创建具有给定输入的绘图。 + +![文本框示例](https://matplotlib.org/_images/sphx_glr_textbox_001.png) + +```python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.widgets import TextBox +fig, ax = plt.subplots() +plt.subplots_adjust(bottom=0.2) +t = np.arange(-2.0, 2.0, 0.001) +s = t ** 2 +initial_text = "t ** 2" +l, = plt.plot(t, s, lw=2) + + +def submit(text): + ydata = eval(text) + l.set_ydata(ydata) + ax.set_ylim(np.min(ydata), np.max(ydata)) + plt.draw() + +axbox = plt.axes([0.1, 0.05, 0.8, 0.075]) +text_box = TextBox(axbox, 'Evaluate', initial=initial_text) +text_box.on_submit(submit) + +plt.show() +``` + +## 下载这个示例 + +- [下载python源码: textbox.py](https://matplotlib.org/_downloads/textbox.py) +- [下载Jupyter notebook: textbox.ipynb](https://matplotlib.org/_downloads/textbox.ipynb) \ No newline at end of file diff --git a/Python/matplotlab/intermediate/artists.md b/Python/matplotlab/intermediate/artists.md new file mode 100644 index 00000000..1aba501a --- /dev/null +++ b/Python/matplotlab/intermediate/artists.md @@ -0,0 +1,830 @@ +--- +sidebarDepth: 3 +sidebar: auto +--- + +# Artist tutorial + +Using Artist objects to render on the canvas. + +There are three layers to the matplotlib API. 
+
+- the ``matplotlib.backend_bases.FigureCanvas`` is the area onto which
+the figure is drawn
+- the ``matplotlib.backend_bases.Renderer`` is
+the object which knows how to draw on the
+``FigureCanvas``
+- and the [``matplotlib.artist.Artist``](https://matplotlib.org/api/artist_api.html#matplotlib.artist.Artist) is the object that knows how to use
+a renderer to paint onto the canvas.
+
+The ``FigureCanvas`` and
+``Renderer`` handle all the details of
+talking to user interface toolkits like [wxPython](https://www.wxpython.org) or drawing languages like PostScript®, and
+the ``Artist`` handles all the high level constructs like representing
+and laying out the figure, text, and lines. The typical user will
+spend 95% of their time working with the ``Artists``.
+
+There are two types of ``Artists``: primitives and containers. The primitives
+represent the standard graphical objects we want to paint onto our canvas:
+[``Line2D``](https://matplotlib.org/api/_as_gen/matplotlib.lines.Line2D.html#matplotlib.lines.Line2D), [``Rectangle``](https://matplotlib.org/api/_as_gen/matplotlib.patches.Rectangle.html#matplotlib.patches.Rectangle),
+[``Text``](https://matplotlib.org/api/text_api.html#matplotlib.text.Text), [``AxesImage``](https://matplotlib.org/api/image_api.html#matplotlib.image.AxesImage), etc., and
+the containers are places to put them ([``Axis``](https://matplotlib.org/api/axis_api.html#matplotlib.axis.Axis),
+[``Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes) and [``Figure``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure)). The
+standard use is to create a [``Figure``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure) instance, use
+the ``Figure`` to create one or more [``Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes) or
+``Subplot`` instances, and use the ``Axes`` instance
+helper methods to create the primitives.
In the example below, we create a
+``Figure`` instance using [``matplotlib.pyplot.figure()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.figure.html#matplotlib.pyplot.figure), which is a
+convenience method for instantiating ``Figure`` instances and connecting them
+with your user interface or drawing toolkit ``FigureCanvas``. As we will
+discuss below, this is not necessary -- you can work directly with PostScript,
+PDF, Gtk+, or wxPython ``FigureCanvas`` instances, instantiate your ``Figures``
+directly and connect them yourselves -- but since we are focusing here on the
+``Artist`` API we'll let [``pyplot``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.html#module-matplotlib.pyplot) handle some of those details
+for us:
+
+``` python
+import matplotlib.pyplot as plt
+fig = plt.figure()
+ax = fig.add_subplot(2, 1, 1) # two rows, one column, first plot
+```
+
+The [``Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes) is probably the most important
+class in the matplotlib API, and the one you will be working with most
+of the time.
This is because the ``Axes`` is the plotting area into
+which most of the objects go, and the ``Axes`` has many special helper
+methods ([``plot()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.plot.html#matplotlib.axes.Axes.plot),
+[``text()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.text.html#matplotlib.axes.Axes.text),
+[``hist()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.hist.html#matplotlib.axes.Axes.hist),
+[``imshow()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.imshow.html#matplotlib.axes.Axes.imshow)) to create the most common
+graphics primitives ([``Line2D``](https://matplotlib.org/api/_as_gen/matplotlib.lines.Line2D.html#matplotlib.lines.Line2D),
+[``Text``](https://matplotlib.org/api/text_api.html#matplotlib.text.Text),
+[``Rectangle``](https://matplotlib.org/api/_as_gen/matplotlib.patches.Rectangle.html#matplotlib.patches.Rectangle),
+``Image``, respectively). These helper methods
+will take your data (e.g., ``numpy`` arrays and strings) and create
+primitive ``Artist`` instances as needed (e.g., ``Line2D``), add them to
+the relevant containers, and draw them when requested. Most of you
+are probably familiar with the ``Subplot``,
+which is just a special case of an ``Axes`` that lives on a regular
+rows by columns grid of ``Subplot`` instances.
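As a quick sketch of that grid idea (an illustration added here, not part of the original page; the 2x2 shape is an arbitrary choice), each ``add_subplot(rows, cols, index)`` call returns one ``Axes`` cell, and the ``Figure`` container keeps all of them in ``fig.axes``:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig = plt.figure()
# A 2x2 grid of Subplots; the index runs 1..4 in row-major order.
axs = [fig.add_subplot(2, 2, i) for i in range(1, 5)]

# Every Subplot is an Axes, and the Figure tracks them all.
assert len(fig.axes) == 4
assert all(ax in fig.axes for ax in axs)
```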
If you want to create
+an ``Axes`` at an arbitrary location, simply use the
+[``add_axes()``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.add_axes) method which takes a list
+of ``[left, bottom, width, height]`` values in 0-1 relative figure
+coordinates:
+
+``` python
+fig2 = plt.figure()
+ax2 = fig2.add_axes([0.15, 0.1, 0.7, 0.3])
+```
+
+Continuing with our example:
+
+``` python
+import numpy as np
+t = np.arange(0.0, 1.0, 0.01)
+s = np.sin(2*np.pi*t)
+line, = ax.plot(t, s, color='blue', lw=2)
+```
+
+In this example, ``ax`` is the ``Axes`` instance created by the
+``fig.add_subplot`` call above (remember ``Subplot`` is just a
+subclass of ``Axes``) and when you call ``ax.plot``, it creates a
+``Line2D`` instance and adds it to the ``Axes.lines`` list. In the interactive [ipython](http://ipython.org/) session below, you can see that the
+``Axes.lines`` list is length one and contains the same line that was
+returned by the ``line, = ax.plot...`` call:
+
+``` python
+In [101]: ax.lines[0]
+Out[101]: <matplotlib.lines.Line2D at 0x...>
+
+In [102]: line
+Out[102]: <matplotlib.lines.Line2D at 0x...>
+```
+
+If you make subsequent calls to ``ax.plot`` (and the hold state is "on"
+which is the default) then additional lines will be added to the list.
+You can remove lines later simply by calling the list methods; either
+of these will work:
+
+``` python
+del ax.lines[0]
+ax.lines.remove(line)  # one or the other, not both!
+```
+
+The Axes also has helper methods to configure and decorate the x-axis
+and y-axis ticks, tick labels and axis labels:
+
+``` python
+xtext = ax.set_xlabel('my xdata') # returns a Text instance
+ytext = ax.set_ylabel('my ydata')
+```
+
+When you call [``ax.set_xlabel``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_xlabel.html#matplotlib.axes.Axes.set_xlabel),
+it passes the information on to the [``Text``](https://matplotlib.org/api/text_api.html#matplotlib.text.Text)
+instance of the [``XAxis``](https://matplotlib.org/api/axis_api.html#matplotlib.axis.XAxis). Each ``Axes``
+instance contains an [``XAxis``](https://matplotlib.org/api/axis_api.html#matplotlib.axis.XAxis) and a
+[``YAxis``](https://matplotlib.org/api/axis_api.html#matplotlib.axis.YAxis) instance, which handle the layout and
+drawing of the ticks, tick labels and axis labels.
+
+Try creating the figure below.
+
+``` python
+import numpy as np
+import matplotlib.pyplot as plt
+
+fig = plt.figure()
+fig.subplots_adjust(top=0.8)
+ax1 = fig.add_subplot(211)
+ax1.set_ylabel('volts')
+ax1.set_title('a sine wave')
+
+t = np.arange(0.0, 1.0, 0.01)
+s = np.sin(2*np.pi*t)
+line, = ax1.plot(t, s, color='blue', lw=2)
+
+# Fixing random state for reproducibility
+np.random.seed(19680801)
+
+ax2 = fig.add_axes([0.15, 0.1, 0.7, 0.3])
+n, bins, patches = ax2.hist(np.random.randn(1000), 50,
+                            facecolor='yellow', edgecolor='yellow')
+ax2.set_xlabel('time (s)')
+
+plt.show()
+```
+
+![sphx_glr_artists_001](https://matplotlib.org/_images/sphx_glr_artists_001.png)
+
+## Customizing your objects
+
+Every element in the figure is represented by a matplotlib
+[``Artist``](https://matplotlib.org/api/artist_api.html#matplotlib.artist.Artist), and each has an extensive list of
+properties to configure its appearance.
The figure itself contains a
[``Rectangle``](https://matplotlib.org/api/_as_gen/matplotlib.patches.Rectangle.html#matplotlib.patches.Rectangle) exactly the size of the figure,
which you can use to set the background color and transparency of the
figure. Likewise, each [``Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes) bounding box
(the standard white box with black edges in the typical matplotlib
plot) has a ``Rectangle`` instance that determines the color,
transparency, and other properties of the Axes. These instances are
stored as the member variables ``Figure.patch`` and ``Axes.patch``
("Patch" is a name inherited from MATLAB, and is a 2D "patch" of color
on the figure, e.g., rectangles, circles and polygons). Every
matplotlib ``Artist`` has the following properties:

| Property | Description |
| --- | --- |
| alpha | The transparency - a scalar from 0-1 |
| animated | A boolean that is used to facilitate animated drawing |
| axes | The axes that the Artist lives in, possibly None |
| clip_box | The bounding box that clips the Artist |
| clip_on | Whether clipping is enabled |
| clip_path | The path the artist is clipped to |
| contains | A picking function to test whether the artist contains the pick point |
| figure | The figure instance the artist lives in, possibly None |
| label | A text label (e.g., for auto-labeling) |
| picker | A python object that controls object picking |
| transform | The transformation |
| visible | A boolean whether the artist should be drawn |
| zorder | A number which determines the drawing order |
| rasterized | Boolean; turns vectors into raster graphics (for compression & EPS transparency) |

Each of the properties is accessed with an old-fashioned setter or
getter (yes, we know this irritates Pythonistas, and we plan to support
direct access via properties or traits, but it hasn't been done yet).
For example, to multiply the current alpha by a half:

``` python
a = o.get_alpha()
o.set_alpha(0.5*a)
```

If you want to set a number of properties at once, you can also use
the ``set`` method with keyword arguments. For example:

``` python
o.set(alpha=0.5, zorder=2)
```

If you are working interactively at the python shell, a handy way to
inspect the ``Artist`` properties is to use the
[``matplotlib.artist.getp()``](https://matplotlib.org/api/_as_gen/matplotlib.artist.getp.html#matplotlib.artist.getp) function (simply
``getp()`` in pyplot), which lists the properties
and their values. This works for classes derived from ``Artist`` as
well, e.g., ``Figure`` and ``Rectangle``. Here are the ``Figure`` rectangle
properties mentioned above:

``` python
In [149]: matplotlib.artist.getp(fig.patch)
    alpha = 1.0
    animated = False
    antialiased or aa = True
    axes = None
    clip_box = None
    clip_on = False
    clip_path = None
    contains = None
    edgecolor or ec = w
    facecolor or fc = 0.75
    figure = Figure(8.125x6.125)
    fill = 1
    hatch = None
    height = 1
    label =
    linewidth or lw = 1.0
    picker = None
    transform = <Affine object at 0x...>
    verts = ((0, 0), (0, 1), (1, 1), (1, 0))
    visible = True
    width = 1
    window_extent = <Bbox object at 0x...>
    x = 0
    y = 0
    zorder = 1
```

The docstrings for all of the classes also contain the ``Artist``
properties, so you can consult the interactive "help" or the
[matplotlib.artist](https://matplotlib.org/api/artist_api.html#artist-api) documentation for a listing of properties for a given object.

## Object containers

Now that we know how to inspect and set the properties of a given
object we want to configure, we need to know how to get at that object.
As mentioned in the introduction, there are two kinds of objects:
primitives and containers.
The primitives are usually the things you
want to configure (the font of a [``Text``](https://matplotlib.org/api/text_api.html#matplotlib.text.Text)
instance, the width of a [``Line2D``](https://matplotlib.org/api/_as_gen/matplotlib.lines.Line2D.html#matplotlib.lines.Line2D)), although
the containers also have some properties -- for example the
[``Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes) [``Artist``](https://matplotlib.org/api/artist_api.html#matplotlib.artist.Artist) is a
container that contains many of the primitives in your plot, but it
also has properties like ``xscale`` to control whether the xaxis
is 'linear' or 'log'. In this section we'll review where the various
container objects store the ``Artists`` that you want to get at.

### Figure container

The top level container ``Artist`` is the
[``matplotlib.figure.Figure``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure), and it contains everything in the
figure. The background of the figure is a
[``Rectangle``](https://matplotlib.org/api/_as_gen/matplotlib.patches.Rectangle.html#matplotlib.patches.Rectangle) which is stored in
``Figure.patch``. As
you add subplots ([``add_subplot()``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.add_subplot)) and
axes ([``add_axes()``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.add_axes)) to the figure,
these will be appended to [``Figure.axes``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.axes).
These are also returned by the
methods that create them:

``` python
In [156]: fig = plt.figure()

In [157]: ax1 = fig.add_subplot(211)

In [158]: ax2 = fig.add_axes([0.1, 0.1, 0.7, 0.3])

In [159]: ax1
Out[159]: <AxesSubplot object at 0x...>

In [160]: print(fig.axes)
[<AxesSubplot object at 0x...>, <Axes object at 0x...>]
```

Because the figure maintains the concept of the "current axes" (see
[``Figure.gca``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.gca) and
[``Figure.sca``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.sca)) to support the
pylab/pyplot state machine, you should not insert or remove axes
directly from the axes list, but rather use the
[``add_subplot()``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.add_subplot) and
[``add_axes()``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.add_axes) methods to insert, and the
[``delaxes()``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.delaxes) method to delete. You are
free, however, to iterate over the list of axes or index into it to get
access to ``Axes`` instances you want to customize. Here is an
example which turns all the axes grids on:

``` python
for ax in fig.axes:
    ax.grid(True)
```

The figure also has its own text, lines, patches and images, which you
can use to add primitives directly. The default coordinate system for
the ``Figure`` will simply be in pixels (which is not usually what you
want), but you can control this by setting the transform property of
the ``Artist`` you are adding to the figure.
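To see where that pixel default comes from, here is a minimal standalone sketch (not part of the original tutorial): a freshly created artist carries the identity transform -- i.e. raw display/pixel coordinates -- until you assign another transform such as ``fig.transFigure``:

``` python
from matplotlib.text import Text
from matplotlib.transforms import IdentityTransform

# A free-standing Text artist, not yet attached to a figure or axes.
t = Text(x=100, y=100, text='hello')

# No transform has been assigned, so the artist falls back to the
# identity transform: its (x, y) would be interpreted as raw pixels.
print(isinstance(t.get_transform(), IdentityTransform))  # True
```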
More useful is "figure coordinates", where (0, 0) is the bottom-left of
the figure and (1, 1) is the top-right, which you can
obtain by setting the ``Artist`` transform to ``fig.transFigure``:

``` python
import matplotlib.lines as lines

fig = plt.figure()

l1 = lines.Line2D([0, 1], [0, 1], transform=fig.transFigure, figure=fig)
l2 = lines.Line2D([0, 1], [1, 0], transform=fig.transFigure, figure=fig)
fig.lines.extend([l1, l2])

plt.show()
```

![sphx_glr_artists_002](https://matplotlib.org/_images/sphx_glr_artists_002.png)

Here is a summary of the Artists the figure contains:

| Figure attribute | Description |
| --- | --- |
| axes | A list of Axes instances (includes Subplot) |
| patch | The Rectangle background |
| images | A list of FigureImage patches - useful for raw pixel display |
| legends | A list of Figure Legend instances (different from Axes.legends) |
| lines | A list of Figure Line2D instances (rarely used, see Axes.lines) |
| patches | A list of Figure patches (rarely used, see Axes.patches) |
| texts | A list of Figure Text instances |

### Axes container

The [``matplotlib.axes.Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes) is the center of the matplotlib
universe -- it contains the vast majority of all the ``Artists`` used
in a figure, with many helper methods to create and add these
``Artists`` to itself, as well as helper methods to access and
customize the ``Artists`` it contains.
Like the
[``Figure``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure), it contains a
[``Patch``](https://matplotlib.org/api/_as_gen/matplotlib.patches.Patch.html#matplotlib.patches.Patch)
``patch``, which is a
[``Rectangle``](https://matplotlib.org/api/_as_gen/matplotlib.patches.Rectangle.html#matplotlib.patches.Rectangle) for Cartesian coordinates and a
[``Circle``](https://matplotlib.org/api/_as_gen/matplotlib.patches.Circle.html#matplotlib.patches.Circle) for polar coordinates; this patch
determines the shape, background and border of the plotting region:

``` python
ax = fig.add_subplot(111)
rect = ax.patch  # a Rectangle instance
rect.set_facecolor('green')
```

When you call a plotting method, e.g., the canonical
[``plot()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.plot.html#matplotlib.axes.Axes.plot), and pass in arrays or lists of
values, the method will create a [``matplotlib.lines.Line2D()``](https://matplotlib.org/api/_as_gen/matplotlib.lines.Line2D.html#matplotlib.lines.Line2D)
instance, update the line with all the ``Line2D`` properties passed as
keyword arguments, add the line to the ``Axes.lines`` container, and return it to you:

``` python
In [213]: x, y = np.random.rand(2, 100)

In [214]: line, = ax.plot(x, y, '-', color='blue', linewidth=2)
```

``plot`` returns a list of lines because you can pass in multiple x, y
pairs to plot; here we are unpacking the first element of the
length-one list into the ``line`` variable.
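A quick sketch of that point (synthetic data, Agg backend so it runs headless): passing two x, y pairs in a single call yields a list of two lines, both of which land in the ``Axes.lines`` container:

``` python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # render off-screen; no window needed for this sketch
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = np.linspace(0.0, 1.0, 10)

# Two x, y pairs in one call -> plot returns a list of two Line2D objects.
lines = ax.plot(x, x**2, x, x**3)
print(len(lines))      # 2
print(len(ax.lines))   # 2
```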
The line has been added to the
``Axes.lines`` list:

``` python
In [229]: print(ax.lines)
[<matplotlib.lines.Line2D object at 0x...>]
```

Similarly, methods that create patches, like
[``bar()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.bar.html#matplotlib.axes.Axes.bar) (which creates a list of rectangles), will
add the patches to the ``Axes.patches`` list:

``` python
In [233]: n, bins, rectangles = ax.hist(np.random.randn(1000), 50, facecolor='yellow')

In [234]: rectangles
Out[234]: <a list of 50 Patch objects>

In [235]: print(len(ax.patches))
50
```

You should not add objects directly to the ``Axes.lines`` or
``Axes.patches`` lists unless you know exactly what you are doing,
because the ``Axes`` needs to do a few things when it creates and adds
an object. It sets the figure and axes property of the ``Artist``, as
well as the default ``Axes`` transformation (unless a transformation
is set). It also inspects the data contained in the ``Artist`` to
update the data structures controlling auto-scaling, so that the view
limits can be adjusted to contain the plotted data. You can,
nonetheless, create objects yourself and add them directly to the
``Axes`` using helper methods like
[``add_line()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.add_line.html#matplotlib.axes.Axes.add_line) and
[``add_patch()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.add_patch.html#matplotlib.axes.Axes.add_patch).
Here is an annotated
interactive session illustrating what is going on:

``` python
In [262]: fig, ax = plt.subplots()

# create a rectangle instance
In [263]: rect = matplotlib.patches.Rectangle((1, 1), width=5, height=12)

# by default the axes attribute is None
In [264]: print(rect.axes)
None

# and the transformation instance is set to the "identity transform"
In [265]: print(rect.get_transform())
IdentityTransform()

# now we add the Rectangle to the Axes
In [266]: ax.add_patch(rect)

# and notice that the ax.add_patch method has set the axes
# instance
In [267]: print(rect.axes)
Axes(0.125,0.1;0.775x0.8)

# and the transformation has been set too
In [268]: print(rect.get_transform())
CompositeGenericTransform(...)

# the default axes transformation is ax.transData
In [269]: print(ax.transData)
CompositeGenericTransform(...)

# notice that the xlimits of the Axes have not been changed
In [270]: print(ax.get_xlim())
(0.0, 1.0)

# but the data limits have been updated to encompass the rectangle
In [271]: print(ax.dataLim.bounds)
(1.0, 1.0, 5.0, 12.0)

# we can manually invoke the auto-scaling machinery
In [272]: ax.autoscale_view()

# and now the xlim are updated to encompass the rectangle
In [273]: print(ax.get_xlim())
(1.0, 6.0)

# we have to manually force a figure draw
In [274]: ax.figure.canvas.draw()
```

There are many, many ``Axes`` helper methods for creating primitive
``Artists`` and adding them to their respective containers.
The table
below summarizes a small sampling of them, the kinds of ``Artist`` they
create, and where they store them:

| Helper method | Artist | Container |
| --- | --- | --- |
| ax.annotate - text annotations | Annotate | ax.texts |
| ax.bar - bar charts | Rectangle | ax.patches |
| ax.errorbar - error bar plots | Line2D and Rectangle | ax.lines and ax.patches |
| ax.fill - shared area | Polygon | ax.patches |
| ax.hist - histograms | Rectangle | ax.patches |
| ax.imshow - image data | AxesImage | ax.images |
| ax.legend - axes legends | Legend | ax.legends |
| ax.plot - xy plots | Line2D | ax.lines |
| ax.scatter - scatter charts | PolygonCollection | ax.collections |
| ax.text - text | Text | ax.texts |

In addition to all of these ``Artists``, the ``Axes`` contains two
important ``Artist`` containers: the [``XAxis``](https://matplotlib.org/api/axis_api.html#matplotlib.axis.XAxis)
and [``YAxis``](https://matplotlib.org/api/axis_api.html#matplotlib.axis.YAxis), which handle the drawing of the
ticks and labels. These are stored as the instance variables
``xaxis`` and ``yaxis``. The ``XAxis`` and ``YAxis``
containers will be detailed below, but note that the ``Axes`` contains
many helper methods which forward calls on to the
[``Axis``](https://matplotlib.org/api/axis_api.html#matplotlib.axis.Axis) instances, so you often do not need to
work with them directly unless you want to.
For example, you can set
the font color of the ``XAxis`` ticklabels using the ``Axes`` helper
method:

``` python
for label in ax.get_xticklabels():
    label.set_color('orange')
```

Below is a summary of the Artists that the Axes contains:

| Axes attribute | Description |
| --- | --- |
| artists | A list of Artist instances |
| patch | Rectangle instance for Axes background |
| collections | A list of Collection instances |
| images | A list of AxesImage |
| legends | A list of Legend instances |
| lines | A list of Line2D instances |
| patches | A list of Patch instances |
| texts | A list of Text instances |
| xaxis | matplotlib.axis.XAxis instance |
| yaxis | matplotlib.axis.YAxis instance |

### Axis containers

The [``matplotlib.axis.Axis``](https://matplotlib.org/api/axis_api.html#matplotlib.axis.Axis) instances handle the drawing of the
tick lines, the grid lines, the tick labels and the axis label. You
can configure the left and right ticks separately for the y-axis, and
the upper and lower ticks separately for the x-axis. The ``Axis``
also stores the data and view intervals used in auto-scaling, panning
and zooming, as well as the [``Locator``](https://matplotlib.org/api/ticker_api.html#matplotlib.ticker.Locator) and
[``Formatter``](https://matplotlib.org/api/ticker_api.html#matplotlib.ticker.Formatter) instances, which control where
the ticks are placed and how they are represented as strings.

Each ``Axis`` object contains a ``label`` attribute
(this is what [``pyplot``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.html#module-matplotlib.pyplot) modifies in calls to
[``xlabel()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.xlabel.html#matplotlib.pyplot.xlabel) and [``ylabel()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.ylabel.html#matplotlib.pyplot.ylabel)) as
well as a list of major and minor ticks.
The ticks are
[``XTick``](https://matplotlib.org/api/axis_api.html#matplotlib.axis.XTick) and [``YTick``](https://matplotlib.org/api/axis_api.html#matplotlib.axis.YTick) instances,
which contain the actual line and text primitives that render the ticks and
ticklabels. Because the ticks are dynamically created as needed (e.g., when
panning and zooming), you should access the lists of major and minor ticks
through their accessor methods [``get_major_ticks()``](https://matplotlib.org/api/_as_gen/matplotlib.axis.Axis.get_major_ticks.html#matplotlib.axis.Axis.get_major_ticks)
and [``get_minor_ticks()``](https://matplotlib.org/api/_as_gen/matplotlib.axis.Axis.get_minor_ticks.html#matplotlib.axis.Axis.get_minor_ticks). Although the ticks contain
all the primitives and will be covered below, ``Axis`` instances have accessor
methods that return the tick lines, tick labels, tick locations etc.:

``` python
fig, ax = plt.subplots()
axis = ax.xaxis
axis.get_ticklocs()
```

![sphx_glr_artists_003](https://matplotlib.org/_images/sphx_glr_artists_003.png)

``` python
axis.get_ticklabels()
```

Note that there are twice as many ticklines as labels, because by default
there are tick lines at the top and bottom but only tick labels below the
xaxis; this can be customized.

``` python
axis.get_ticklines()
```

By default you get the major ticks back:

``` python
axis.get_ticklines()
```

But you can also ask for the minor ticks:

``` python
axis.get_ticklines(minor=True)
```

Here is a summary of some of the useful accessor methods of the ``Axis``
(these have corresponding setters where useful, such as
``set_major_formatter``):

| Accessor method | Description |
| --- | --- |
| get_scale | The scale of the axis, e.g., 'log' or 'linear' |
| get_view_interval | The interval instance of the axis view limits |
| get_data_interval | The interval instance of the axis data limits |
| get_gridlines | A list of grid lines for the Axis |
| get_label | The axis label - a Text instance |
| get_ticklabels | A list of Text instances - keyword minor=True\|False |
| get_ticklines | A list of Line2D instances - keyword minor=True\|False |
| get_ticklocs | A list of Tick locations - keyword minor=True\|False |
| get_major_locator | The matplotlib.ticker.Locator instance for major ticks |
| get_major_formatter | The matplotlib.ticker.Formatter instance for major ticks |
| get_minor_locator | The matplotlib.ticker.Locator instance for minor ticks |
| get_minor_formatter | The matplotlib.ticker.Formatter instance for minor ticks |
| get_major_ticks | A list of Tick instances for major ticks |
| get_minor_ticks | A list of Tick instances for minor ticks |
| grid | Turn the grid on or off for the major or minor ticks |

Here is an example, not recommended for its beauty, which customizes
the axes and tick properties:

``` python
# plt.figure creates a matplotlib.figure.Figure instance
fig = plt.figure()
rect = fig.patch  # a rectangle instance
rect.set_facecolor('lightgoldenrodyellow')

ax1 = fig.add_axes([0.1, 0.3, 0.4, 0.4])
rect = ax1.patch
rect.set_facecolor('lightslategray')


for label in ax1.xaxis.get_ticklabels():
    # label is a Text instance
    label.set_color('red')
    label.set_rotation(45)
    label.set_fontsize(16)

for line in ax1.yaxis.get_ticklines():
    # line is a Line2D instance
    line.set_color('green')
    line.set_markersize(25)
    line.set_markeredgewidth(3)

plt.show()
```

![sphx_glr_artists_004](https://matplotlib.org/_images/sphx_glr_artists_004.png)

### Tick containers

The [``matplotlib.axis.Tick``](https://matplotlib.org/api/axis_api.html#matplotlib.axis.Tick) is the final container object in our
descent from the [``Figure``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure) to the
[``Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes) to the 
[``Axis``](https://matplotlib.org/api/axis_api.html#matplotlib.axis.Axis)
to the [``Tick``](https://matplotlib.org/api/axis_api.html#matplotlib.axis.Tick). The ``Tick`` contains the tick
and grid line instances, as well as the label instances for the upper
and lower ticks. Each of these is accessible directly as an attribute
of the ``Tick``.

| Tick attribute | Description |
| --- | --- |
| tick1line | Line2D instance |
| tick2line | Line2D instance |
| gridline | Line2D instance |
| label1 | Text instance |
| label2 | Text instance |

Here is an example which sets the formatter for the ticks on the right
side of the yaxis to dollar signs and colors those labels green:

``` python
import matplotlib.ticker as ticker

# Fixing random state for reproducibility
np.random.seed(19680801)

fig, ax = plt.subplots()
ax.plot(100*np.random.rand(20))

formatter = ticker.FormatStrFormatter('$%1.2f')
ax.yaxis.set_major_formatter(formatter)

for tick in ax.yaxis.get_major_ticks():
    tick.label1.set_visible(False)
    tick.label2.set_visible(True)
    tick.label2.set_color('green')

plt.show()
```

![sphx_glr_artists_005](https://matplotlib.org/_images/sphx_glr_artists_005.png)

## Download

- [Download Python source code: artists.py](https://matplotlib.org/_downloads/a7b58a13e5ee2b59b31d49c2baa9f139/artists.py)
- [Download Jupyter notebook: artists.ipynb](https://matplotlib.org/_downloads/c328d03dcd3b9dae9a8b3f008c82073b/artists.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/intermediate/color_cycle.md b/Python/matplotlab/intermediate/color_cycle.md
new file mode 100644
index 00000000..cc44d6a4
--- /dev/null
+++ b/Python/matplotlab/intermediate/color_cycle.md
@@ -0,0 +1,148 @@
---
sidebarDepth: 3
sidebar: auto
---

# Styling with cycler

Demo of custom property-cycle settings to control colors and other style
properties for multi-line plots.
::: tip Note

More complete documentation of the ``cycler`` API can be found
[here](http://matplotlib.org/cycler/).

:::

This example demonstrates two different APIs:

1. Setting the default rc parameter specifying the property cycle.
This affects all subsequent axes (but not axes already created).
1. Setting the property cycle for a single pair of axes.

``` python
from cycler import cycler
import numpy as np
import matplotlib.pyplot as plt
```

First we'll generate some sample data, in this case, four offset sine
curves.

``` python
x = np.linspace(0, 2 * np.pi, 50)
offsets = np.linspace(0, 2 * np.pi, 4, endpoint=False)
yy = np.transpose([np.sin(x + phi) for phi in offsets])
```

Now ``yy`` has shape:

``` python
print(yy.shape)
```

Out:

```
(50, 4)
```

So ``yy[:, i]`` will give you the ``i``-th offset sine curve. Let's set the
default ``prop_cycle`` using [``matplotlib.pyplot.rc()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.rc.html#matplotlib.pyplot.rc). We'll combine a
color cycler and a linestyle cycler by adding (``+``) two ``cycler``s
together. See the bottom of this tutorial for more information about
combining different cyclers.

``` python
default_cycler = (cycler(color=['r', 'g', 'b', 'y']) +
                  cycler(linestyle=['-', '--', ':', '-.']))

plt.rc('lines', linewidth=4)
plt.rc('axes', prop_cycle=default_cycler)
```

Now we'll generate a figure with two axes, one on top of the other. On the
first axis, we'll plot with the default cycler. On the second axis, we'll
set the ``prop_cycle`` using [``matplotlib.axes.Axes.set_prop_cycle()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_prop_cycle.html#matplotlib.axes.Axes.set_prop_cycle),
which will only set the ``prop_cycle`` for this [``matplotlib.axes.Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes)
instance. We'll use a second ``cycler`` that combines a color cycler and a
linewidth cycler.
+ +``` python +custom_cycler = (cycler(color=['c', 'm', 'y', 'k']) + + cycler(lw=[1, 2, 3, 4])) + +fig, (ax0, ax1) = plt.subplots(nrows=2) +ax0.plot(yy) +ax0.set_title('Set default color cycle to rgby') +ax1.set_prop_cycle(custom_cycler) +ax1.plot(yy) +ax1.set_title('Set axes color cycle to cmyk') + +# Add a bit more space between the two plots. +fig.subplots_adjust(hspace=0.3) +plt.show() +``` + +![sphx_glr_color_cycle_001](https://matplotlib.org/_images/sphx_glr_color_cycle_001.png) + +## Setting ``prop_cycle`` in the ``matplotlibrc`` file or style files + +Remember, if you want to set a custom cycler in your +``.matplotlibrc`` file or a style file (``style.mplstyle``), you can set the +``axes.prop_cycle`` property: + +``` python +axes.prop_cycle : cycler(color='bgrcmyk') +``` + +## Cycling through multiple properties + +You can add cyclers: + +``` python +from cycler import cycler +cc = (cycler(color=list('rgb')) + + cycler(linestyle=['-', '--', '-.'])) +for d in cc: + print(d) +``` + +Results in: + +``` python +{'color': 'r', 'linestyle': '-'} +{'color': 'g', 'linestyle': '--'} +{'color': 'b', 'linestyle': '-.'} +``` + +You can multiply cyclers: + +``` python +from cycler import cycler +cc = (cycler(color=list('rgb')) * + cycler(linestyle=['-', '--', '-.'])) +for d in cc: + print(d) +``` + +Results in: + +``` python +{'color': 'r', 'linestyle': '-'} +{'color': 'r', 'linestyle': '--'} +{'color': 'r', 'linestyle': '-.'} +{'color': 'g', 'linestyle': '-'} +{'color': 'g', 'linestyle': '--'} +{'color': 'g', 'linestyle': '-.'} +{'color': 'b', 'linestyle': '-'} +{'color': 'b', 'linestyle': '--'} +{'color': 'b', 'linestyle': '-.'} +``` + +## Download + +- [Download Python source code: color_cycle.py](https://matplotlib.org/_downloads/6d214f31d57999a93c8a6e18f0ce6aab/color_cycle.py) +- [Download Jupyter notebook: color_cycle.ipynb](https://matplotlib.org/_downloads/e2174f7bdc06ad628a756f14967811ee/color_cycle.ipynb) + \ No newline at end of file diff --git 
a/Python/matplotlab/intermediate/constrainedlayout_guide.md b/Python/matplotlab/intermediate/constrainedlayout_guide.md new file mode 100644 index 00000000..eb6ca8bc --- /dev/null +++ b/Python/matplotlab/intermediate/constrainedlayout_guide.md @@ -0,0 +1,900 @@ +--- +sidebarDepth: 3 +sidebar: auto +--- + +# Constrained Layout Guide + +How to use constrained-layout to fit plots within your figure cleanly. + +*constrained_layout* automatically adjusts subplots and decorations like +legends and colorbars so that they fit in the figure window while still +preserving, as best they can, the logical layout requested by the user. + +*constrained_layout* is similar to +[tight_layout](tight_layout_guide.html), +but uses a constraint solver to determine the size of axes that allows +them to fit. + +*constrained_layout* needs to be activated before any axes are added to +a figure. Two ways of doing so are + +- using the respective argument to [``subplots()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.subplots.html#matplotlib.pyplot.subplots) or +[``figure()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.figure.html#matplotlib.pyplot.figure), e.g.: + +``` python +plt.subplots(constrained_layout=True) +``` +- activate it via [rcParams](https://matplotlib.org/introductory/customizing.html#matplotlib-rcparams), like: + +``` python +plt.rcParams['figure.constrained_layout.use'] = True +``` + +Those are described in detail throughout the following sections. + +::: danger Warning + +Currently Constrained Layout is **experimental**. The +behaviour and API are subject to change, or the whole functionality +may be removed without a deprecation period. If you *require* your +plots to be absolutely reproducible, get the Axes positions after +running Constrained Layout and use ``ax.set_position()`` in your code +with ``constrained_layout=False``. 
+ +::: + +## Simple Example + +In Matplotlib, the location of axes (including subplots) are specified in +normalized figure coordinates. It can happen that your axis labels or +titles (or sometimes even ticklabels) go outside the figure area, and are thus +clipped. + +``` python +# sphinx_gallery_thumbnail_number = 18 + + +import matplotlib.pyplot as plt +import matplotlib.colors as mcolors +import matplotlib.gridspec as gridspec +import numpy as np + + +plt.rcParams['savefig.facecolor'] = "0.8" +plt.rcParams['figure.figsize'] = 4.5, 4. + + +def example_plot(ax, fontsize=12, nodec=False): + ax.plot([1, 2]) + + ax.locator_params(nbins=3) + if not nodec: + ax.set_xlabel('x-label', fontsize=fontsize) + ax.set_ylabel('y-label', fontsize=fontsize) + ax.set_title('Title', fontsize=fontsize) + else: + ax.set_xticklabels('') + ax.set_yticklabels('') + + +fig, ax = plt.subplots(constrained_layout=False) +example_plot(ax, fontsize=24) +``` + +![sphx_glr_constrainedlayout_guide_001](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_001.png) + +To prevent this, the location of axes needs to be adjusted. For +subplots, this can be done by adjusting the subplot params +([Move the edge of an axes to make room for tick labels](https://matplotlib.orgfaq/howto_faq.html#howto-subplots-adjust)). However, specifying your figure with the +``constrained_layout=True`` kwarg will do the adjusting automatically. + +``` python +fig, ax = plt.subplots(constrained_layout=True) +example_plot(ax, fontsize=24) +``` + +![sphx_glr_constrainedlayout_guide_002](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_002.png) + +When you have multiple subplots, often you see labels of different +axes overlapping each other. 
+ +``` python +fig, axs = plt.subplots(2, 2, constrained_layout=False) +for ax in axs.flat: + example_plot(ax) +``` + +![sphx_glr_constrainedlayout_guide_003](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_003.png) + +Specifying ``constrained_layout=True`` in the call to ``plt.subplots`` +causes the layout to be properly constrained. + +``` python +fig, axs = plt.subplots(2, 2, constrained_layout=True) +for ax in axs.flat: + example_plot(ax) +``` + +![sphx_glr_constrainedlayout_guide_004](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_004.png) + +## Colorbars + +If you create a colorbar with the [``colorbar()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.colorbar.html#matplotlib.pyplot.colorbar) +command you need to make room for it. ``constrained_layout`` does this +automatically. Note that if you specify ``use_gridspec=True`` it will be +ignored because this option is made for improving the layout via +``tight_layout``. + +::: tip Note + +For the [``pcolormesh``](https://matplotlib.orgapi/_as_gen/matplotlib.axes.Axes.pcolormesh.html#matplotlib.axes.Axes.pcolormesh) kwargs (``pc_kwargs``) we use a +dictionary. Below we will assign one colorbar to a number of axes each +containing a [``ScalarMappable``](https://matplotlib.orgapi/cm_api.html#matplotlib.cm.ScalarMappable); specifying the norm and colormap +ensures the colorbar is accurate for all the axes. + +::: + +``` python +arr = np.arange(100).reshape((10, 10)) +norm = mcolors.Normalize(vmin=0., vmax=100.) 
+# see note above: this makes all pcolormesh calls consistent: +pc_kwargs = {'rasterized': True, 'cmap': 'viridis', 'norm': norm} +fig, ax = plt.subplots(figsize=(4, 4), constrained_layout=True) +im = ax.pcolormesh(arr, **pc_kwargs) +fig.colorbar(im, ax=ax, shrink=0.6) +``` + +![sphx_glr_constrainedlayout_guide_005](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_005.png) + +If you specify a list of axes (or other iterable container) to the +``ax`` argument of ``colorbar``, constrained_layout will take space from +the specified axes. + +``` python +fig, axs = plt.subplots(2, 2, figsize=(4, 4), constrained_layout=True) +for ax in axs.flat: + im = ax.pcolormesh(arr, **pc_kwargs) +fig.colorbar(im, ax=axs, shrink=0.6) +``` + +![sphx_glr_constrainedlayout_guide_006](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_006.png) + +If you specify a list of axes from inside a grid of axes, the colorbar +will steal space appropriately, and leave a gap, but all subplots will +still be the same size. + +``` python +fig, axs = plt.subplots(3, 3, figsize=(4, 4), constrained_layout=True) +for ax in axs.flat: + im = ax.pcolormesh(arr, **pc_kwargs) +fig.colorbar(im, ax=axs[1:, ][:, 1], shrink=0.8) +fig.colorbar(im, ax=axs[:, -1], shrink=0.6) +``` + +![sphx_glr_constrainedlayout_guide_007](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_007.png) + +Note that there is a bit of a subtlety when specifying a single axes +as the parent. In the following, it might be desirable and expected +for the colorbars to line up, but they don't because the colorbar paired +with the bottom axes is tied to the subplotspec of the axes, and hence +shrinks when the gridspec-level colorbar is added. 
+

``` python
fig, axs = plt.subplots(3, 1, figsize=(4, 4), constrained_layout=True)
for ax in axs[:2]:
    im = ax.pcolormesh(arr, **pc_kwargs)
fig.colorbar(im, ax=axs[:2], shrink=0.6)
im = axs[2].pcolormesh(arr, **pc_kwargs)
fig.colorbar(im, ax=axs[2], shrink=0.6)
```

![sphx_glr_constrainedlayout_guide_008](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_008.png)

The API to make a single axes behave like a list of axes is to specify
it as a list (or other iterable container), as below:

``` python
fig, axs = plt.subplots(3, 1, figsize=(4, 4), constrained_layout=True)
for ax in axs[:2]:
    im = ax.pcolormesh(arr, **pc_kwargs)
fig.colorbar(im, ax=axs[:2], shrink=0.6)
im = axs[2].pcolormesh(arr, **pc_kwargs)
fig.colorbar(im, ax=[axs[2]], shrink=0.6)
```

![sphx_glr_constrainedlayout_guide_009](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_009.png)

## Suptitle

``constrained_layout`` can also make room for [``suptitle``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.suptitle).

``` python
fig, axs = plt.subplots(2, 2, figsize=(4, 4), constrained_layout=True)
for ax in axs.flat:
    im = ax.pcolormesh(arr, **pc_kwargs)
fig.colorbar(im, ax=axs, shrink=0.6)
fig.suptitle('Big Suptitle')
```

![sphx_glr_constrainedlayout_guide_010](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_010.png)

## Legends

Legends can be placed outside of their parent axis.
Constrained-layout is designed to handle this for [``Axes.legend()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.legend.html#matplotlib.axes.Axes.legend).
However, constrained-layout does *not* handle legends being created via
[``Figure.legend()``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.legend) (yet).
+ +``` python +fig, ax = plt.subplots(constrained_layout=True) +ax.plot(np.arange(10), label='This is a plot') +ax.legend(loc='center left', bbox_to_anchor=(0.8, 0.5)) +``` + +![sphx_glr_constrainedlayout_guide_011](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_011.png) + +However, this will steal space from a subplot layout: + +``` python +fig, axs = plt.subplots(1, 2, figsize=(4, 2), constrained_layout=True) +axs[0].plot(np.arange(10)) +axs[1].plot(np.arange(10), label='This is a plot') +axs[1].legend(loc='center left', bbox_to_anchor=(0.8, 0.5)) +``` + +![sphx_glr_constrainedlayout_guide_012](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_012.png) + +In order for a legend or other artist to *not* steal space +from the subplot layout, we can ``leg.set_in_layout(False)``. +Of course this can mean the legend ends up +cropped, but can be useful if the plot is subsequently called +with ``fig.savefig('outname.png', bbox_inches='tight')``. Note, +however, that the legend's ``get_in_layout`` status will have to be +toggled again to make the saved file work, and we must manually +trigger a draw if we want constrained_layout to adjust the size +of the axes before printing. + +``` python +fig, axs = plt.subplots(1, 2, figsize=(4, 2), constrained_layout=True) + +axs[0].plot(np.arange(10)) +axs[1].plot(np.arange(10), label='This is a plot') +leg = axs[1].legend(loc='center left', bbox_to_anchor=(0.8, 0.5)) +leg.set_in_layout(False) +# trigger a draw so that constrained_layout is executed once +# before we turn it off when printing.... +fig.canvas.draw() +# we want the legend included in the bbox_inches='tight' calcs. +leg.set_in_layout(True) +# we don't want the layout to change at this point. 
+fig.set_constrained_layout(False) +fig.savefig('CL01.png', bbox_inches='tight', dpi=100) +``` + +![sphx_glr_constrainedlayout_guide_013](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_013.png) + +The saved file looks like: + +![CL01](https://matplotlib.org/_images/CL01.png) + +A better way to get around this awkwardness is to simply +use the legend method provided by [``Figure.legend``](https://matplotlib.orgapi/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.legend): + +``` python +fig, axs = plt.subplots(1, 2, figsize=(4, 2), constrained_layout=True) +axs[0].plot(np.arange(10)) +lines = axs[1].plot(np.arange(10), label='This is a plot') +labels = [l.get_label() for l in lines] +leg = fig.legend(lines, labels, loc='center left', + bbox_to_anchor=(0.8, 0.5), bbox_transform=axs[1].transAxes) +fig.savefig('CL02.png', bbox_inches='tight', dpi=100) +``` + +![sphx_glr_constrainedlayout_guide_014](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_014.png) + +The saved file looks like: + +![CL02](https://matplotlib.org/_images/CL02.png) + +## Padding and Spacing + +For constrained_layout, we have implemented a padding around the edge of +each axes. This padding sets the distance from the edge of the plot, +and the minimum distance between adjacent plots. It is specified in +inches by the keyword arguments ``w_pad`` and ``h_pad`` to the function +[``set_constrained_layout_pads``](https://matplotlib.orgapi/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.set_constrained_layout_pads): + +``` python +fig, axs = plt.subplots(2, 2, constrained_layout=True) +for ax in axs.flat: + example_plot(ax, nodec=True) + ax.set_xticklabels('') + ax.set_yticklabels('') +fig.set_constrained_layout_pads(w_pad=4./72., h_pad=4./72., + hspace=0., wspace=0.) 
+

fig, axs = plt.subplots(2, 2, constrained_layout=True)
for ax in axs.flat:
    example_plot(ax, nodec=True)
    ax.set_xticklabels('')
    ax.set_yticklabels('')
fig.set_constrained_layout_pads(w_pad=2./72., h_pad=2./72.,
        hspace=0., wspace=0.)
```

- ![sphx_glr_constrainedlayout_guide_015](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_015.png)
- ![sphx_glr_constrainedlayout_guide_016](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_016.png)

Spacing between subplots is set by ``wspace`` and ``hspace``. These are
specified as a fraction of the size of the subplot group as a whole.
If the size of the figure is changed, then these spaces change in
proportion. Note below how the space at the edges doesn't change from
the above, but the space between subplots does.

``` python
fig, axs = plt.subplots(2, 2, constrained_layout=True)
for ax in axs.flat:
    example_plot(ax, nodec=True)
    ax.set_xticklabels('')
    ax.set_yticklabels('')
fig.set_constrained_layout_pads(w_pad=2./72., h_pad=2./72.,
        hspace=0.2, wspace=0.2)
```

![sphx_glr_constrainedlayout_guide_017](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_017.png)

### Spacing with colorbars

Colorbars will be placed ``wspace`` and ``hspace`` apart from other
subplots. The padding between the colorbar and the axis it is
attached to will never be less than ``w_pad`` (for a vertical colorbar)
or ``h_pad`` (for a horizontal colorbar). Note the use of the ``pad`` kwarg
here in the ``colorbar`` call. It defaults to 0.02 of the size
of the axis it is attached to.
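The pads above are given in inches; values like ``2./72.`` are just lengths in points converted to inches (there are 72 points per inch). A tiny standalone helper — illustrative only, not part of Matplotlib — makes the conversion explicit:

``` python
PTS_PER_INCH = 72.0

def pts_to_inches(pts):
    """Convert a length in points to inches (72 pt = 1 in)."""
    return pts / PTS_PER_INCH

# the pads used above:
print(pts_to_inches(2))  # 0.0277... in, i.e. 2./72.
print(pts_to_inches(4))  # 0.0555... in, i.e. 4./72.
```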
+

``` python
fig, axs = plt.subplots(2, 2, constrained_layout=True)
for ax in axs.flat:
    pc = ax.pcolormesh(arr, **pc_kwargs)
    fig.colorbar(pc, ax=ax, shrink=0.6, pad=0)
    ax.set_xticklabels('')
    ax.set_yticklabels('')
fig.set_constrained_layout_pads(w_pad=2./72., h_pad=2./72.,
        hspace=0.2, wspace=0.2)
```

![sphx_glr_constrainedlayout_guide_018](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_018.png)

In the above example, the colorbar will never be closer than 2 pts to
the plot; if we want it a bit further away, we can give ``pad`` a
non-zero value.

``` python
fig, axs = plt.subplots(2, 2, constrained_layout=True)
for ax in axs.flat:
    pc = ax.pcolormesh(arr, **pc_kwargs)
    fig.colorbar(pc, ax=ax, shrink=0.6, pad=0.05)
    ax.set_xticklabels('')
    ax.set_yticklabels('')
fig.set_constrained_layout_pads(w_pad=2./72., h_pad=2./72.,
        hspace=0.2, wspace=0.2)
```

![sphx_glr_constrainedlayout_guide_019](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_019.png)

## rcParams

There are five [rcParams](https://matplotlib.org/introductory/customizing.html#matplotlib-rcparams) that can be set,
either in a script or in the ``matplotlibrc`` file.
They all have the prefix ``figure.constrained_layout``:

- ``use``: Whether to use constrained_layout. Default is False
- ``w_pad``, ``h_pad``: Padding around axes objects.
  Float representing inches. Default is 3./72. inches (3 pts)
- ``wspace``, ``hspace``: Space between subplot groups.
  Float representing a fraction of the subplot widths being separated.
  Default is 0.02.
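In a ``matplotlibrc`` file the equivalent settings would look roughly like this (a sketch based on the defaults listed above; ``0.04167`` is 3./72. written as a decimal):

```
figure.constrained_layout.use:    False
figure.constrained_layout.w_pad:  0.04167   # inches (3 pts)
figure.constrained_layout.h_pad:  0.04167   # inches (3 pts)
figure.constrained_layout.wspace: 0.02      # fraction of the subplot group width
figure.constrained_layout.hspace: 0.02
```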
+

``` python
plt.rcParams['figure.constrained_layout.use'] = True
fig, axs = plt.subplots(2, 2, figsize=(3, 3))
for ax in axs.flat:
    example_plot(ax)
```

![sphx_glr_constrainedlayout_guide_020](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_020.png)

## Use with GridSpec

constrained_layout is meant to be used
with [``subplots()``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.subplots) or
[``GridSpec()``](https://matplotlib.org/api/_as_gen/matplotlib.gridspec.GridSpec.html#matplotlib.gridspec.GridSpec) and
[``add_subplot()``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.add_subplot).

Note that in what follows ``constrained_layout=True`` is still in effect,
via the rcParam set above.

``` python
fig = plt.figure()

gs1 = gridspec.GridSpec(2, 1, figure=fig)
ax1 = fig.add_subplot(gs1[0])
ax2 = fig.add_subplot(gs1[1])

example_plot(ax1)
example_plot(ax2)
```

![sphx_glr_constrainedlayout_guide_021](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_021.png)

More complicated gridspec layouts are possible. Note here we use the
convenience functions ``add_gridspec`` and ``subgridspec``.

``` python
fig = plt.figure()

gs0 = fig.add_gridspec(1, 2)

gs1 = gs0[0].subgridspec(2, 1)
ax1 = fig.add_subplot(gs1[0])
ax2 = fig.add_subplot(gs1[1])

example_plot(ax1)
example_plot(ax2)

gs2 = gs0[1].subgridspec(3, 1)

for ss in gs2:
    ax = fig.add_subplot(ss)
    example_plot(ax)
    ax.set_title("")
    ax.set_xlabel("")

ax.set_xlabel("x-label", fontsize=12)
```

![sphx_glr_constrainedlayout_guide_022](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_022.png)

Note that in the above the left and right columns don't have the same
vertical extent.
If we want the top and bottom of the two grids to line up then
they need to be in the same gridspec:

``` python
fig = plt.figure()

gs0 = fig.add_gridspec(6, 2)

ax1 = fig.add_subplot(gs0[:3, 0])
ax2 = fig.add_subplot(gs0[3:, 0])

example_plot(ax1)
example_plot(ax2)

ax = fig.add_subplot(gs0[0:2, 1])
example_plot(ax)
ax = fig.add_subplot(gs0[2:4, 1])
example_plot(ax)
ax = fig.add_subplot(gs0[4:, 1])
example_plot(ax)
```

![sphx_glr_constrainedlayout_guide_023](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_023.png)

This example uses two gridspecs to have the colorbar only pertain to
one set of pcolors. Note how the left column is wider than the
two right-hand columns because of this. Of course, if you wanted the
subplots to be the same size you would only need one gridspec.

``` python
def docomplicated(suptitle=None):
    fig = plt.figure()
    gs0 = fig.add_gridspec(1, 2, width_ratios=[1., 2.])
    gsl = gs0[0].subgridspec(2, 1)
    gsr = gs0[1].subgridspec(2, 2)

    for gs in gsl:
        ax = fig.add_subplot(gs)
        example_plot(ax)
    axs = []
    for gs in gsr:
        ax = fig.add_subplot(gs)
        pcm = ax.pcolormesh(arr, **pc_kwargs)
        ax.set_xlabel('x-label')
        ax.set_ylabel('y-label')
        ax.set_title('title')

        axs += [ax]
    fig.colorbar(pcm, ax=axs)
    if suptitle is not None:
        fig.suptitle(suptitle)

docomplicated()
```

![sphx_glr_constrainedlayout_guide_024](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_024.png)

## Manually setting axes positions

There can be good reasons to manually set an axes position. A manual call
to [``set_position``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_position.html#matplotlib.axes.Axes.set_position) will set the axes so constrained_layout has
no effect on it anymore. (Note that constrained_layout still leaves the
space for the axes that is moved).
+

``` python
fig, axs = plt.subplots(1, 2)
example_plot(axs[0], fontsize=12)
axs[1].set_position([0.2, 0.2, 0.4, 0.4])
```

![sphx_glr_constrainedlayout_guide_025](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_025.png)

If you want an inset axes in data-space, you need to manually execute the
layout by calling ``fig.execute_constrained_layout()``. The inset axes
will then be properly positioned. However, it will not be properly
positioned if the size of the figure is subsequently changed. Similarly,
if the figure is printed to another backend, there may be slight changes
of location due to small differences in how the backends render fonts.

``` python
from matplotlib.transforms import Bbox

fig, axs = plt.subplots(1, 2)
example_plot(axs[0], fontsize=12)
fig.execute_constrained_layout()
# put into data-space:
bb_data_ax2 = Bbox.from_bounds(0.5, 1., 0.2, 0.4)
disp_coords = axs[0].transData.transform(bb_data_ax2)
fig_coords_ax2 = fig.transFigure.inverted().transform(disp_coords)
bb_ax2 = Bbox(fig_coords_ax2)
ax2 = fig.add_axes(bb_ax2)
```

![sphx_glr_constrainedlayout_guide_026](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_026.png)

## Manually turning off ``constrained_layout``

``constrained_layout`` usually adjusts the axes positions on each draw
of the figure. If you want to get the spacing provided by
``constrained_layout`` but not have it update, then do the initial
draw and then call ``fig.set_constrained_layout(False)``.
This is potentially useful for animations where the tick labels may
change length.

Note that ``constrained_layout`` is turned off for ``ZOOM`` and ``PAN``
GUI events for the backends that use the toolbar. This prevents the
axes from changing position during zooming and panning.

## Limitations

### Incompatible functions

``constrained_layout`` will not work on subplots
created via the ``subplot`` command.
The reason is that each of these +commands creates a separate ``GridSpec`` instance and ``constrained_layout`` +uses (nested) gridspecs to carry out the layout. So the following fails +to yield a nice layout: + +``` python +fig = plt.figure() + +ax1 = plt.subplot(221) +ax2 = plt.subplot(223) +ax3 = plt.subplot(122) + +example_plot(ax1) +example_plot(ax2) +example_plot(ax3) +``` + +![sphx_glr_constrainedlayout_guide_027](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_027.png) + +Of course that layout is possible using a gridspec: + +``` python +fig = plt.figure() +gs = fig.add_gridspec(2, 2) + +ax1 = fig.add_subplot(gs[0, 0]) +ax2 = fig.add_subplot(gs[1, 0]) +ax3 = fig.add_subplot(gs[:, 1]) + +example_plot(ax1) +example_plot(ax2) +example_plot(ax3) +``` + +![sphx_glr_constrainedlayout_guide_028](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_028.png) + +Similarly, +[``subplot2grid()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.subplot2grid.html#matplotlib.pyplot.subplot2grid) doesn't work for the same reason: +each call creates a different parent gridspec. 
+

``` python
fig = plt.figure()

ax1 = plt.subplot2grid((3, 3), (0, 0))
ax2 = plt.subplot2grid((3, 3), (0, 1), colspan=2)
ax3 = plt.subplot2grid((3, 3), (1, 0), colspan=2, rowspan=2)
ax4 = plt.subplot2grid((3, 3), (1, 2), rowspan=2)

example_plot(ax1)
example_plot(ax2)
example_plot(ax3)
example_plot(ax4)
```

![sphx_glr_constrainedlayout_guide_029](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_029.png)

The way to make this plot compatible with ``constrained_layout`` is again
to use ``gridspec`` directly:

``` python
fig = plt.figure()
gs = fig.add_gridspec(3, 3)

ax1 = fig.add_subplot(gs[0, 0])
ax2 = fig.add_subplot(gs[0, 1:])
ax3 = fig.add_subplot(gs[1:, 0:2])
ax4 = fig.add_subplot(gs[1:, -1])

example_plot(ax1)
example_plot(ax2)
example_plot(ax3)
example_plot(ax4)
```

![sphx_glr_constrainedlayout_guide_030](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_030.png)

### Other Caveats

- ``constrained_layout`` only considers ticklabels, axis labels, titles, and
  legends. Thus, other artists may be clipped and also may overlap.
- It assumes that the extra space needed for ticklabels, axis labels,
  and titles is independent of the original location of the axes. This is
  often true, but there are rare cases where it is not.
- There are small differences in how the backends handle rendering fonts,
  so the results will not be pixel-identical.

## Debugging

Constrained-layout can fail in somewhat unexpected ways. Because it uses
a constraint solver the solver can find solutions that are mathematically
correct, but that aren't at all what the user wants. The usual failure
mode is for all sizes to collapse to their smallest allowable value. If
this happens, it is for one of two reasons:

1. There was not enough room for the elements you were requesting to draw.
1. 
There is a bug - in which case open an issue at
[https://github.com/matplotlib/matplotlib/issues](https://github.com/matplotlib/matplotlib/issues).

If there is a bug, please report with a self-contained example that does
not require outside data or dependencies (other than numpy).

## Notes on the algorithm

The algorithm for the constraint is relatively straightforward, but
has some complexity due to the complex ways we can lay out a figure.

### Figure layout

Figures are laid out in a hierarchy:

1. Figure: ``fig = plt.figure()``
1. Gridspec: ``gs0 = gridspec.GridSpec(1, 2, figure=fig)``
1. Subplotspec: ``ss = gs0[0]``
1. Axes: ``ax = fig.add_subplot(ss)``

Each item has a layoutbox associated with it. The nesting of gridspecs
created with [``GridSpecFromSubplotSpec``](https://matplotlib.org/api/_as_gen/matplotlib.gridspec.GridSpecFromSubplotSpec.html#matplotlib.gridspec.GridSpecFromSubplotSpec) can be arbitrarily deep.

Each [``Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes) has *two* layoutboxes. The first one,
``ax._layoutbox`` represents the outside of the Axes and all its
decorations (i.e. ticklabels, axis labels, etc.).
The second layoutbox corresponds to the Axes' ``ax.position``, which sets
where in the figure the spines are placed.

Why so many stacked containers? Ideally, all that would be needed are the
Axes layout boxes. For the Gridspec case, a container is
needed if the Gridspec is nested via [``GridSpecFromSubplotSpec``](https://matplotlib.org/api/_as_gen/matplotlib.gridspec.GridSpecFromSubplotSpec.html#matplotlib.gridspec.GridSpecFromSubplotSpec). At the
top level, it is desirable for symmetry, but it also makes room for
[``suptitle``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.suptitle).

For the Subplotspec/Axes case, Axes often have colorbars or other
annotations that need to be packaged inside the Subplotspec, hence the
need for the outer layer.

### Simple case: one Axes

For a single Axes the layout is straightforward.
The Figure and
outer Gridspec layoutboxes coincide. The Subplotspec and Axes
boxes also coincide because the Axes has no colorbar. Note that
the difference between the red ``pos`` box and the green ``ax`` box
is set by the size of the decorations around the Axes.

In the code, this is accomplished by the entries in
``do_constrained_layout()`` like:

``` python
ax._poslayoutbox.edit_left_margin_min(-bbox.x0 + pos.x0 + w_pad)
```

``` python
from matplotlib._layoutbox import plot_children

fig, ax = plt.subplots(constrained_layout=True)
example_plot(ax, fontsize=24)
plot_children(fig, fig._layoutbox, printit=False)
```

![sphx_glr_constrainedlayout_guide_031](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_031.png)

### Simple case: two Axes

For this case, the Axes layoutboxes and the Subplotspec boxes still
coincide. However, because the decorations on the right-hand plot are
much smaller than on the left-hand one, the right-hand layoutboxes are smaller.

The Subplotspec boxes are laid out in the code in the subroutine
``arange_subplotspecs()``, which simply checks the subplotspecs in the code
against one another and stacks them appropriately.

The two ``pos`` axes are lined up. Because they have the same
minimum row, they are lined up at the top. Because
they have the same maximum row they are lined up at the bottom. In the
code this is accomplished via the calls to ``layoutbox.align``. If
there was more than one row, then the same horizontal alignment would
occur between the rows.

The two ``pos`` axes are given the same width because the subplotspecs
occupy the same number of columns. This is accomplished in the code where
``dcols0`` is compared to ``dcolsC``. If they are equal, then their widths
are constrained to be equal.
+

While it is a bit subtle in this case, note that the division between the
Subplotspecs is *not* centered, but has been moved to the right to make
space for the larger labels on the left-hand plot.

``` python
fig, ax = plt.subplots(1, 2, constrained_layout=True)
example_plot(ax[0], fontsize=32)
example_plot(ax[1], fontsize=8)
plot_children(fig, fig._layoutbox, printit=False)
```

![sphx_glr_constrainedlayout_guide_032](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_032.png)

### Two Axes and colorbar

Adding a colorbar makes it clear why the Subplotspec layoutboxes must
be different from the axes layoutboxes. Here we see the left-hand
subplotspec has more room to accommodate the [``colorbar``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.colorbar), and
that there are two green ``ax`` boxes inside the ``ss`` box.

Note that the width of the ``pos`` boxes is still the same, because their
widths are constrained to be equal when their subplotspecs occupy the same
number of columns (one in this example).

The colorbar layout logic is contained in [``make_axes``](https://matplotlib.org/api/colorbar_api.html#matplotlib.colorbar.make_axes)
which calls ``_constrained_layout.layoutcolorbarsingle()``
for cbars attached to a single axes, and
``_constrained_layout.layoutcolorbargridspec()`` if the colorbar is
associated with a gridspec.

``` python
fig, ax = plt.subplots(1, 2, constrained_layout=True)
im = ax[0].pcolormesh(arr, **pc_kwargs)
fig.colorbar(im, ax=ax[0], shrink=0.6)
im = ax[1].pcolormesh(arr, **pc_kwargs)
plot_children(fig, fig._layoutbox, printit=False)
```

![sphx_glr_constrainedlayout_guide_033](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_033.png)

### Colorbar associated with a Gridspec

This example shows the Subplotspec layoutboxes being made smaller by
a colorbar layoutbox.
The size of the colorbar layoutbox is
set to be ``shrink`` smaller than the vertical extent of the ``pos``
layoutboxes in the gridspec, and it is made to be centered between
those two points.

``` python
fig, axs = plt.subplots(2, 2, constrained_layout=True)
for ax in axs.flat:
    im = ax.pcolormesh(arr, **pc_kwargs)
fig.colorbar(im, ax=axs, shrink=0.6)
plot_children(fig, fig._layoutbox, printit=False)
```

![sphx_glr_constrainedlayout_guide_034](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_034.png)

### Uneven sized Axes

There are two ways to make axes have an uneven size in a
Gridspec layout, either by specifying them to cross Gridspec rows
or columns, or by specifying width and height ratios.

The first method is used here. The constraint that makes the heights
correct is in the code where ``drowsC < drows0``, which in
this case means 1 < 2. So we constrain the
height of the 1-row Axes to be less than half the height of the
2-row Axes.

::: tip Note

This algorithm can be wrong if the decorations attached to the smaller
axes are very large, so there is an unaccounted-for edge case.

:::

``` python
fig = plt.figure(constrained_layout=True)
gs = gridspec.GridSpec(2, 2, figure=fig)
ax = fig.add_subplot(gs[:, 0])
im = ax.pcolormesh(arr, **pc_kwargs)
ax = fig.add_subplot(gs[0, 1])
im = ax.pcolormesh(arr, **pc_kwargs)
ax = fig.add_subplot(gs[1, 1])
im = ax.pcolormesh(arr, **pc_kwargs)
plot_children(fig, fig._layoutbox, printit=False)
```

![sphx_glr_constrainedlayout_guide_035](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_035.png)

Height and width ratios are accommodated with the same part of
the code with the smaller axes always constrained to be less in size
than the larger.
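The row-span proportionality described above can be read as simple arithmetic. The sketch below uses illustrative names (``drows`` for the number of gridspec rows an axes spans); it is not Matplotlib code:

``` python
def max_height(h_big, drows_small, drows_big):
    """Upper bound on the smaller axes' height: the larger axes'
    height scaled by the ratio of the two row spans."""
    return h_big * drows_small / drows_big

# a 1-row axes next to a 2-row axes may be at most half as tall:
print(max_height(4.0, 1, 2))  # 2.0
```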
+

``` python
fig = plt.figure(constrained_layout=True)
gs = gridspec.GridSpec(3, 2, figure=fig,
                       height_ratios=[1., 0.5, 1.5],
                       width_ratios=[1.2, 0.8])
ax = fig.add_subplot(gs[:2, 0])
im = ax.pcolormesh(arr, **pc_kwargs)
ax = fig.add_subplot(gs[2, 0])
im = ax.pcolormesh(arr, **pc_kwargs)
ax = fig.add_subplot(gs[0, 1])
im = ax.pcolormesh(arr, **pc_kwargs)
ax = fig.add_subplot(gs[1:, 1])
im = ax.pcolormesh(arr, **pc_kwargs)
plot_children(fig, fig._layoutbox, printit=False)
```

![sphx_glr_constrainedlayout_guide_036](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_036.png)

### Empty gridspec slots

The final piece of the code that has not been explained is what happens if
there is an empty gridspec slot. In that case a fake invisible axes is
added and we proceed as before. The empty slot has no decorations, but
the axes position is made the same size as the occupied Axes positions.

This is done at the start of
``_constrained_layout.do_constrained_layout()`` (``hassubplotspec``).

``` python
fig = plt.figure(constrained_layout=True)
gs = gridspec.GridSpec(1, 3, figure=fig)
ax = fig.add_subplot(gs[0])
im = ax.pcolormesh(arr, **pc_kwargs)
ax = fig.add_subplot(gs[-1])
im = ax.pcolormesh(arr, **pc_kwargs)
plot_children(fig, fig._layoutbox, printit=False)
plt.show()
```

![sphx_glr_constrainedlayout_guide_037](https://matplotlib.org/_images/sphx_glr_constrainedlayout_guide_037.png)

### Other notes

The layout is called only once. This is OK if the original layout was
pretty close (which it should be in most cases). However, if the layout
changes a lot from the default layout then the decorations can change size.
In particular the x and y ticklabels can change. If this happens, then
we should probably call the whole routine twice.
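The "call the whole routine twice" remark is just a fixed-point iteration: run the layout, let the decorations re-size, and run it again. A toy sketch with made-up numbers (not the real solver):

``` python
def layout_pass(label_width):
    """Toy layout pass: the room left for the axes decides how wide
    its tick labels end up on the next draw."""
    axes_width = 10.0 - label_width          # figure width minus decorations
    return 1.0 if axes_width > 5.0 else 2.0  # wide axes -> short labels

width = 2.0           # first guess for the decoration size
for _ in range(2):    # two passes reach a fixed point here
    width = layout_pass(width)
print(width)  # 1.0
```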
+ +**Total running time of the script:** ( 0 minutes 15.551 seconds) + +## Download + +- [Download Python source code: constrainedlayout_guide.py](https://matplotlib.org/_downloads/20cdf5d6a41b563e2ad7f13d2f8eb742/constrainedlayout_guide.py) +- [Download Jupyter notebook: constrainedlayout_guide.ipynb](https://matplotlib.org/_downloads/22d57b5ff690950502e071d423750e4a/constrainedlayout_guide.ipynb) + \ No newline at end of file diff --git a/Python/matplotlab/intermediate/gridspec.md b/Python/matplotlab/intermediate/gridspec.md new file mode 100644 index 00000000..14da0c66 --- /dev/null +++ b/Python/matplotlab/intermediate/gridspec.md @@ -0,0 +1,305 @@ +--- +sidebarDepth: 3 +sidebar: auto +--- + +# Customizing Figure Layouts Using GridSpec and Other Functions + +How to create grid-shaped combinations of axes. + +[``subplots()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.subplots.html#matplotlib.pyplot.subplots) + +[``GridSpec``](https://matplotlib.orgapi/_as_gen/matplotlib.gridspec.GridSpec.html#matplotlib.gridspec.GridSpec) + +[``SubplotSpec``](https://matplotlib.orgapi/_as_gen/matplotlib.gridspec.SubplotSpec.html#matplotlib.gridspec.SubplotSpec) + +[``subplot2grid()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.subplot2grid.html#matplotlib.pyplot.subplot2grid) + +``` python +import matplotlib +import matplotlib.pyplot as plt +import matplotlib.gridspec as gridspec +``` + +## Basic Quickstart Guide + +These first two examples show how to create a basic 2-by-2 grid using +both [``subplots()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.subplots.html#matplotlib.pyplot.subplots) and [``gridspec``](https://matplotlib.orgapi/gridspec_api.html#module-matplotlib.gridspec). + +Using [``subplots()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.subplots.html#matplotlib.pyplot.subplots) is quite simple. 
+
It returns a [``Figure``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure) instance and an array of
[``Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes) objects.

``` python
fig1, f1_axes = plt.subplots(ncols=2, nrows=2, constrained_layout=True)
```

![sphx_glr_gridspec_001](https://matplotlib.org/_images/sphx_glr_gridspec_001.png)

For a simple use case such as this, [``gridspec``](https://matplotlib.org/api/gridspec_api.html#module-matplotlib.gridspec) is
perhaps overly verbose.
You have to create the figure and [``GridSpec``](https://matplotlib.org/api/_as_gen/matplotlib.gridspec.GridSpec.html#matplotlib.gridspec.GridSpec)
instance separately, then pass elements of the gridspec instance to the
[``add_subplot()``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.add_subplot) method to create the axes
objects.
The elements of the gridspec are accessed in generally the same manner as
numpy arrays.

``` python
fig2 = plt.figure(constrained_layout=True)
spec2 = gridspec.GridSpec(ncols=2, nrows=2, figure=fig2)
f2_ax1 = fig2.add_subplot(spec2[0, 0])
f2_ax2 = fig2.add_subplot(spec2[0, 1])
f2_ax3 = fig2.add_subplot(spec2[1, 0])
f2_ax4 = fig2.add_subplot(spec2[1, 1])
```

![sphx_glr_gridspec_002](https://matplotlib.org/_images/sphx_glr_gridspec_002.png)

The power of gridspec comes in being able to create subplots that span
rows and columns. Note the
[Numpy slice](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html)
syntax for selecting the part of the gridspec each subplot will occupy.
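Since gridspec indexing mirrors NumPy indexing, the cells a given index covers can be sketched in plain Python (a 3x3 grid here for illustration; this is not Matplotlib code):

``` python
def cells(rows, cols, nrows=3, ncols=3):
    """Grid cells selected by gs[rows, cols]-style indexing."""
    rr = range(nrows)[rows] if isinstance(rows, slice) else [rows]
    cc = range(ncols)[cols] if isinstance(cols, slice) else [cols]
    return [(r, c) for r in rr for c in cc]

print(cells(0, slice(None)))      # gs[0, :]   -> [(0, 0), (0, 1), (0, 2)]
print(cells(1, slice(None, -1)))  # gs[1, :-1] -> [(1, 0), (1, 1)]
```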
+ +Note that we have also used the convenience method [``Figure.add_gridspec``](https://matplotlib.orgapi/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.add_gridspec) +instead of [``gridspec.GridSpec``](https://matplotlib.orgapi/_as_gen/matplotlib.gridspec.GridSpec.html#matplotlib.gridspec.GridSpec), potentially saving the user an import, +and keeping the namespace cleaner. + +``` python +fig3 = plt.figure(constrained_layout=True) +gs = fig3.add_gridspec(3, 3) +f3_ax1 = fig3.add_subplot(gs[0, :]) +f3_ax1.set_title('gs[0, :]') +f3_ax2 = fig3.add_subplot(gs[1, :-1]) +f3_ax2.set_title('gs[1, :-1]') +f3_ax3 = fig3.add_subplot(gs[1:, -1]) +f3_ax3.set_title('gs[1:, -1]') +f3_ax4 = fig3.add_subplot(gs[-1, 0]) +f3_ax4.set_title('gs[-1, 0]') +f3_ax5 = fig3.add_subplot(gs[-1, -2]) +f3_ax5.set_title('gs[-1, -2]') +``` + +![sphx_glr_gridspec_003](https://matplotlib.org/_images/sphx_glr_gridspec_003.png) + +[``gridspec``](https://matplotlib.orgapi/gridspec_api.html#module-matplotlib.gridspec) is also indispensable for creating subplots +of different widths via a couple of methods. + +The method shown here is similar to the one above and initializes a +uniform grid specification, +and then uses numpy indexing and slices to allocate multiple +"cells" for a given subplot. + +``` python +fig4 = plt.figure(constrained_layout=True) +spec4 = fig4.add_gridspec(ncols=2, nrows=2) +anno_opts = dict(xy=(0.5, 0.5), xycoords='axes fraction', + va='center', ha='center') + +f4_ax1 = fig4.add_subplot(spec4[0, 0]) +f4_ax1.annotate('GridSpec[0, 0]', **anno_opts) +fig4.add_subplot(spec4[0, 1]).annotate('GridSpec[0, 1:]', **anno_opts) +fig4.add_subplot(spec4[1, 0]).annotate('GridSpec[1:, 0]', **anno_opts) +fig4.add_subplot(spec4[1, 1]).annotate('GridSpec[1:, 1:]', **anno_opts) +``` + +![sphx_glr_gridspec_004](https://matplotlib.org/_images/sphx_glr_gridspec_004.png) + +Another option is to use the ``width_ratios`` and ``height_ratios`` +parameters. 
These keyword arguments are lists of numbers.
+Note that absolute values are meaningless; only their relative ratios
+matter. That means that ``width_ratios=[2, 4, 8]`` is equivalent to
+``width_ratios=[1, 2, 4]`` within equally wide figures.
+For the sake of demonstration, we'll blindly create the axes within
+``for`` loops since we won't need them later.
+
+``` python
+fig5 = plt.figure(constrained_layout=True)
+widths = [2, 3, 1.5]
+heights = [1, 3, 2]
+spec5 = fig5.add_gridspec(ncols=3, nrows=3, width_ratios=widths,
+                          height_ratios=heights)
+for row in range(3):
+    for col in range(3):
+        ax = fig5.add_subplot(spec5[row, col])
+        label = 'Width: {}\nHeight: {}'.format(widths[col], heights[row])
+        ax.annotate(label, (0.1, 0.5), xycoords='axes fraction', va='center')
+```
+
+![sphx_glr_gridspec_005](https://matplotlib.org/_images/sphx_glr_gridspec_005.png)
+
+Learning to use ``width_ratios`` and ``height_ratios`` is particularly
+useful since the top-level function [``subplots()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.subplots.html#matplotlib.pyplot.subplots)
+accepts them within the ``gridspec_kw`` parameter.
+For that matter, any parameter accepted by
+[``GridSpec``](https://matplotlib.org/api/_as_gen/matplotlib.gridspec.GridSpec.html#matplotlib.gridspec.GridSpec) can be passed to
+[``subplots()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.subplots.html#matplotlib.pyplot.subplots) via the ``gridspec_kw`` parameter.
+This example recreates the previous figure without directly using a
+gridspec instance.
+
+``` python
+gs_kw = dict(width_ratios=widths, height_ratios=heights)
+fig6, f6_axes = plt.subplots(ncols=3, nrows=3, constrained_layout=True,
+                             gridspec_kw=gs_kw)
+for r, row in enumerate(f6_axes):
+    for c, ax in enumerate(row):
+        label = 'Width: {}\nHeight: {}'.format(widths[c], heights[r])
+        ax.annotate(label, (0.1, 0.5), xycoords='axes fraction', va='center')
+```
+
+![sphx_glr_gridspec_006](https://matplotlib.org/_images/sphx_glr_gridspec_006.png)
+
+The ``subplots`` and ``gridspec`` methods can be combined since it is
+sometimes more convenient to make most of the subplots using ``subplots``
+and then remove some and combine them. Here we create a layout with
+the bottom two axes in the last column combined.
+
+``` python
+fig7, f7_axs = plt.subplots(ncols=3, nrows=3)
+gs = f7_axs[1, 2].get_gridspec()
+# remove the underlying axes
+for ax in f7_axs[1:, -1]:
+    ax.remove()
+axbig = fig7.add_subplot(gs[1:, -1])
+axbig.annotate('Big Axes \nGridSpec[1:, -1]', (0.1, 0.5),
+               xycoords='axes fraction', va='center')
+
+fig7.tight_layout()
+```
+
+![sphx_glr_gridspec_007](https://matplotlib.org/_images/sphx_glr_gridspec_007.png)
+
+## Fine Adjustments to a Gridspec Layout
+
+When a GridSpec is explicitly used, you can adjust the layout
+parameters of subplots that are created from the GridSpec. Note that this
+option is not compatible with ``constrained_layout`` or
+[``Figure.tight_layout``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.tight_layout), which both adjust subplot sizes to fill the
+figure.
+
+``` python
+fig8 = plt.figure(constrained_layout=False)
+gs1 = fig8.add_gridspec(nrows=3, ncols=3, left=0.05, right=0.48, wspace=0.05)
+f8_ax1 = fig8.add_subplot(gs1[:-1, :])
+f8_ax2 = fig8.add_subplot(gs1[-1, :-1])
+f8_ax3 = fig8.add_subplot(gs1[-1, -1])
+```
+
+![sphx_glr_gridspec_008](https://matplotlib.org/_images/sphx_glr_gridspec_008.png)
+
+This is similar to [``subplots_adjust()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.subplots_adjust.html#matplotlib.pyplot.subplots_adjust), but it only
+affects the subplots that are created from the given GridSpec.
+
+For example, compare the left and right sides of this figure:
+
+``` python
+fig9 = plt.figure(constrained_layout=False)
+gs1 = fig9.add_gridspec(nrows=3, ncols=3, left=0.05, right=0.48,
+                        wspace=0.05)
+f9_ax1 = fig9.add_subplot(gs1[:-1, :])
+f9_ax2 = fig9.add_subplot(gs1[-1, :-1])
+f9_ax3 = fig9.add_subplot(gs1[-1, -1])
+
+gs2 = fig9.add_gridspec(nrows=3, ncols=3, left=0.55, right=0.98,
+                        hspace=0.05)
+f9_ax4 = fig9.add_subplot(gs2[:, :-1])
+f9_ax5 = fig9.add_subplot(gs2[:-1, -1])
+f9_ax6 = fig9.add_subplot(gs2[-1, -1])
+```
+
+![sphx_glr_gridspec_009](https://matplotlib.org/_images/sphx_glr_gridspec_009.png)
+
+## GridSpec using SubplotSpec
+
+You can create a GridSpec from a [``SubplotSpec``](https://matplotlib.org/api/_as_gen/matplotlib.gridspec.SubplotSpec.html#matplotlib.gridspec.SubplotSpec),
+in which case its layout parameters are set to those of the location of
+the given SubplotSpec.
+
+Note this is also available from the more verbose
+[``gridspec.GridSpecFromSubplotSpec``](https://matplotlib.org/api/_as_gen/matplotlib.gridspec.GridSpecFromSubplotSpec.html#matplotlib.gridspec.GridSpecFromSubplotSpec).
+ +``` python +fig10 = plt.figure(constrained_layout=True) +gs0 = fig10.add_gridspec(1, 2) + +gs00 = gs0[0].subgridspec(2, 3) +gs01 = gs0[1].subgridspec(3, 2) + +for a in range(2): + for b in range(3): + fig10.add_subplot(gs00[a, b]) + fig10.add_subplot(gs01[b, a]) +``` + +![sphx_glr_gridspec_010](https://matplotlib.org/_images/sphx_glr_gridspec_010.png) + +## A Complex Nested GridSpec using SubplotSpec + +Here's a more sophisticated example of nested GridSpec where we put +a box around each cell of the outer 4x4 grid, by hiding appropriate +spines in each of the inner 3x3 grids. + +``` python +import numpy as np +from itertools import product + + +def squiggle_xy(a, b, c, d, i=np.arange(0.0, 2*np.pi, 0.05)): + return np.sin(i*a)*np.cos(i*b), np.sin(i*c)*np.cos(i*d) + + +fig11 = plt.figure(figsize=(8, 8), constrained_layout=False) + +# gridspec inside gridspec +outer_grid = fig11.add_gridspec(4, 4, wspace=0.0, hspace=0.0) + +for i in range(16): + inner_grid = outer_grid[i].subgridspec(3, 3, wspace=0.0, hspace=0.0) + a, b = int(i/4)+1, i % 4+1 + for j, (c, d) in enumerate(product(range(1, 4), repeat=2)): + ax = fig11.add_subplot(inner_grid[j]) + ax.plot(*squiggle_xy(a, b, c, d)) + ax.set_xticks([]) + ax.set_yticks([]) + fig11.add_subplot(ax) + +all_axes = fig11.get_axes() + +# show only the outside spines +for ax in all_axes: + for sp in ax.spines.values(): + sp.set_visible(False) + if ax.is_first_row(): + ax.spines['top'].set_visible(True) + if ax.is_last_row(): + ax.spines['bottom'].set_visible(True) + if ax.is_first_col(): + ax.spines['left'].set_visible(True) + if ax.is_last_col(): + ax.spines['right'].set_visible(True) + +plt.show() +``` + +![sphx_glr_gridspec_011](https://matplotlib.org/_images/sphx_glr_gridspec_011.png) + +### References + +The usage of the following functions and methods is shown in this example: + +``` python +matplotlib.pyplot.subplots +matplotlib.figure.Figure.add_gridspec +matplotlib.figure.Figure.add_subplot 
+matplotlib.gridspec.GridSpec
+matplotlib.gridspec.SubplotSpec.subgridspec
+matplotlib.gridspec.GridSpecFromSubplotSpec
+```
+
+**Total running time of the script:** ( 0 minutes 8.732 seconds)
+
+## Download
+
+- [Download Python source code: gridspec.py](https://matplotlib.org/_downloads/54501e30d0a29665618afe715673cb41/gridspec.py)
+- [Download Jupyter notebook: gridspec.ipynb](https://matplotlib.org/_downloads/0eaf234b06f4f7a6a52fa9ca11b63755/gridspec.ipynb)
+ 
\ No newline at end of file
diff --git a/Python/matplotlab/intermediate/imshow_extent.md b/Python/matplotlab/intermediate/imshow_extent.md
new file mode 100644
index 00000000..310db0f2
--- /dev/null
+++ b/Python/matplotlab/intermediate/imshow_extent.md
@@ -0,0 +1,295 @@
+---
+sidebarDepth: 3
+sidebar: auto
+---
+
+# origin and extent in ``imshow``
+
+[``imshow()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.imshow.html#matplotlib.axes.Axes.imshow) allows you to render an image (either a 2D array
+which will be color-mapped (based on *norm* and *cmap*) or a 3D RGB(A)
+array which will be used as-is) to a rectangular region in data space.
+The orientation of the image in the final rendering is controlled by
+the *origin* and *extent* kwargs (and attributes on the resulting
+[``AxesImage``](https://matplotlib.org/api/image_api.html#matplotlib.image.AxesImage) instance) and the data limits of the axes.
+
+The *extent* kwarg controls the bounding box in data coordinates that
+the image will fill, specified as ``(left, right, bottom, top)`` in
+**data coordinates**; the *origin* kwarg controls how the image fills
+that bounding box, and the orientation in the final rendered image is
+also affected by the axes limits.
+
+Most of the code below is used for adding labels and informative
+text to the plots. The described effects of *origin* and *extent* can be
+seen in the plots without the need to follow all code details.
+ +For a quick understanding, you may want to skip the code details below and +directly continue with the discussion of the results. + +``` python +import numpy as np +import matplotlib.pyplot as plt +from matplotlib.gridspec import GridSpec + + +def index_to_coordinate(index, extent, origin): + """Return the pixel center of an index.""" + left, right, bottom, top = extent + + hshift = 0.5 * np.sign(right - left) + left, right = left + hshift, right - hshift + vshift = 0.5 * np.sign(top - bottom) + bottom, top = bottom + vshift, top - vshift + + if origin == 'upper': + bottom, top = top, bottom + + return { + "[0, 0]": (left, bottom), + "[M', 0]": (left, top), + "[0, N']": (right, bottom), + "[M', N']": (right, top), + }[index] + + +def get_index_label_pos(index, extent, origin, inverted_xindex): + """ + Return the desired position and horizontal alignment of an index label. + """ + if extent is None: + extent = lookup_extent(origin) + left, right, bottom, top = extent + x, y = index_to_coordinate(index, extent, origin) + + is_x0 = index[-2:] == "0]" + halign = 'left' if is_x0 ^ inverted_xindex else 'right' + hshift = 0.5 * np.sign(left - right) + x += hshift * (1 if is_x0 else -1) + return x, y, halign + + +def get_color(index, data, cmap): + """Return the data color of an index.""" + val = { + "[0, 0]": data[0, 0], + "[0, N']": data[0, -1], + "[M', 0]": data[-1, 0], + "[M', N']": data[-1, -1], + }[index] + return cmap(val / data.max()) + + +def lookup_extent(origin): + """Return extent for label positioning when not given explicitly.""" + if origin == 'lower': + return (-0.5, 6.5, -0.5, 5.5) + else: + return (-0.5, 6.5, 5.5, -0.5) + + +def set_extent_None_text(ax): + ax.text(3, 2.5, 'equals\nextent=None', size='large', + ha='center', va='center', color='w') + + +def plot_imshow_with_labels(ax, data, extent, origin, xlim, ylim): + """Actually run ``imshow()`` and add extent and index labels.""" + im = ax.imshow(data, origin=origin, extent=extent) + + # extent 
labels (left, right, bottom, top) + left, right, bottom, top = im.get_extent() + if xlim is None or top > bottom: + upper_string, lower_string = 'top', 'bottom' + else: + upper_string, lower_string = 'bottom', 'top' + if ylim is None or left < right: + port_string, starboard_string = 'left', 'right' + inverted_xindex = False + else: + port_string, starboard_string = 'right', 'left' + inverted_xindex = True + bbox_kwargs = {'fc': 'w', 'alpha': .75, 'boxstyle': "round4"} + ann_kwargs = {'xycoords': 'axes fraction', + 'textcoords': 'offset points', + 'bbox': bbox_kwargs} + ax.annotate(upper_string, xy=(.5, 1), xytext=(0, -1), + ha='center', va='top', **ann_kwargs) + ax.annotate(lower_string, xy=(.5, 0), xytext=(0, 1), + ha='center', va='bottom', **ann_kwargs) + ax.annotate(port_string, xy=(0, .5), xytext=(1, 0), + ha='left', va='center', rotation=90, + **ann_kwargs) + ax.annotate(starboard_string, xy=(1, .5), xytext=(-1, 0), + ha='right', va='center', rotation=-90, + **ann_kwargs) + ax.set_title('origin: {origin}'.format(origin=origin)) + + # index labels + for index in ["[0, 0]", "[0, N']", "[M', 0]", "[M', N']"]: + tx, ty, halign = get_index_label_pos(index, extent, origin, + inverted_xindex) + facecolor = get_color(index, data, im.get_cmap()) + ax.text(tx, ty, index, color='white', ha=halign, va='center', + bbox={'boxstyle': 'square', 'facecolor': facecolor}) + if xlim: + ax.set_xlim(*xlim) + if ylim: + ax.set_ylim(*ylim) + + +def generate_imshow_demo_grid(extents, xlim=None, ylim=None): + N = len(extents) + fig = plt.figure(tight_layout=True) + fig.set_size_inches(6, N * (11.25) / 5) + gs = GridSpec(N, 5, figure=fig) + + columns = {'label': [fig.add_subplot(gs[j, 0]) for j in range(N)], + 'upper': [fig.add_subplot(gs[j, 1:3]) for j in range(N)], + 'lower': [fig.add_subplot(gs[j, 3:5]) for j in range(N)]} + x, y = np.ogrid[0:6, 0:7] + data = x + y + + for origin in ['upper', 'lower']: + for ax, extent in zip(columns[origin], extents): + plot_imshow_with_labels(ax, 
data, extent, origin, xlim, ylim)
+
+    for ax, extent in zip(columns['label'], extents):
+        text_kwargs = {'ha': 'right',
+                       'va': 'center',
+                       'xycoords': 'axes fraction',
+                       'xy': (1, .5)}
+        if extent is None:
+            ax.annotate('None', **text_kwargs)
+            ax.set_title('extent=')
+        else:
+            left, right, bottom, top = extent
+            text = ('left: {left:0.1f}\nright: {right:0.1f}\n' +
+                    'bottom: {bottom:0.1f}\ntop: {top:0.1f}\n').format(
+                        left=left, right=right, bottom=bottom, top=top)
+            ax.annotate(text, **text_kwargs)
+        ax.axis('off')
+    return columns
+```
+
+## Default extent
+
+First, let's have a look at the default ``extent=None``.
+
+``` python
+generate_imshow_demo_grid(extents=[None])
+```
+
+![sphx_glr_imshow_extent_001](https://matplotlib.org/_images/sphx_glr_imshow_extent_001.png)
+
+Generally, for an array of shape (M, N), the first index runs along the
+vertical, the second index runs along the horizontal.
+The pixel centers are at integer positions ranging from 0 to ``N' = N - 1``
+horizontally and from 0 to ``M' = M - 1`` vertically.
+*origin* determines how the data is filled in the bounding box.
+
+For ``origin='lower'``:
+
+- [0, 0] is at (left, bottom)
+- [M', 0] is at (left, top)
+- [0, N'] is at (right, bottom)
+- [M', N'] is at (right, top)
+
+``origin='upper'`` reverses the vertical axes direction and filling:
+
+- [0, 0] is at (left, top)
+- [M', 0] is at (left, bottom)
+- [0, N'] is at (right, top)
+- [M', N'] is at (right, bottom)
+
+In summary, the position of the [0, 0] index as well as the extent are
+influenced by *origin*:
+
+| origin | [0, 0] position | extent |
+| --- | --- | --- |
+| upper | top left | ``(-0.5, numcols-0.5, numrows-0.5, -0.5)`` |
+| lower | bottom left | ``(-0.5, numcols-0.5, -0.5, numrows-0.5)`` |
+
+The default value of *origin* is set by ``rcParams["image.origin"]``,
+which defaults to ``'upper'`` to match the matrix indexing conventions
+in math and computer graphics.
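These defaults can be read back from the returned ``AxesImage`` with ``get_extent()``. A minimal sketch, not part of the original page (the Agg backend is forced only so the snippet runs without a display):

``` python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

data = np.arange(30).reshape(5, 6)  # numrows M = 5, numcols N = 6

fig, (ax_u, ax_l) = plt.subplots(ncols=2)
im_u = ax_u.imshow(data, origin='upper')
im_l = ax_l.imshow(data, origin='lower')

# origin='upper': (-0.5, numcols-0.5, numrows-0.5, -0.5)
assert tuple(im_u.get_extent()) == (-0.5, 5.5, 4.5, -0.5)
# origin='lower': (-0.5, numcols-0.5, -0.5, numrows-0.5)
assert tuple(im_l.get_extent()) == (-0.5, 5.5, -0.5, 4.5)
```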
+ +## Explicit extent + +By setting *extent* we define the coordinates of the image area. The +underlying image data is interpolated/resampled to fill that area. + +If the axes is set to autoscale, then the view limits of the axes are set +to match the *extent* which ensures that the coordinate set by +``(left, bottom)`` is at the bottom left of the axes! However, this +may invert the axis so they do not increase in the 'natural' direction. + +``` python +extents = [(-0.5, 6.5, -0.5, 5.5), + (-0.5, 6.5, 5.5, -0.5), + (6.5, -0.5, -0.5, 5.5), + (6.5, -0.5, 5.5, -0.5)] + +columns = generate_imshow_demo_grid(extents) +set_extent_None_text(columns['upper'][1]) +set_extent_None_text(columns['lower'][0]) +``` + +![sphx_glr_imshow_extent_002](https://matplotlib.org/_images/sphx_glr_imshow_extent_002.png) + +## Explicit extent and axes limits + +If we fix the axes limits by explicitly setting ``set_xlim`` / ``set_ylim``, we +force a certain size and orientation of the axes. +This can decouple the 'left-right' and 'top-bottom' sense of the image from +the orientation on the screen. + +In the example below we have chosen the limits slightly larger than the +extent (note the white areas within the Axes). + +While we keep the extents as in the examples before, the coordinate (0, 0) +is now explicitly put at the bottom left and values increase to up and to +the right (from the viewer point of view). +We can see that: + +- The coordinate ``(left, bottom)`` anchors the image which then fills the +box going towards the ``(right, top)`` point in data space. +- The first column is always closest to the 'left'. +- *origin* controls if the first row is closest to 'top' or 'bottom'. +- The image may be inverted along either direction. +- The 'left-right' and 'top-bottom' sense of the image may be uncoupled from +the orientation on the screen. 
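The interplay between autoscaled limits and a reversed extent, described in the list above, can also be checked in isolation. A small sketch, not from the original example (``xaxis_inverted()`` is a standard Axes query; the Agg backend is forced only so it runs headless):

``` python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

data = np.arange(30).reshape(5, 6)

fig, ax = plt.subplots()
# A reversed extent (left > right): autoscaling adopts it, inverting the x-axis.
ax.imshow(data, origin='lower', extent=(6.5, -0.5, -0.5, 5.5))
assert ax.xaxis_inverted()

# Explicit limits override the extent, decoupling screen orientation from it.
ax.set_xlim(-2, 8)
assert not ax.xaxis_inverted()
```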
+
+``` python
+generate_imshow_demo_grid(extents=[None] + extents,
+                          xlim=(-2, 8), ylim=(-1, 6))
+```
+
+![sphx_glr_imshow_extent_003](https://matplotlib.org/_images/sphx_glr_imshow_extent_003.png)
+
+**Total running time of the script:** ( 0 minutes 2.056 seconds)
+
+## Download
+
+- [Download Python source code: imshow_extent.py](https://matplotlib.org/_downloads/1b073a3f2fab4eae80964340b65629bc/imshow_extent.py)
+- [Download Jupyter notebook: imshow_extent.ipynb](https://matplotlib.org/_downloads/e7e77a6502f9e28a843cccc17c2dfd89/imshow_extent.ipynb)
+ 
\ No newline at end of file
diff --git a/Python/matplotlab/intermediate/legend_guide.md b/Python/matplotlab/intermediate/legend_guide.md
new file mode 100644
index 00000000..10db18ac
--- /dev/null
+++ b/Python/matplotlab/intermediate/legend_guide.md
@@ -0,0 +1,314 @@
+---
+sidebarDepth: 3
+sidebar: auto
+---
+
+# Legend guide
+
+Generating legends flexibly in Matplotlib.
+
+This legend guide is an extension of the documentation available at
+[``legend()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html#matplotlib.pyplot.legend) - please ensure you are familiar with the
+contents of that documentation before proceeding with this guide.
+
+This guide makes use of some common terms, which are documented here for clarity:
+
+- **legend entry**: a legend is made up of one or more legend entries; an entry consists of exactly one key and one label.
+- **legend key**: the colored/patterned marker to the left of each legend label.
+- **legend label**: the text which describes the handle represented by the key.
+- **legend handle**: the original object which is used to generate an appropriate entry in the legend.
+
+## Controlling the legend entries
+
+Calling [``legend()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html#matplotlib.pyplot.legend) with no arguments automatically fetches the legend
+handles and their associated labels.
This functionality is equivalent to:
+
+``` python
+handles, labels = ax.get_legend_handles_labels()
+ax.legend(handles, labels)
+```
+
+The [``get_legend_handles_labels()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.get_legend_handles_labels.html#matplotlib.axes.Axes.get_legend_handles_labels) function returns
+a list of handles/artists which exist on the Axes and which can be used to
+generate entries for the resulting legend - it is worth noting, however, that
+not all artists can be added to a legend, in which case a "proxy" will have
+to be created (see [Creating artists specifically for adding to the legend (aka. Proxy artists)](#proxy-legend-handles) for further details).
+
+For full control of what is being added to the legend, it is common to pass
+the appropriate handles directly to [``legend()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html#matplotlib.pyplot.legend):
+
+``` python
+line_up, = plt.plot([1, 2, 3], label='Line 2')
+line_down, = plt.plot([3, 2, 1], label='Line 1')
+plt.legend(handles=[line_up, line_down])
+```
+
+In some cases, it is not possible to set the label of the handle, so it is
+possible to pass through the list of labels to [``legend()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html#matplotlib.pyplot.legend):
+
+``` python
+line_up, = plt.plot([1, 2, 3], label='Line 2')
+line_down, = plt.plot([3, 2, 1], label='Line 1')
+plt.legend([line_up, line_down], ['Line Up', 'Line Down'])
+```
+
+## Creating artists specifically for adding to the legend (aka. Proxy artists)
+
+Not all handles can be turned into legend entries automatically,
+so it is often necessary to create an artist which *can*. Legend handles
+don't have to exist on the Figure or Axes in order to be used.
+
+Suppose we wanted to create a legend which has an entry for some data which
+is represented by a red color:
+
+``` python
+import matplotlib.patches as mpatches
+import matplotlib.pyplot as plt
+
+red_patch = mpatches.Patch(color='red', label='The red data')
+plt.legend(handles=[red_patch])
+
+plt.show()
+```
+
+![sphx_glr_legend_guide_001](https://matplotlib.org/_images/sphx_glr_legend_guide_001.png)
+
+There are many supported legend handles; instead of creating a patch of color,
+we could have created a line with a marker:
+
+``` python
+import matplotlib.lines as mlines
+
+blue_line = mlines.Line2D([], [], color='blue', marker='*',
+                          markersize=15, label='Blue stars')
+plt.legend(handles=[blue_line])
+
+plt.show()
+```
+
+![sphx_glr_legend_guide_002](https://matplotlib.org/_images/sphx_glr_legend_guide_002.png)
+
+## Legend location
+
+The location of the legend can be specified by the keyword argument
+*loc*. Please see the documentation at [``legend()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html#matplotlib.pyplot.legend) for more details.
+
+The ``bbox_to_anchor`` keyword gives a great degree of control for manual
+legend placement. For example, if you want your axes legend located at the
+figure's top right-hand corner instead of the axes' corner, simply specify
+the corner's location and the coordinate system of that location:
+
+``` python
+plt.legend(bbox_to_anchor=(1, 1),
+           bbox_transform=plt.gcf().transFigure)
+```
+
+More examples of custom legend placement:
+
+``` python
+plt.subplot(211)
+plt.plot([1, 2, 3], label="test1")
+plt.plot([3, 2, 1], label="test2")
+
+# Place a legend above this subplot, expanding itself to
+# fully use the given bounding box.
+plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc='lower left',
+           ncol=2, mode="expand", borderaxespad=0.)
+
+plt.subplot(223)
+plt.plot([1, 2, 3], label="test1")
+plt.plot([3, 2, 1], label="test2")
+# Place a legend to the right of this smaller subplot.
+plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
+
+plt.show()
+```
+
+![sphx_glr_legend_guide_003](https://matplotlib.org/_images/sphx_glr_legend_guide_003.png)
+
+## Multiple legends on the same Axes
+
+Sometimes it is clearer to split legend entries across multiple
+legends. Whilst the instinctive approach to doing this might be to call
+the [``legend()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html#matplotlib.pyplot.legend) function multiple times, you will find that only one
+legend ever exists on the Axes. This has been done so that it is possible
+to call [``legend()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html#matplotlib.pyplot.legend) repeatedly to update the legend to the latest
+handles on the Axes. To persist old legend instances, we must add them
+manually to the Axes:
+
+``` python
+line1, = plt.plot([1, 2, 3], label="Line 1", linestyle='--')
+line2, = plt.plot([3, 2, 1], label="Line 2", linewidth=4)
+
+# Create a legend for the first line.
+first_legend = plt.legend(handles=[line1], loc='upper right')
+
+# Add the legend manually to the current Axes.
+plt.gca().add_artist(first_legend)
+
+# Create another legend for the second line.
+plt.legend(handles=[line2], loc='lower right')
+
+plt.show()
+```
+
+![sphx_glr_legend_guide_004](https://matplotlib.org/_images/sphx_glr_legend_guide_004.png)
+
+## Legend Handlers
+
+In order to create legend entries, handles are given as an argument to an
+appropriate [``HandlerBase``](https://matplotlib.org/api/legend_handler_api.html#matplotlib.legend_handler.HandlerBase) subclass.
+The choice of handler subclass is determined by the following rules:
+
+1. Update [``get_legend_handler_map()``](https://matplotlib.org/api/legend_api.html#matplotlib.legend.Legend.get_legend_handler_map)
+with the value in the ``handler_map`` keyword.
+1. Check if the ``handle`` is in the newly created ``handler_map``.
+1. 
Check if the type of ``handle`` is in the newly created
+``handler_map``.
+1. Check if any of the types in the ``handle``'s mro is in the newly
+created ``handler_map``.
+
+For completeness, this logic is mostly implemented in
+[``get_legend_handler()``](https://matplotlib.org/api/legend_api.html#matplotlib.legend.Legend.get_legend_handler).
+
+All of this flexibility means that we have the necessary hooks to implement
+custom handlers for our own type of legend key.
+
+The simplest example of using custom handlers is to instantiate one of the
+existing [``HandlerBase``](https://matplotlib.org/api/legend_handler_api.html#matplotlib.legend_handler.HandlerBase) subclasses. For the
+sake of simplicity, let's choose [``matplotlib.legend_handler.HandlerLine2D``](https://matplotlib.org/api/legend_handler_api.html#matplotlib.legend_handler.HandlerLine2D),
+which accepts a ``numpoints`` argument (note that ``numpoints`` is also a keyword
+on the [``legend()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html#matplotlib.pyplot.legend) function for convenience). We can then pass the mapping
+of instance to Handler as a keyword to legend.
+
+``` python
+from matplotlib.legend_handler import HandlerLine2D
+
+line1, = plt.plot([3, 2, 1], marker='o', label='Line 1')
+line2, = plt.plot([1, 2, 3], marker='o', label='Line 2')
+
+plt.legend(handler_map={line1: HandlerLine2D(numpoints=4)})
+```
+
+![sphx_glr_legend_guide_005](https://matplotlib.org/_images/sphx_glr_legend_guide_005.png)
+
+As you can see, "Line 1" now has 4 marker points, where "Line 2" has 2 (the
+default). Try the above code, only changing the map's key from ``line1`` to
+``type(line1)``. Notice how now both [``Line2D``](https://matplotlib.org/api/_as_gen/matplotlib.lines.Line2D.html#matplotlib.lines.Line2D) instances
+get 4 markers.
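That change can be sketched as follows, not part of the original guide (the Agg backend is forced only so the snippet runs without a display):

``` python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.legend_handler import HandlerLine2D

line1, = plt.plot([3, 2, 1], marker='o', label='Line 1')
line2, = plt.plot([1, 2, 3], marker='o', label='Line 2')

# Keying the handler_map on the class instead of the instance applies
# the handler to every Line2D handle in the legend.
leg = plt.legend(handler_map={type(line1): HandlerLine2D(numpoints=4)})
```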
+
+Along with handlers for complex plot types such as errorbars, stem plots
+and histograms, the default ``handler_map`` has a special ``tuple`` handler
+([``HandlerTuple``](https://matplotlib.org/api/legend_handler_api.html#matplotlib.legend_handler.HandlerTuple)) which simply plots
+the handles on top of one another for each item in the given tuple. The
+following example demonstrates combining two legend keys on top of one another:
+
+``` python
+from numpy.random import randn
+
+z = randn(10)
+
+red_dot, = plt.plot(z, "ro", markersize=15)
+# Put a white cross over some of the data.
+white_cross, = plt.plot(z[:5], "w+", markeredgewidth=3, markersize=15)
+
+plt.legend([red_dot, (red_dot, white_cross)], ["Attr A", "Attr A+B"])
+```
+
+![sphx_glr_legend_guide_006](https://matplotlib.org/_images/sphx_glr_legend_guide_006.png)
+
+The [``HandlerTuple``](https://matplotlib.org/api/legend_handler_api.html#matplotlib.legend_handler.HandlerTuple) class can also be used to
+assign several legend keys to the same entry:
+
+``` python
+from matplotlib.legend_handler import HandlerLine2D, HandlerTuple
+
+p1, = plt.plot([1, 2.5, 3], 'r-d')
+p2, = plt.plot([3, 2, 1], 'k-o')
+
+l = plt.legend([(p1, p2)], ['Two keys'], numpoints=1,
+               handler_map={tuple: HandlerTuple(ndivide=None)})
+```
+
+![sphx_glr_legend_guide_007](https://matplotlib.org/_images/sphx_glr_legend_guide_007.png)
+
+### Implementing a custom legend handler
+
+A custom handler can be implemented to turn any handle into a legend key (handles
+don't necessarily need to be matplotlib artists).
+The handler must implement a ``legend_artist`` method which returns a
+single artist for the legend to use. Signature details about ``legend_artist``
+are documented at [``legend_artist()``](https://matplotlib.org/api/legend_handler_api.html#matplotlib.legend_handler.HandlerBase.legend_artist).
+ +``` python +import matplotlib.patches as mpatches + + +class AnyObject(object): + pass + + +class AnyObjectHandler(object): + def legend_artist(self, legend, orig_handle, fontsize, handlebox): + x0, y0 = handlebox.xdescent, handlebox.ydescent + width, height = handlebox.width, handlebox.height + patch = mpatches.Rectangle([x0, y0], width, height, facecolor='red', + edgecolor='black', hatch='xx', lw=3, + transform=handlebox.get_transform()) + handlebox.add_artist(patch) + return patch + + +plt.legend([AnyObject()], ['My first handler'], + handler_map={AnyObject: AnyObjectHandler()}) +``` + +![sphx_glr_legend_guide_008](https://matplotlib.org/_images/sphx_glr_legend_guide_008.png) + +Alternatively, had we wanted to globally accept ``AnyObject`` instances without +needing to manually set the ``handler_map`` keyword all the time, we could have +registered the new handler with: + +``` python +from matplotlib.legend import Legend +Legend.update_default_handler_map({AnyObject: AnyObjectHandler()}) +``` + +Whilst the power here is clear, remember that there are already many handlers +implemented and what you want to achieve may already be easily possible with +existing classes. 
For example, to produce elliptical legend keys, rather than +rectangular ones: + +``` python +from matplotlib.legend_handler import HandlerPatch + + +class HandlerEllipse(HandlerPatch): + def create_artists(self, legend, orig_handle, + xdescent, ydescent, width, height, fontsize, trans): + center = 0.5 * width - 0.5 * xdescent, 0.5 * height - 0.5 * ydescent + p = mpatches.Ellipse(xy=center, width=width + xdescent, + height=height + ydescent) + self.update_prop(p, orig_handle, legend) + p.set_transform(trans) + return [p] + + +c = mpatches.Circle((0.5, 0.5), 0.25, facecolor="green", + edgecolor="red", linewidth=3) +plt.gca().add_patch(c) + +plt.legend([c], ["An ellipse, not a rectangle"], + handler_map={mpatches.Circle: HandlerEllipse()}) +``` + +![sphx_glr_legend_guide_009](https://matplotlib.org/_images/sphx_glr_legend_guide_009.png) + +## Download + +- [Download Python source code: legend_guide.py](https://matplotlib.org/_downloads/65714cd51723c032709ddf40dd43e3cf/legend_guide.py) +- [Download Jupyter notebook: legend_guide.ipynb](https://matplotlib.org/_downloads/89b61becf2e0c701373e39916f7b5428/legend_guide.ipynb) + \ No newline at end of file diff --git a/Python/matplotlab/intermediate/tight_layout_guide.md b/Python/matplotlab/intermediate/tight_layout_guide.md new file mode 100644 index 00000000..c218c2cc --- /dev/null +++ b/Python/matplotlab/intermediate/tight_layout_guide.md @@ -0,0 +1,418 @@ +--- +sidebarDepth: 3 +sidebar: auto +--- + +# Tight Layout guide + +How to use tight-layout to fit plots within your figure cleanly. + +*tight_layout* automatically adjusts subplot params so that the +subplot(s) fits in to the figure area. This is an experimental +feature and may not work for some cases. It only checks the extents +of ticklabels, axis labels, and titles. + +An alternative to *tight_layout* is [constrained_layout](constrainedlayout_guide.html). 
+
+## Simple Example
+
+In matplotlib, the location of axes (including subplots) is specified in
+normalized figure coordinates. It can happen that your axis labels or
+titles (or sometimes even ticklabels) go outside the figure area, and are thus
+clipped.
+
+``` python
+# sphinx_gallery_thumbnail_number = 7
+
+import matplotlib.pyplot as plt
+import numpy as np

+plt.rcParams['savefig.facecolor'] = "0.8"
+
+
+def example_plot(ax, fontsize=12):
+    ax.plot([1, 2])
+
+    ax.locator_params(nbins=3)
+    ax.set_xlabel('x-label', fontsize=fontsize)
+    ax.set_ylabel('y-label', fontsize=fontsize)
+    ax.set_title('Title', fontsize=fontsize)
+
+plt.close('all')
+fig, ax = plt.subplots()
+example_plot(ax, fontsize=24)
+```
+
+![sphx_glr_tight_layout_guide_001](https://matplotlib.org/_images/sphx_glr_tight_layout_guide_001.png)
+
+To prevent this, the location of axes needs to be adjusted. For
+subplots, this can be done by adjusting the subplot params
+([Move the edge of an axes to make room for tick labels](https://matplotlib.org/faq/howto_faq.html#howto-subplots-adjust)). Matplotlib v1.1 introduced a new
+command [``tight_layout()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.tight_layout.html#matplotlib.pyplot.tight_layout) that does this
+automatically for you.
+
+``` python
+fig, ax = plt.subplots()
+example_plot(ax, fontsize=24)
+plt.tight_layout()
+```
+
+![sphx_glr_tight_layout_guide_002](https://matplotlib.org/_images/sphx_glr_tight_layout_guide_002.png)
+
+Note that [``matplotlib.pyplot.tight_layout()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.tight_layout.html#matplotlib.pyplot.tight_layout) will only adjust the
+subplot params when it is called. In order to perform this adjustment each
+time the figure is redrawn, you can call ``fig.set_tight_layout(True)``, or,
+equivalently, set the ``figure.autolayout`` rcParam to ``True``.
+
+When you have multiple subplots, you often see labels of different
+axes overlapping each other.

``` python
plt.close('all')

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2)
example_plot(ax1)
example_plot(ax2)
example_plot(ax3)
example_plot(ax4)
```

![sphx_glr_tight_layout_guide_003](https://matplotlib.org/_images/sphx_glr_tight_layout_guide_003.png)

[``tight_layout()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.tight_layout.html#matplotlib.pyplot.tight_layout) will also adjust the spacing between subplots to minimize the overlaps.

``` python
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2)
example_plot(ax1)
example_plot(ax2)
example_plot(ax3)
example_plot(ax4)
plt.tight_layout()
```

![sphx_glr_tight_layout_guide_004](https://matplotlib.org/_images/sphx_glr_tight_layout_guide_004.png)

[``tight_layout()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.tight_layout.html#matplotlib.pyplot.tight_layout) can take the keyword arguments *pad*, *w_pad* and *h_pad*. These control the extra padding around the figure border and between subplots. The pads are specified in fractions of the fontsize.

``` python
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2)
example_plot(ax1)
example_plot(ax2)
example_plot(ax3)
example_plot(ax4)
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)
```

![sphx_glr_tight_layout_guide_005](https://matplotlib.org/_images/sphx_glr_tight_layout_guide_005.png)

[``tight_layout()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.tight_layout.html#matplotlib.pyplot.tight_layout) will work even if the sizes of subplots differ, as long as their grid specifications are compatible. In the example below, *ax1* and *ax2* are subplots of a 2x2 grid, while *ax3* is of a 1x2 grid.

``` python
plt.close('all')
fig = plt.figure()

ax1 = plt.subplot(221)
ax2 = plt.subplot(223)
ax3 = plt.subplot(122)

example_plot(ax1)
example_plot(ax2)
example_plot(ax3)

plt.tight_layout()
```

![sphx_glr_tight_layout_guide_006](https://matplotlib.org/_images/sphx_glr_tight_layout_guide_006.png)

It works with subplots created with [``subplot2grid()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.subplot2grid.html#matplotlib.pyplot.subplot2grid). In general, subplots created from the gridspec ([Customizing Figure Layouts Using GridSpec and Other Functions](gridspec.html)) will work.

``` python
plt.close('all')
fig = plt.figure()

ax1 = plt.subplot2grid((3, 3), (0, 0))
ax2 = plt.subplot2grid((3, 3), (0, 1), colspan=2)
ax3 = plt.subplot2grid((3, 3), (1, 0), colspan=2, rowspan=2)
ax4 = plt.subplot2grid((3, 3), (1, 2), rowspan=2)

example_plot(ax1)
example_plot(ax2)
example_plot(ax3)
example_plot(ax4)

plt.tight_layout()
```

![sphx_glr_tight_layout_guide_007](https://matplotlib.org/_images/sphx_glr_tight_layout_guide_007.png)

Although not thoroughly tested, it seems to work for subplots with aspect != "auto" (e.g., axes with images).

``` python
arr = np.arange(100).reshape((10, 10))

plt.close('all')
fig = plt.figure(figsize=(5, 4))

ax = plt.subplot(111)
im = ax.imshow(arr, interpolation="none")

plt.tight_layout()
```

![sphx_glr_tight_layout_guide_008](https://matplotlib.org/_images/sphx_glr_tight_layout_guide_008.png)

## Caveats

- [``tight_layout()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.tight_layout.html#matplotlib.pyplot.tight_layout) only considers ticklabels, axis labels, and titles. Thus, other artists may be clipped and may also overlap.
- It assumes that the extra space needed for ticklabels, axis labels, and titles is independent of the original location of the axes. This is often true, but there are rare cases where it is not.

- ``pad=0`` clips some of the texts by a few pixels. This may be a bug or a limitation of the current algorithm, and it is not clear why it happens. Meanwhile, using a pad larger than 0.3 is recommended.

## Use with GridSpec

GridSpec has its own [``tight_layout()``](https://matplotlib.org/api/_as_gen/matplotlib.gridspec.GridSpec.html#matplotlib.gridspec.GridSpec.tight_layout) method (the pyplot api [``tight_layout()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.tight_layout.html#matplotlib.pyplot.tight_layout) also works).

``` python
import matplotlib.gridspec as gridspec

plt.close('all')
fig = plt.figure()

gs1 = gridspec.GridSpec(2, 1)
ax1 = fig.add_subplot(gs1[0])
ax2 = fig.add_subplot(gs1[1])

example_plot(ax1)
example_plot(ax2)

gs1.tight_layout(fig)
```

![sphx_glr_tight_layout_guide_009](https://matplotlib.org/_images/sphx_glr_tight_layout_guide_009.png)

You may provide an optional *rect* parameter, which specifies the bounding box that the subplots will be fit inside. The coordinates must be in normalized figure coordinates and the default is (0, 0, 1, 1).

``` python
fig = plt.figure()

gs1 = gridspec.GridSpec(2, 1)
ax1 = fig.add_subplot(gs1[0])
ax2 = fig.add_subplot(gs1[1])

example_plot(ax1)
example_plot(ax2)

gs1.tight_layout(fig, rect=[0, 0, 0.5, 1])
```

![sphx_glr_tight_layout_guide_010](https://matplotlib.org/_images/sphx_glr_tight_layout_guide_010.png)

For example, this can be used for a figure with multiple gridspecs.

``` python
fig = plt.figure()

gs1 = gridspec.GridSpec(2, 1)
ax1 = fig.add_subplot(gs1[0])
ax2 = fig.add_subplot(gs1[1])

example_plot(ax1)
example_plot(ax2)

gs1.tight_layout(fig, rect=[0, 0, 0.5, 1])

gs2 = gridspec.GridSpec(3, 1)

for ss in gs2:
    ax = fig.add_subplot(ss)
    example_plot(ax)
    ax.set_title("")
    ax.set_xlabel("")

ax.set_xlabel("x-label", fontsize=12)

gs2.tight_layout(fig, rect=[0.5, 0, 1, 1], h_pad=0.5)

# We may try to match the top and bottom of the two grids
top = min(gs1.top, gs2.top)
bottom = max(gs1.bottom, gs2.bottom)

gs1.update(top=top, bottom=bottom)
gs2.update(top=top, bottom=bottom)
plt.show()
```

![sphx_glr_tight_layout_guide_011](https://matplotlib.org/_images/sphx_glr_tight_layout_guide_011.png)

While this should be mostly good enough, adjusting top and bottom may require adjusting hspace as well. To update hspace & vspace, we call [``tight_layout()``](https://matplotlib.org/api/_as_gen/matplotlib.gridspec.GridSpec.html#matplotlib.gridspec.GridSpec.tight_layout) again with an updated rect argument. Note that the rect argument specifies the area including the ticklabels, etc. Thus, we increase the bottom (which is 0 for the normal case) by the difference between the *bottom* from above and the bottom of each gridspec. The same goes for the top.

``` python
fig = plt.gcf()

gs1 = gridspec.GridSpec(2, 1)
ax1 = fig.add_subplot(gs1[0])
ax2 = fig.add_subplot(gs1[1])

example_plot(ax1)
example_plot(ax2)

gs1.tight_layout(fig, rect=[0, 0, 0.5, 1])

gs2 = gridspec.GridSpec(3, 1)

for ss in gs2:
    ax = fig.add_subplot(ss)
    example_plot(ax)
    ax.set_title("")
    ax.set_xlabel("")

ax.set_xlabel("x-label", fontsize=12)

gs2.tight_layout(fig, rect=[0.5, 0, 1, 1], h_pad=0.5)

top = min(gs1.top, gs2.top)
bottom = max(gs1.bottom, gs2.bottom)

gs1.update(top=top, bottom=bottom)
gs2.update(top=top, bottom=bottom)

top = min(gs1.top, gs2.top)
bottom = max(gs1.bottom, gs2.bottom)

gs1.tight_layout(fig, rect=[None, 0 + (bottom-gs1.bottom),
                            0.5, 1 - (gs1.top-top)])
gs2.tight_layout(fig, rect=[0.5, 0 + (bottom-gs2.bottom),
                            None, 1 - (gs2.top-top)],
                 h_pad=0.5)
```

![sphx_glr_tight_layout_guide_012](https://matplotlib.org/_images/sphx_glr_tight_layout_guide_012.png)

## Legends and Annotations

Prior to Matplotlib 2.2, legends and annotations were excluded from the bounding box calculations that decide the layout. Subsequently these artists were added to the calculation, but sometimes it is undesirable to include them. For instance, in this case it might be good to have the axes shrink a bit to make room for the legend:

``` python
fig, ax = plt.subplots(figsize=(4, 3))
lines = ax.plot(range(10), label='A simple plot')
ax.legend(bbox_to_anchor=(0.7, 0.5), loc='center left')
fig.tight_layout()
plt.show()
```

![sphx_glr_tight_layout_guide_013](https://matplotlib.org/_images/sphx_glr_tight_layout_guide_013.png)

However, sometimes this is not desired (quite often when using ``fig.savefig('outname.png', bbox_inches='tight')``). In order to remove the legend from the bounding box calculation, we simply call ``leg.set_in_layout(False)`` and the legend will be ignored.

``` python
fig, ax = plt.subplots(figsize=(4, 3))
lines = ax.plot(range(10), label='B simple plot')
leg = ax.legend(bbox_to_anchor=(0.7, 0.5), loc='center left')
leg.set_in_layout(False)
fig.tight_layout()
plt.show()
```

![sphx_glr_tight_layout_guide_014](https://matplotlib.org/_images/sphx_glr_tight_layout_guide_014.png)

## Use with AxesGrid1

While limited, the axes_grid1 toolkit is also supported.

``` python
from mpl_toolkits.axes_grid1 import Grid

plt.close('all')
fig = plt.figure()
grid = Grid(fig, rect=111, nrows_ncols=(2, 2),
            axes_pad=0.25, label_mode='L',
            )

for ax in grid:
    example_plot(ax)
ax.title.set_visible(False)

plt.tight_layout()
```

![sphx_glr_tight_layout_guide_015](https://matplotlib.org/_images/sphx_glr_tight_layout_guide_015.png)

## Colorbar

If you create a colorbar with the [``colorbar()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.colorbar.html#matplotlib.pyplot.colorbar) command, the created colorbar is an instance of Axes, *not* Subplot, so tight_layout does not work. With Matplotlib v1.1, you may create a colorbar as a subplot using the gridspec.

``` python
plt.close('all')
arr = np.arange(100).reshape((10, 10))
fig = plt.figure(figsize=(4, 4))
im = plt.imshow(arr, interpolation="none")

plt.colorbar(im, use_gridspec=True)

plt.tight_layout()
```

![sphx_glr_tight_layout_guide_016](https://matplotlib.org/_images/sphx_glr_tight_layout_guide_016.png)

Another option is to use the AxesGrid1 toolkit to explicitly create an axes for the colorbar.

``` python
from mpl_toolkits.axes_grid1 import make_axes_locatable

plt.close('all')
arr = np.arange(100).reshape((10, 10))
fig = plt.figure(figsize=(4, 4))
im = plt.imshow(arr, interpolation="none")

divider = make_axes_locatable(plt.gca())
cax = divider.append_axes("right", "5%", pad="3%")
plt.colorbar(im, cax=cax)

plt.tight_layout()
```

![sphx_glr_tight_layout_guide_017](https://matplotlib.org/_images/sphx_glr_tight_layout_guide_017.png)

**Total running time of the script:** ( 0 minutes 3.417 seconds)

## Download

- [Download Python source code: tight_layout_guide.py](https://matplotlib.org/_downloads/08204f760ca1d178acca434333c21c5c/tight_layout_guide.py)
- [Download Jupyter notebook: tight_layout_guide.ipynb](https://matplotlib.org/_downloads/967ce4dd04ce9628b993a5a4e402046a/tight_layout_guide.ipynb)

diff --git a/Python/matplotlab/introductory/customizing.md b/Python/matplotlab/introductory/customizing.md

---
sidebarDepth: 3
sidebar: auto
---

# Customizing Matplotlib with style sheets and rcParams

Tips for customizing the properties and default styles of Matplotlib.

## Using style sheets

The ``style`` package adds support for easy-to-switch plotting "styles" with the same parameters as a [matplotlib rc](#customizing-with-matplotlibrc-files) file (which is read at startup to configure matplotlib).

There are a number of pre-defined styles [provided by Matplotlib](https://github.com/matplotlib/matplotlib/tree/master/lib/matplotlib/mpl-data/stylelib). For example, there's a pre-defined style called "ggplot", which emulates the aesthetics of [ggplot](https://ggplot2.tidyverse.org/) (a popular plotting package for [R](https://www.r-project.org/)).
To use this style, just add:

``` python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
plt.style.use('ggplot')
data = np.random.randn(50)
```

To list all available styles, use:

``` python
print(plt.style.available)
```

Out:

```
['seaborn-dark', 'dark_background', 'seaborn-pastel', 'seaborn-colorblind', 'tableau-colorblind10', 'seaborn-notebook', 'seaborn-dark-palette', 'grayscale', 'seaborn-poster', 'seaborn', 'bmh', 'seaborn-talk', 'seaborn-ticks', '_classic_test', 'ggplot', 'seaborn-white', 'classic', 'Solarize_Light2', 'seaborn-paper', 'fast', 'fivethirtyeight', 'seaborn-muted', 'seaborn-whitegrid', 'seaborn-darkgrid', 'seaborn-bright', 'seaborn-deep']
```

## Defining your own style

You can create custom styles and use them by calling ``style.use`` with the path or URL to the style sheet. Additionally, if you add your ``.mplstyle`` file to ``mpl_configdir/stylelib``, you can reuse your custom style sheet with a call to ``style.use()``, passing just the style name. By default ``mpl_configdir`` should be ``~/.config/matplotlib``, but you can check where yours is with ``matplotlib.get_configdir()``; you may need to create this directory. You can also change the directory where matplotlib looks for the ``stylelib/`` folder by setting the ``MPLCONFIGDIR`` environment variable; see [matplotlib configuration and cache directory locations](https://matplotlib.org/faq/troubleshooting_faq.html#locating-matplotlib-config-dir).

Note that a custom style sheet in ``mpl_configdir/stylelib`` will override a style sheet defined by matplotlib if the styles have the same name.
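As a minimal sketch of the path-based form described above (the style-sheet name and its contents here are invented for illustration), a sheet does not need to live in ``stylelib`` at all — any file path works:

``` python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe in scripts
import matplotlib.pyplot as plt

# Write a throwaway style sheet; .mplstyle files use the same
# "key : value" syntax as matplotlibrc.
path = os.path.join(tempfile.mkdtemp(), "mystyle.mplstyle")
with open(path, "w") as fh:
    fh.write("lines.linewidth : 5\n")

# style.context (like style.use) accepts a path or URL, and restores
# the previous settings when the block exits.
with plt.style.context(path):
    active = plt.rcParams["lines.linewidth"]

print(active)  # 5.0 while the sheet was active
```

Installing the same file into ``mpl_configdir/stylelib`` would then let you refer to it by name instead of by path.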

For example, you might want to create ``mpl_configdir/stylelib/presentation.mplstyle`` with the following:

```
axes.titlesize : 24
axes.labelsize : 20
lines.linewidth : 3
lines.markersize : 10
xtick.labelsize : 16
ytick.labelsize : 16
```

Then, when you want to adapt a plot designed for a paper to one that looks good in a presentation, you can just add:

``` python
>>> import matplotlib.pyplot as plt
>>> plt.style.use('presentation')
```

## Composing styles

Style sheets are designed to be composed together. So you can have a style sheet that customizes colors and a separate style sheet that alters element sizes for presentations. These styles can easily be combined by passing a list of styles:

``` python
>>> import matplotlib.pyplot as plt
>>> plt.style.use(['dark_background', 'presentation'])
```

Note that styles further to the right will overwrite values that are already defined by styles to the left.

## Temporary styling

If you only want to use a style for a specific block of code but don't want to change the global styling, the style package provides a context manager for limiting your changes to a specific scope. To isolate your styling changes, you can write something like the following:

``` python
with plt.style.context('dark_background'):
    plt.plot(np.sin(np.linspace(0, 2 * np.pi)), 'r-o')
plt.show()
```

![sphx_glr_customizing_001](https://matplotlib.org/_images/sphx_glr_customizing_001.png)

# matplotlib rcParams

## Dynamic rc settings

You can also dynamically change the default rc settings in a python script or interactively from the python shell. All of the rc settings are stored in a dictionary-like variable called [``matplotlib.rcParams``](https://matplotlib.org/api/matplotlib_configuration_api.html#matplotlib.rcParams), which is global to the matplotlib package.
rcParams can be modified directly, for example:

``` python
mpl.rcParams['lines.linewidth'] = 2
mpl.rcParams['lines.color'] = 'r'
plt.plot(data)
```

![sphx_glr_customizing_002](https://matplotlib.org/_images/sphx_glr_customizing_002.png)

Matplotlib also provides a couple of convenience functions for modifying rc settings. The [``matplotlib.rc()``](https://matplotlib.org/api/matplotlib_configuration_api.html#matplotlib.rc) command can be used to modify multiple settings in a single group at once, using keyword arguments:

``` python
mpl.rc('lines', linewidth=4, color='g')
plt.plot(data)
```

![sphx_glr_customizing_003](https://matplotlib.org/_images/sphx_glr_customizing_003.png)

The [``matplotlib.rcdefaults()``](https://matplotlib.org/api/matplotlib_configuration_api.html#matplotlib.rcdefaults) command will restore the standard matplotlib default settings.

There is some degree of validation when setting the values of rcParams; see [``matplotlib.rcsetup``](https://matplotlib.org/api/rcsetup_api.html#module-matplotlib.rcsetup) for details.

## The ``matplotlibrc`` file

matplotlib uses ``matplotlibrc`` configuration files to customize all kinds of properties, which we call ``rc settings`` or ``rc parameters``. You can control the defaults of almost every property in matplotlib: figure size and dpi, line width, color and style, axes, axis and grid properties, text and font properties, and so on. matplotlib looks for ``matplotlibrc`` in four locations, in the following order:

1. ``matplotlibrc`` in the current working directory, usually used for specific customizations that you do not want to apply elsewhere.
2. ``$MATPLOTLIBRC`` if it is a file, else ``$MATPLOTLIBRC/matplotlibrc``.
3. A user-specific location, depending on your platform: on Linux, ``.config/matplotlib/matplotlibrc`` (or ``$XDG_CONFIG_HOME/matplotlib/matplotlibrc``, if ``$XDG_CONFIG_HOME`` is set); on other platforms, ``.matplotlib/matplotlibrc``.
4. ``INSTALL/matplotlib/mpl-data/matplotlibrc``, where ``INSTALL`` is the matplotlib installation directory.

Once a ``matplotlibrc`` file has been found, it will *not* search any of the other paths.

To display where the currently active ``matplotlibrc`` file was loaded from, one can do the following:

``` python
>>> import matplotlib
>>> matplotlib.matplotlib_fname()
'/home/foo/.config/matplotlib/matplotlibrc'
```

See below for a sample [matplotlibrc file](#matplotlibrc-sample).
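As a brief aside, not covered above but in the same spirit as the style package's context manager: ``matplotlib.rc_context`` applies a dict of rc settings for a limited scope and restores the previous values on exit. A minimal sketch:

``` python
import matplotlib as mpl

# rc_context takes a dict of rc settings (and/or an fname of a
# matplotlibrc-style file) and undoes the changes when the block exits.
before = mpl.rcParams['lines.linewidth']
with mpl.rc_context({'lines.linewidth': 4}):
    inside = mpl.rcParams['lines.linewidth']
after = mpl.rcParams['lines.linewidth']

print(inside, after == before)  # 4.0 True
```

This avoids the global, persistent effect of assigning to ``mpl.rcParams`` directly.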
+ +### A sample matplotlibrc file + +``` python +#### MATPLOTLIBRC FORMAT + +## This is a sample matplotlib configuration file - you can find a copy +## of it on your system in +## site-packages/matplotlib/mpl-data/matplotlibrc. If you edit it +## there, please note that it will be overwritten in your next install. +## If you want to keep a permanent local copy that will not be +## overwritten, place it in the following location: +## unix/linux: +## $HOME/.config/matplotlib/matplotlibrc or +## $XDG_CONFIG_HOME/matplotlib/matplotlibrc (if $XDG_CONFIG_HOME is set) +## other platforms: +## $HOME/.matplotlib/matplotlibrc +## +## See http://matplotlib.org/users/customizing.html#the-matplotlibrc-file for +## more details on the paths which are checked for the configuration file. +## +## This file is best viewed in a editor which supports python mode +## syntax highlighting. Blank lines, or lines starting with a comment +## symbol, are ignored, as are trailing comments. Other lines must +## have the format +## key : val ## optional comment +## +## Colors: for the color values below, you can either use - a +## matplotlib color string, such as r, k, or b - an rgb tuple, such as +## (1.0, 0.5, 0.0) - a hex string, such as ff00ff - a scalar +## grayscale intensity such as 0.75 - a legal html color name, e.g., red, +## blue, darkslategray + +##### CONFIGURATION BEGINS HERE + +## The default backend. If you omit this parameter, the first +## working backend from the following list is used: +## MacOSX Qt5Agg Qt4Agg Gtk3Agg TkAgg WxAgg Agg +## +## Other choices include: +## Qt5Cairo Qt4Cairo GTK3Cairo TkCairo WxCairo Cairo Wx PS PDF SVG Template. +## +## You can also deploy your own backend outside of matplotlib by +## referring to the module name (which must be in the PYTHONPATH) as +## 'module://my_backend'. +#backend : Agg + +## Note that this can be overridden by the environment variable +## QT_API used by Enthought Tool Suite (ETS); valid values are +## "pyqt" and "pyside". 
The "pyqt" setting has the side effect of +## forcing the use of Version 2 API for QString and QVariant. + +## The port to use for the web server in the WebAgg backend. +#webagg.port : 8988 + +## The address on which the WebAgg web server should be reachable +#webagg.address : 127.0.0.1 + +## If webagg.port is unavailable, a number of other random ports will +## be tried until one that is available is found. +#webagg.port_retries : 50 + +## When True, open the webbrowser to the plot that is shown +#webagg.open_in_browser : True + +## if you are running pyplot inside a GUI and your backend choice +## conflicts, we will automatically try to find a compatible one for +## you if backend_fallback is True +#backend_fallback: True + +#interactive : False +#toolbar : toolbar2 ## None | toolbar2 ("classic" is deprecated) +#timezone : UTC ## a pytz timezone string, e.g., US/Central or Europe/Paris + +## Where your matplotlib data lives if you installed to a non-default +## location. This is where the matplotlib fonts, bitmaps, etc reside +#datapath : /home/jdhunter/mpldata + + +#### LINES +## See http://matplotlib.org/api/artist_api.html#module-matplotlib.lines for more +## information on line properties. +#lines.linewidth : 1.5 ## line width in points +#lines.linestyle : - ## solid line +#lines.color : C0 ## has no affect on plot(); see axes.prop_cycle +#lines.marker : None ## the default marker +#lines.markerfacecolor : auto ## the default markerfacecolor +#lines.markeredgecolor : auto ## the default markeredgecolor +#lines.markeredgewidth : 1.0 ## the line width around the marker symbol +#lines.markersize : 6 ## markersize, in points +#lines.dash_joinstyle : round ## miter|round|bevel +#lines.dash_capstyle : butt ## butt|round|projecting +#lines.solid_joinstyle : round ## miter|round|bevel +#lines.solid_capstyle : projecting ## butt|round|projecting +#lines.antialiased : True ## render lines in antialiased (no jaggies) + +## The three standard dash patterns. 
These are scaled by the linewidth. +#lines.dashed_pattern : 3.7, 1.6 +#lines.dashdot_pattern : 6.4, 1.6, 1, 1.6 +#lines.dotted_pattern : 1, 1.65 +#lines.scale_dashes : True + +#markers.fillstyle: full ## full|left|right|bottom|top|none + +#### PATCHES +## Patches are graphical objects that fill 2D space, like polygons or +## circles. See +## http://matplotlib.org/api/artist_api.html#module-matplotlib.patches +## information on patch properties +#patch.linewidth : 1 ## edge width in points. +#patch.facecolor : C0 +#patch.edgecolor : black ## if forced, or patch is not filled +#patch.force_edgecolor : False ## True to always use edgecolor +#patch.antialiased : True ## render patches in antialiased (no jaggies) + +#### HATCHES +#hatch.color : black +#hatch.linewidth : 1.0 + +#### Boxplot +#boxplot.notch : False +#boxplot.vertical : True +#boxplot.whiskers : 1.5 +#boxplot.bootstrap : None +#boxplot.patchartist : False +#boxplot.showmeans : False +#boxplot.showcaps : True +#boxplot.showbox : True +#boxplot.showfliers : True +#boxplot.meanline : False + +#boxplot.flierprops.color : black +#boxplot.flierprops.marker : o +#boxplot.flierprops.markerfacecolor : none +#boxplot.flierprops.markeredgecolor : black +#boxplot.flierprops.markeredgewidth : 1.0 +#boxplot.flierprops.markersize : 6 +#boxplot.flierprops.linestyle : none +#boxplot.flierprops.linewidth : 1.0 + +#boxplot.boxprops.color : black +#boxplot.boxprops.linewidth : 1.0 +#boxplot.boxprops.linestyle : - + +#boxplot.whiskerprops.color : black +#boxplot.whiskerprops.linewidth : 1.0 +#boxplot.whiskerprops.linestyle : - + +#boxplot.capprops.color : black +#boxplot.capprops.linewidth : 1.0 +#boxplot.capprops.linestyle : - + +#boxplot.medianprops.color : C1 +#boxplot.medianprops.linewidth : 1.0 +#boxplot.medianprops.linestyle : - + +#boxplot.meanprops.color : C2 +#boxplot.meanprops.marker : ^ +#boxplot.meanprops.markerfacecolor : C2 +#boxplot.meanprops.markeredgecolor : C2 +#boxplot.meanprops.markersize : 6 
+#boxplot.meanprops.linestyle : -- +#boxplot.meanprops.linewidth : 1.0 + + +#### FONT + +## font properties used by text.Text. See +## http://matplotlib.org/api/font_manager_api.html for more +## information on font properties. The 6 font properties used for font +## matching are given below with their default values. +## +## The font.family property has five values: 'serif' (e.g., Times), +## 'sans-serif' (e.g., Helvetica), 'cursive' (e.g., Zapf-Chancery), +## 'fantasy' (e.g., Western), and 'monospace' (e.g., Courier). Each of +## these font families has a default list of font names in decreasing +## order of priority associated with them. When text.usetex is False, +## font.family may also be one or more concrete font names. +## +## The font.style property has three values: normal (or roman), italic +## or oblique. The oblique style will be used for italic, if it is not +## present. +## +## The font.variant property has two values: normal or small-caps. For +## TrueType fonts, which are scalable fonts, small-caps is equivalent +## to using a font size of 'smaller', or about 83%% of the current font +## size. +## +## The font.weight property has effectively 13 values: normal, bold, +## bolder, lighter, 100, 200, 300, ..., 900. Normal is the same as +## 400, and bold is 700. bolder and lighter are relative values with +## respect to the current weight. +## +## The font.stretch property has 11 values: ultra-condensed, +## extra-condensed, condensed, semi-condensed, normal, semi-expanded, +## expanded, extra-expanded, ultra-expanded, wider, and narrower. This +## property is not currently implemented. +## +## The font.size property is the default font size for text, given in pts. +## 10 pt is the standard value. + +#font.family : sans-serif +#font.style : normal +#font.variant : normal +#font.weight : normal +#font.stretch : normal +## note that font.size controls default text sizes. 
To configure +## special text sizes tick labels, axes, labels, title, etc, see the rc +## settings for axes and ticks. Special text sizes can be defined +## relative to font.size, using the following values: xx-small, x-small, +## small, medium, large, x-large, xx-large, larger, or smaller +#font.size : 10.0 +#font.serif : DejaVu Serif, Bitstream Vera Serif, Computer Modern Roman, New Century Schoolbook, Century Schoolbook L, Utopia, ITC Bookman, Bookman, Nimbus Roman No9 L, Times New Roman, Times, Palatino, Charter, serif +#font.sans-serif : DejaVu Sans, Bitstream Vera Sans, Computer Modern Sans Serif, Lucida Grande, Verdana, Geneva, Lucid, Arial, Helvetica, Avant Garde, sans-serif +#font.cursive : Apple Chancery, Textile, Zapf Chancery, Sand, Script MT, Felipa, cursive +#font.fantasy : Comic Sans MS, Chicago, Charcoal, ImpactWestern, Humor Sans, xkcd, fantasy +#font.monospace : DejaVu Sans Mono, Bitstream Vera Sans Mono, Computer Modern Typewriter, Andale Mono, Nimbus Mono L, Courier New, Courier, Fixed, Terminal, monospace + +#### TEXT +## text properties used by text.Text. See +## http://matplotlib.org/api/artist_api.html#module-matplotlib.text for more +## information on text properties +#text.color : black + +#### LaTeX customizations. See http://wiki.scipy.org/Cookbook/Matplotlib/UsingTex +#text.usetex : False ## use latex for all text handling. The following fonts + ## are supported through the usual rc parameter settings: + ## new century schoolbook, bookman, times, palatino, + ## zapf chancery, charter, serif, sans-serif, helvetica, + ## avant garde, courier, monospace, computer modern roman, + ## computer modern sans serif, computer modern typewriter + ## If another font is desired which can loaded using the + ## LaTeX \usepackage command, please inquire at the + ## matplotlib mailing list +#text.latex.preamble : ## IMPROPER USE OF THIS FEATURE WILL LEAD TO LATEX FAILURES + ## AND IS THEREFORE UNSUPPORTED. 
PLEASE DO NOT ASK FOR HELP + ## IF THIS FEATURE DOES NOT DO WHAT YOU EXPECT IT TO. + ## text.latex.preamble is a single line of LaTeX code that + ## will be passed on to the LaTeX system. It may contain + ## any code that is valid for the LaTeX "preamble", i.e. + ## between the "\documentclass" and "\begin{document}" + ## statements. + ## Note that it has to be put on a single line, which may + ## become quite long. + ## The following packages are always loaded with usetex, so + ## beware of package collisions: color, geometry, graphicx, + ## type1cm, textcomp. + ## Adobe Postscript (PSSNFS) font packages may also be + ## loaded, depending on your font settings. +#text.latex.preview : False + +#text.hinting : auto ## May be one of the following: + ## none: Perform no hinting + ## auto: Use FreeType's autohinter + ## native: Use the hinting information in the + # font file, if available, and if your + # FreeType library supports it + ## either: Use the native hinting information, + # or the autohinter if none is available. + ## For backward compatibility, this value may also be + ## True === 'auto' or False === 'none'. +#text.hinting_factor : 8 ## Specifies the amount of softness for hinting in the + ## horizontal direction. A value of 1 will hint to full + ## pixels. A value of 2 will hint to half pixels etc. +#text.antialiased : True ## If True (default), the text will be antialiased. + ## This only affects the Agg backend. + +## The following settings allow you to select the fonts in math mode. +## They map from a TeX font name to a fontconfig font pattern. +## These settings are only used if mathtext.fontset is 'custom'. +## Note that this "custom" mode is unsupported and may go away in the +## future. 
+#mathtext.cal : cursive +#mathtext.rm : sans +#mathtext.tt : monospace +#mathtext.it : sans:italic +#mathtext.bf : sans:bold +#mathtext.sf : sans +#mathtext.fontset : dejavusans ## Should be 'dejavusans' (default), + ## 'dejavuserif', 'cm' (Computer Modern), 'stix', + ## 'stixsans' or 'custom' +#mathtext.fallback_to_cm : True ## When True, use symbols from the Computer Modern + ## fonts when a symbol can not be found in one of + ## the custom math fonts. +#mathtext.default : it ## The default font to use for math. + ## Can be any of the LaTeX font names, including + ## the special name "regular" for the same font + ## used in regular text. + +#### AXES +## default face and edge color, default tick sizes, +## default fontsizes for ticklabels, and so on. See +## http://matplotlib.org/api/axes_api.html#module-matplotlib.axes +#axes.facecolor : white ## axes background color +#axes.edgecolor : black ## axes edge color +#axes.linewidth : 0.8 ## edge linewidth +#axes.grid : False ## display grid or not +#axes.grid.axis : both ## which axis the grid should apply to +#axes.grid.which : major ## gridlines at major, minor or both ticks +#axes.titlesize : large ## fontsize of the axes title +#axes.titleweight : normal ## font weight of title +#axes.titlepad : 6.0 ## pad between axes and title in points +#axes.labelsize : medium ## fontsize of the x any y labels +#axes.labelpad : 4.0 ## space between label and axis +#axes.labelweight : normal ## weight of the x and y labels +#axes.labelcolor : black +#axes.axisbelow : line ## draw axis gridlines and ticks below + ## patches (True); above patches but below + ## lines ('line'); or above all (False) +#axes.formatter.limits : -7, 7 ## use scientific notation if log10 + ## of the axis range is smaller than the + ## first or larger than the second +#axes.formatter.use_locale : False ## When True, format tick labels + ## according to the user's locale. + ## For example, use ',' as a decimal + ## separator in the fr_FR locale. 
+#axes.formatter.use_mathtext : False ## When True, use mathtext for scientific + ## notation. +#axes.formatter.min_exponent: 0 ## minimum exponent to format in scientific notation +#axes.formatter.useoffset : True ## If True, the tick label formatter + ## will default to labeling ticks relative + ## to an offset when the data range is + ## small compared to the minimum absolute + ## value of the data. +#axes.formatter.offset_threshold : 4 ## When useoffset is True, the offset + ## will be used when it can remove + ## at least this number of significant + ## digits from tick labels. +#axes.spines.left : True ## display axis spines +#axes.spines.bottom : True +#axes.spines.top : True +#axes.spines.right : True +#axes.unicode_minus : True ## use unicode for the minus symbol + ## rather than hyphen. See + ## http://en.wikipedia.org/wiki/Plus_and_minus_signs#Character_codes +#axes.prop_cycle : cycler('color', ['1f77b4', 'ff7f0e', '2ca02c', 'd62728', '9467bd', '8c564b', 'e377c2', '7f7f7f', 'bcbd22', '17becf']) + ## color cycle for plot lines as list of string + ## colorspecs: single letter, long name, or web-style hex + ## Note the use of string escapes here ('1f77b4', instead of 1f77b4) + ## as opposed to the rest of this file. +#axes.autolimit_mode : data ## How to scale axes limits to the data. + ## Use "data" to use data limits, plus some margin + ## Use "round_number" move to the nearest "round" number +#axes.xmargin : .05 ## x margin. See `axes.Axes.margins` +#axes.ymargin : .05 ## y margin See `axes.Axes.margins` +#polaraxes.grid : True ## display grid on polar axes +#axes3d.grid : True ## display grid on 3d axes + +#### DATES +## These control the default format strings used in AutoDateFormatter. +## Any valid format datetime format string can be used (see the python +## `datetime` for details). 
For example using '%%x' will use the locale date representation +## '%%X' will use the locale time representation and '%%c' will use the full locale datetime +## representation. +## These values map to the scales: +## {'year': 365, 'month': 30, 'day': 1, 'hour': 1/24, 'minute': 1 / (24 * 60)} + +#date.autoformatter.year : %Y +#date.autoformatter.month : %Y-%m +#date.autoformatter.day : %Y-%m-%d +#date.autoformatter.hour : %m-%d %H +#date.autoformatter.minute : %d %H:%M +#date.autoformatter.second : %H:%M:%S +#date.autoformatter.microsecond : %M:%S.%f + +#### TICKS +## see http://matplotlib.org/api/axis_api.html#matplotlib.axis.Tick +#xtick.top : False ## draw ticks on the top side +#xtick.bottom : True ## draw ticks on the bottom side +#xtick.labeltop : False ## draw label on the top +#xtick.labelbottom : True ## draw label on the bottom +#xtick.major.size : 3.5 ## major tick size in points +#xtick.minor.size : 2 ## minor tick size in points +#xtick.major.width : 0.8 ## major tick width in points +#xtick.minor.width : 0.6 ## minor tick width in points +#xtick.major.pad : 3.5 ## distance to major tick label in points +#xtick.minor.pad : 3.4 ## distance to the minor tick label in points +#xtick.color : black ## color of the tick labels +#xtick.labelsize : medium ## fontsize of the tick labels +#xtick.direction : out ## direction: in, out, or inout +#xtick.minor.visible : False ## visibility of minor ticks on x-axis +#xtick.major.top : True ## draw x axis top major ticks +#xtick.major.bottom : True ## draw x axis bottom major ticks +#xtick.minor.top : True ## draw x axis top minor ticks +#xtick.minor.bottom : True ## draw x axis bottom minor ticks +#xtick.alignment : center ## alignment of xticks + +#ytick.left : True ## draw ticks on the left side +#ytick.right : False ## draw ticks on the right side +#ytick.labelleft : True ## draw tick labels on the left side +#ytick.labelright : False ## draw tick labels on the right side +#ytick.major.size : 3.5 ## major tick 
size in points +#ytick.minor.size : 2 ## minor tick size in points +#ytick.major.width : 0.8 ## major tick width in points +#ytick.minor.width : 0.6 ## minor tick width in points +#ytick.major.pad : 3.5 ## distance to major tick label in points +#ytick.minor.pad : 3.4 ## distance to the minor tick label in points +#ytick.color : black ## color of the tick labels +#ytick.labelsize : medium ## fontsize of the tick labels +#ytick.direction : out ## direction: in, out, or inout +#ytick.minor.visible : False ## visibility of minor ticks on y-axis +#ytick.major.left : True ## draw y axis left major ticks +#ytick.major.right : True ## draw y axis right major ticks +#ytick.minor.left : True ## draw y axis left minor ticks +#ytick.minor.right : True ## draw y axis right minor ticks +#ytick.alignment : center_baseline ## alignment of yticks + +#### GRIDS +#grid.color : b0b0b0 ## grid color +#grid.linestyle : - ## solid +#grid.linewidth : 0.8 ## in points +#grid.alpha : 1.0 ## transparency, between 0.0 and 1.0 + +#### Legend +#legend.loc : best +#legend.frameon : True ## if True, draw the legend on a background patch +#legend.framealpha : 0.8 ## legend patch transparency +#legend.facecolor : inherit ## inherit from axes.facecolor; or color spec +#legend.edgecolor : 0.8 ## background patch boundary color +#legend.fancybox : True ## if True, use a rounded box for the + ## legend background, else a rectangle +#legend.shadow : False ## if True, give background a shadow effect +#legend.numpoints : 1 ## the number of marker points in the legend line +#legend.scatterpoints : 1 ## number of scatter points +#legend.markerscale : 1.0 ## the relative size of legend markers vs. original +#legend.fontsize : medium +#legend.title_fontsize : None ## None sets to the same as the default axes. 
+## Dimensions as fraction of fontsize: +#legend.borderpad : 0.4 ## border whitespace +#legend.labelspacing : 0.5 ## the vertical space between the legend entries +#legend.handlelength : 2.0 ## the length of the legend lines +#legend.handleheight : 0.7 ## the height of the legend handle +#legend.handletextpad : 0.8 ## the space between the legend line and legend text +#legend.borderaxespad : 0.5 ## the border between the axes and legend edge +#legend.columnspacing : 2.0 ## column separation + +#### FIGURE +## See http://matplotlib.org/api/figure_api.html#matplotlib.figure.Figure +#figure.titlesize : large ## size of the figure title (Figure.suptitle()) +#figure.titleweight : normal ## weight of the figure title +#figure.figsize : 6.4, 4.8 ## figure size in inches +#figure.dpi : 100 ## figure dots per inch +#figure.facecolor : white ## figure facecolor +#figure.edgecolor : white ## figure edgecolor +#figure.frameon : True ## enable figure frame +#figure.max_open_warning : 20 ## The maximum number of figures to open through + ## the pyplot interface before emitting a warning. + ## If less than one this feature is disabled. +## The figure subplot parameters. 
All dimensions are a fraction of the figure width and height.
+#figure.subplot.left : 0.125 ## the left side of the subplots of the figure
+#figure.subplot.right : 0.9 ## the right side of the subplots of the figure
+#figure.subplot.bottom : 0.11 ## the bottom of the subplots of the figure
+#figure.subplot.top : 0.88 ## the top of the subplots of the figure
+#figure.subplot.wspace : 0.2 ## the amount of width reserved for space between subplots,
+ ## expressed as a fraction of the average axis width
+#figure.subplot.hspace : 0.2 ## the amount of height reserved for space between subplots,
+ ## expressed as a fraction of the average axis height
+
+## Figure layout
+#figure.autolayout : False ## When True, automatically adjust subplot
+ ## parameters to make the plot fit the figure
+ ## using `tight_layout`
+#figure.constrained_layout.use: False ## When True, automatically make plot
+ ## elements fit on the figure. (Not compatible
+ ## with `autolayout`, above).
+#figure.constrained_layout.h_pad : 0.04167 ## Padding around axes objects. Float representing
+#figure.constrained_layout.w_pad : 0.04167 ## inches. Default is 3./72. inches (3 pts)
+#figure.constrained_layout.hspace : 0.02 ## Space between subplot groups. Float representing
+#figure.constrained_layout.wspace : 0.02 ## a fraction of the subplot widths being separated.
+
+#### IMAGES
+#image.aspect : equal ## equal | auto | a number
+#image.interpolation : nearest ## see help(imshow) for options
+#image.cmap : viridis ## A colormap name, gray etc...
+#image.lut : 256 ## the size of the colormap lookup table
+#image.origin : upper ## lower | upper
+#image.resample : True
+#image.composite_image : True ## When True, all the images on a set of axes are
+ ## combined into a single composite image before
+ ## saving a figure as a vector graphics file,
+ ## such as a PDF. 
+ +#### CONTOUR PLOTS +#contour.negative_linestyle : dashed ## string or on-off ink sequence +#contour.corner_mask : True ## True | False | legacy + +#### ERRORBAR PLOTS +#errorbar.capsize : 0 ## length of end cap on error bars in pixels + +#### HISTOGRAM PLOTS +#hist.bins : 10 ## The default number of histogram bins. + ## If Numpy 1.11 or later is + ## installed, may also be `auto` + +#### SCATTER PLOTS +#scatter.marker : o ## The default marker type for scatter plots. +#scatter.edgecolors : face ## The default edgecolors for scatter plots. + +#### Agg rendering +#### Warning: experimental, 2008/10/10 +#agg.path.chunksize : 0 ## 0 to disable; values in the range + ## 10000 to 100000 can improve speed slightly + ## and prevent an Agg rendering failure + ## when plotting very large data sets, + ## especially if they are very gappy. + ## It may cause minor artifacts, though. + ## A value of 20000 is probably a good + ## starting point. +#### PATHS +#path.simplify : True ## When True, simplify paths by removing "invisible" + ## points to reduce file size and increase rendering + ## speed +#path.simplify_threshold : 0.111111111111 ## The threshold of similarity below which + ## vertices will be removed in the + ## simplification process +#path.snap : True ## When True, rectilinear axis-aligned paths will be snapped to + ## the nearest pixel when certain criteria are met. When False, + ## paths will never be snapped. +#path.sketch : None ## May be none, or a 3-tuple of the form (scale, length, + ## randomness). + ## *scale* is the amplitude of the wiggle + ## perpendicular to the line (in pixels). *length* + ## is the length of the wiggle along the line (in + ## pixels). *randomness* is the factor by which + ## the length is randomly scaled. 
+#path.effects : [] ##
+
+#### SAVING FIGURES
+## the default savefig params can be different from the display params
+## e.g., you may want a higher resolution, or to make the figure
+## background white
+#savefig.dpi : figure ## figure dots per inch or 'figure'
+#savefig.facecolor : white ## figure facecolor when saving
+#savefig.edgecolor : white ## figure edgecolor when saving
+#savefig.format : png ## png, ps, pdf, svg
+#savefig.bbox : standard ## 'tight' or 'standard'.
+ ## 'tight' is incompatible with pipe-based animation
+ ## backends but will work with temporary file based ones:
+ ## e.g. setting animation.writer to ffmpeg will not work,
+ ## use ffmpeg_file instead
+#savefig.pad_inches : 0.1 ## Padding to be used when bbox is set to 'tight'
+#savefig.jpeg_quality: 95 ## when a jpeg is saved, the default quality parameter.
+#savefig.directory : ~ ## default directory in savefig dialog box,
+ ## leave empty to always use current working directory
+#savefig.transparent : False ## setting that controls whether figures are saved with a
+ ## transparent background by default
+#savefig.orientation : portrait ## Orientation of saved figure
+
+### tk backend params
+#tk.window_focus : False ## Maintain shell focus for TkAgg
+
+### ps backend params
+#ps.papersize : letter ## auto, letter, legal, ledger, A0-A10, B0-B10
+#ps.useafm : False ## use of afm fonts, results in small files
+#ps.usedistiller : False ## can be: None, ghostscript or xpdf
+ ## Experimental: may produce smaller files. 
+ ## xpdf intended for production of publication quality files, + ## but requires ghostscript, xpdf and ps2eps +#ps.distiller.res : 6000 ## dpi +#ps.fonttype : 3 ## Output Type 3 (Type3) or Type 42 (TrueType) + +### pdf backend params +#pdf.compression : 6 ## integer from 0 to 9 + ## 0 disables compression (good for debugging) +#pdf.fonttype : 3 ## Output Type 3 (Type3) or Type 42 (TrueType) +#pdf.use14corefonts : False +#pdf.inheritcolor : False + +### svg backend params +#svg.image_inline : True ## write raster image data directly into the svg file +#svg.fonttype : path ## How to handle SVG fonts: + ## none: Assume fonts are installed on the machine where the SVG will be viewed. + ## path: Embed characters as paths -- supported by most SVG renderers +#svg.hashsalt : None ## if not None, use this string as hash salt + ## instead of uuid4 +### pgf parameter +#pgf.rcfonts : True +#pgf.preamble : ## see text.latex.preamble for documentation +#pgf.texsystem : xelatex + +### docstring params +##docstring.hardcopy = False ## set this when you want to generate hardcopy docstring + +## Event keys to interact with figures/plots via keyboard. +## Customize these settings according to your needs. +## Leave the field(s) empty if you don't need a key-map. 
(i.e., fullscreen : '') +#keymap.fullscreen : f, ctrl+f ## toggling +#keymap.home : h, r, home ## home or reset mnemonic +#keymap.back : left, c, backspace, MouseButton.BACK ## forward / backward keys +#keymap.forward : right, v, MouseButton.FORWARD ## for quick navigation +#keymap.pan : p ## pan mnemonic +#keymap.zoom : o ## zoom mnemonic +#keymap.save : s, ctrl+s ## saving current figure +#keymap.help : f1 ## display help about active tools +#keymap.quit : ctrl+w, cmd+w, q ## close the current figure +#keymap.quit_all : W, cmd+W, Q ## close all figures +#keymap.grid : g ## switching on/off major grids in current axes +#keymap.grid_minor : G ## switching on/off minor grids in current axes +#keymap.yscale : l ## toggle scaling of y-axes ('log'/'linear') +#keymap.xscale : k, L ## toggle scaling of x-axes ('log'/'linear') +#keymap.all_axes : a ## enable all axes +#keymap.copy : ctrl+c, cmd+c ## Copy figure to clipboard + +###ANIMATION settings +#animation.html : none ## How to display the animation as HTML in + ## the IPython notebook. 'html5' uses + ## HTML5 video tag; 'jshtml' creates a + ## Javascript animation +#animation.writer : ffmpeg ## MovieWriter 'backend' to use +#animation.codec : h264 ## Codec to use for writing movie +#animation.bitrate: -1 ## Controls size/quality tradeoff for movie. + ## -1 implies let utility auto-determine +#animation.frame_format: png ## Controls frame format used by temp files +#animation.html_args: ## Additional arguments to pass to html writer +#animation.ffmpeg_path: ffmpeg ## Path to ffmpeg binary. Without full path + ## $PATH is searched +#animation.ffmpeg_args: ## Additional arguments to pass to ffmpeg +#animation.avconv_path: avconv ## Path to avconv binary. Without full path + ## $PATH is searched +#animation.avconv_args: ## Additional arguments to pass to avconv +#animation.convert_path: convert ## Path to ImageMagick's convert binary. + ## On Windows use the full path since convert + ## is also the name of a system tool. 
+#animation.convert_args: ## Additional arguments to pass to convert +#animation.embed_limit : 20.0 ## Limit, in MB, of size of base64 encoded + ## animation in HTML (i.e. IPython notebook) +``` + +## Download + +- [Download Python source code: customizing.py](https://matplotlib.org/_downloads/acbde7c6b91c31c4a29433e87403c871/customizing.py) +- [Download Jupyter notebook: customizing.ipynb](https://matplotlib.org/_downloads/a0acd54a96b40e271115ea0964417a12/customizing.ipynb) + \ No newline at end of file diff --git a/Python/matplotlab/introductory/images.md b/Python/matplotlab/introductory/images.md new file mode 100644 index 00000000..dd5fd20f --- /dev/null +++ b/Python/matplotlab/introductory/images.md @@ -0,0 +1,348 @@ +--- +sidebarDepth: 3 +sidebar: auto +--- + +# Image tutorial + +A short tutorial on plotting images with Matplotlib. + +## Startup commands + +First, let's start IPython. It is a most excellent enhancement to the +standard Python prompt, and it ties in especially well with +Matplotlib. Start IPython either at a shell, or the IPython Notebook now. + +With IPython started, we now need to connect to a GUI event loop. This +tells IPython where (and how) to display plots. To connect to a GUI +loop, execute the **%matplotlib** magic at your IPython prompt. There's more +detail on exactly what this does at [IPython's documentation on GUI +event loops](http://ipython.org/ipython-doc/2/interactive/reference.html#gui-event-loop-support). + +If you're using IPython Notebook, the same commands are available, but +people commonly use a specific argument to the %matplotlib magic: + +``` python +In [1]: %matplotlib inline +``` + +This turns on inline plotting, where plot graphics will appear in your +notebook. This has important implications for interactivity. For inline plotting, commands in +cells below the cell that outputs a plot will not affect the plot. For example, +changing the color map is not possible from cells below the cell that creates a plot. 
+However, for other backends, such as Qt5, that open a separate window, +cells below those that create the plot will change the plot - it is a +live object in memory. + +This tutorial will use matplotlib's imperative-style plotting +interface, pyplot. This interface maintains global state, and is very +useful for quickly and easily experimenting with various plot +settings. The alternative is the object-oriented interface, which is also +very powerful, and generally more suitable for large application +development. If you'd like to learn about the object-oriented +interface, a great place to start is our [Usage guide](usage.html). For now, let's get on +with the imperative-style approach: + +``` python +import matplotlib.pyplot as plt +import matplotlib.image as mpimg +``` + +## Importing image data into Numpy arrays + +Loading image data is supported by the [Pillow](https://pillow.readthedocs.io/en/latest/) library. Natively, Matplotlib +only supports PNG images. The commands shown below fall back on Pillow if +the native read fails. + +The image used in this example is a PNG file, but keep that Pillow +requirement in mind for your own data. + +Here's the image we're going to play with: + +![stinkbug](https://matplotlib.org/_images/stinkbug.png) + +It's a 24-bit RGB PNG image (8 bits for each of R, G, B). Depending +on where you get your data, the other kinds of image that you'll most +likely encounter are RGBA images, which allow for transparency, or +single-channel grayscale (luminosity) images. You can right click on +it and choose "Save image as" to download it to your computer for the +rest of this tutorial. + +And here we go... + +``` python +img = mpimg.imread('../../doc/_static/stinkbug.png') +print(img) +``` + +Out: + +``` +[[[0.40784314 0.40784314 0.40784314] + [0.40784314 0.40784314 0.40784314] + [0.40784314 0.40784314 0.40784314] + ... 
+ [0.42745098 0.42745098 0.42745098] + [0.42745098 0.42745098 0.42745098] + [0.42745098 0.42745098 0.42745098]] + + [[0.4117647 0.4117647 0.4117647 ] + [0.4117647 0.4117647 0.4117647 ] + [0.4117647 0.4117647 0.4117647 ] + ... + [0.42745098 0.42745098 0.42745098] + [0.42745098 0.42745098 0.42745098] + [0.42745098 0.42745098 0.42745098]] + + [[0.41960785 0.41960785 0.41960785] + [0.41568628 0.41568628 0.41568628] + [0.41568628 0.41568628 0.41568628] + ... + [0.43137255 0.43137255 0.43137255] + [0.43137255 0.43137255 0.43137255] + [0.43137255 0.43137255 0.43137255]] + + ... + + [[0.4392157 0.4392157 0.4392157 ] + [0.43529412 0.43529412 0.43529412] + [0.43137255 0.43137255 0.43137255] + ... + [0.45490196 0.45490196 0.45490196] + [0.4509804 0.4509804 0.4509804 ] + [0.4509804 0.4509804 0.4509804 ]] + + [[0.44313726 0.44313726 0.44313726] + [0.44313726 0.44313726 0.44313726] + [0.4392157 0.4392157 0.4392157 ] + ... + [0.4509804 0.4509804 0.4509804 ] + [0.44705883 0.44705883 0.44705883] + [0.44705883 0.44705883 0.44705883]] + + [[0.44313726 0.44313726 0.44313726] + [0.4509804 0.4509804 0.4509804 ] + [0.4509804 0.4509804 0.4509804 ] + ... + [0.44705883 0.44705883 0.44705883] + [0.44705883 0.44705883 0.44705883] + [0.44313726 0.44313726 0.44313726]]] +``` + +Note the dtype there - float32. Matplotlib has rescaled the 8 bit +data from each channel to floating point data between 0.0 and 1.0. As +a side note, the only datatype that Pillow can work with is uint8. +Matplotlib plotting can handle float32 and uint8, but image +reading/writing for any format other than PNG is limited to uint8 +data. Why 8 bits? Most displays can only render 8 bits per channel +worth of color gradation. Why can they only render 8 bits/channel? +Because that's about all the human eye can see. More here (from a +photography standpoint): [Luminous Landscape bit depth tutorial](https://luminous-landscape.com/bit-depth/). + +Each inner list represents a pixel. Here, with an RGB image, there +are 3 values. 
Since it's a black and white image, R, G, and B are all
+similar. An RGBA image (where A is alpha, or transparency) has 4 values
+per inner list, and a simple luminance image just has one value (and
+is thus only a 2-D array, not a 3-D array). For RGB and RGBA images,
+matplotlib supports float32 and uint8 data types. For grayscale,
+matplotlib supports only float32. If your array data does not meet
+one of these descriptions, you need to rescale it.
+
+## Plotting numpy arrays as images
+
+So, you have your data in a numpy array (either by importing it, or by
+generating it). Let's render it. In Matplotlib, this is performed
+using the [``imshow()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html#matplotlib.pyplot.imshow) function. Here we'll grab
+the plot object. This object gives you an easy way to manipulate the
+plot from the prompt.
+
+``` python
+imgplot = plt.imshow(img)
+```
+
+![sphx_glr_images_001](https://matplotlib.org/_images/sphx_glr_images_001.png)
+
+You can also plot any numpy array.
+
+### Applying pseudocolor schemes to image plots
+
+Pseudocolor can be a useful tool for enhancing contrast and
+visualizing your data more easily. This is especially useful when
+making presentations of your data using projectors - their contrast is
+typically quite poor.
+
+Pseudocolor is only relevant to single-channel, grayscale, luminosity
+images. We currently have an RGB image. Since R, G, and B are all
+similar (see for yourself above or in your data), we can just pick one
+channel of our data:
+
+``` python
+lum_img = img[:, :, 0]
+
+# This is array slicing. You can read more about it in the NumPy
+# indexing documentation.
+
+plt.imshow(lum_img)
+```
+
+![sphx_glr_images_002](https://matplotlib.org/_images/sphx_glr_images_002.png)
+
+Now, with a luminosity (2D, no color) image, the default colormap (aka lookup table,
+LUT) is applied. The default is called viridis. There are plenty of
+others to choose from. 
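
The registered colormap names can be listed at runtime. This is a minimal sketch (the exact contents of the list depend on your Matplotlib version; the `Agg` backend is selected only so it runs headless):

``` python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, safe for scripts
import matplotlib.pyplot as plt

# List of registered colormap names, e.g. 'viridis', 'hot', 'gray', ...
names = list(plt.colormaps())
print(len(names), "colormaps registered")
print(sorted(n for n in names if n.startswith('vir')))
```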
+
+``` python
+plt.imshow(lum_img, cmap="hot")
+```
+
+![sphx_glr_images_003](https://matplotlib.org/_images/sphx_glr_images_003.png)
+
+Note that you can also change colormaps on existing plot objects using the
+``set_cmap()`` method:
+
+``` python
+imgplot = plt.imshow(lum_img)
+imgplot.set_cmap('nipy_spectral')
+```
+
+![sphx_glr_images_004](https://matplotlib.org/_images/sphx_glr_images_004.png)
+
+::: tip Note
+
+However, remember that in the IPython notebook with the inline backend,
+you can't make changes to plots that have already been rendered. If you
+create imgplot here in one cell, you cannot call set_cmap() on it in a later
+cell and expect the earlier plot to change. Make sure that you enter these
+commands together in one cell. plt commands will not change plots from earlier
+cells.
+
+:::
+
+There are many other colormap schemes available. See the [list and
+images of the colormaps](https://matplotlib.org/colors/colormaps.html).
+
+### Color scale reference
+
+It's helpful to have an idea of what value a color represents. We can
+do that by adding color bars.
+
+``` python
+imgplot = plt.imshow(lum_img)
+plt.colorbar()
+```
+
+![sphx_glr_images_005](https://matplotlib.org/_images/sphx_glr_images_005.png)
+
+This adds a colorbar to your existing figure. This won't
+automatically change if you switch to a different
+colormap - you have to re-create your plot, and add in the colorbar
+again.
+
+### Examining a specific data range
+
+Sometimes you want to enhance the contrast in your image, or expand
+the contrast in a particular region while sacrificing the detail in
+colors that don't vary much, or don't matter. A good tool to find
+interesting regions is the histogram. To create a histogram of our
+image data, we use the [``hist()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.hist.html#matplotlib.pyplot.hist) function. 
+
+``` python
+plt.hist(lum_img.ravel(), bins=256, range=(0.0, 1.0), fc='k', ec='k')
+```
+
+![sphx_glr_images_006](https://matplotlib.org/_images/sphx_glr_images_006.png)
+
+Most often, the "interesting" part of the image is around the peak,
+and you can get extra contrast by clipping the regions above and/or
+below the peak. In our histogram, it looks like there's not much
+useful information in the high end (not many white things in the
+image). Let's adjust the upper limit, so that we effectively "zoom in
+on" part of the histogram. We do this by passing the clim argument to
+imshow. You could also do this by calling the
+``set_clim()`` method of the image plot
+object, but make sure that you do so in the same cell as your plot
+command when working with the IPython Notebook - it will not change
+plots from earlier cells.
+
+You can specify the clim in the call to ``imshow``.
+
+``` python
+imgplot = plt.imshow(lum_img, clim=(0.0, 0.7))
+```
+
+![sphx_glr_images_007](https://matplotlib.org/_images/sphx_glr_images_007.png)
+
+You can also specify the clim using the returned object:
+
+``` python
+fig = plt.figure()
+a = fig.add_subplot(1, 2, 1)
+imgplot = plt.imshow(lum_img)
+a.set_title('Before')
+plt.colorbar(ticks=[0.1, 0.3, 0.5, 0.7], orientation='horizontal')
+a = fig.add_subplot(1, 2, 2)
+imgplot = plt.imshow(lum_img)
+imgplot.set_clim(0.0, 0.7)
+a.set_title('After')
+plt.colorbar(ticks=[0.1, 0.3, 0.5, 0.7], orientation='horizontal')
+```
+
+![sphx_glr_images_008](https://matplotlib.org/_images/sphx_glr_images_008.png)
+
+### Array Interpolation schemes
+
+Interpolation calculates what the color or value of a pixel "should"
+be, according to different mathematical schemes. One common place
+that this happens is when you resize an image. The number of pixels
+changes, but you want the same information. Since pixels are discrete,
+there's missing space. Interpolation is how you fill that space. 
+
+This is why your images sometimes come out looking pixelated when you
+blow them up. The effect is more pronounced when the difference
+between the original image and the expanded image is greater. Let's
+take our image and shrink it. We're effectively discarding pixels,
+only keeping a select few. Now when we plot it, that data gets blown
+up to the size on your screen. The old pixels aren't there anymore,
+and the computer has to draw in pixels to fill that space.
+
+We'll again use the Pillow library that we used to load the image, this
+time to resize it.
+
+``` python
+from PIL import Image
+
+img = Image.open('../../doc/_static/stinkbug.png')
+img.thumbnail((64, 64), Image.ANTIALIAS)  # resizes image in-place
+imgplot = plt.imshow(img)
+```
+
+![sphx_glr_images_009](https://matplotlib.org/_images/sphx_glr_images_009.png)
+
+Here we have the default interpolation, bilinear, since we did not
+give [``imshow()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html#matplotlib.pyplot.imshow) any interpolation argument.
+
+Let's try some others. Here's "nearest", which does no interpolation.
+
+``` python
+imgplot = plt.imshow(img, interpolation="nearest")
+```
+
+![sphx_glr_images_010](https://matplotlib.org/_images/sphx_glr_images_010.png)
+
+and bicubic:
+
+``` python
+imgplot = plt.imshow(img, interpolation="bicubic")
+```
+
+![sphx_glr_images_011](https://matplotlib.org/_images/sphx_glr_images_011.png)
+
+Bicubic interpolation is often used when blowing up photos - people
+tend to prefer blurry over pixelated. 
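
The schemes above can also be compared side by side in one figure. The sketch below magnifies a tiny random array with each interpolation mode, so it runs without any external image file (the output filename is illustrative):

``` python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
tiny = rng.random((8, 8))  # a deliberately tiny "image" to magnify

fig, axs = plt.subplots(1, 3, figsize=(9, 3))
for ax, scheme in zip(axs, ['nearest', 'bilinear', 'bicubic']):
    ax.imshow(tiny, interpolation=scheme)  # same data, different smoothing
    ax.set_title(scheme)
    ax.set_axis_off()
fig.savefig('interpolation_comparison.png')
```

"nearest" keeps the hard pixel edges; the other two blend neighboring values.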
+
+**Total running time of the script:** ( 0 minutes 1.829 seconds)
+
+## Download
+
+- [Download Python source code: images.py](https://matplotlib.org/_downloads/b9ccc225a9488811ec7ceeb6dfc7d21f/images.py)
+- [Download Jupyter notebook: images.ipynb](https://matplotlib.org/_downloads/ec8d45ccc5387a8e56bc5e286ae92234/images.ipynb)
+ 
\ No newline at end of file
diff --git a/Python/matplotlab/introductory/lifecycle.md b/Python/matplotlab/introductory/lifecycle.md
new file mode 100644
index 00000000..10440ece
--- /dev/null
+++ b/Python/matplotlab/introductory/lifecycle.md
@@ -0,0 +1,325 @@
+---
+sidebarDepth: 3
+sidebar: auto
+---
+
+# The Lifecycle of a Plot
+
+This tutorial aims to show the beginning, middle, and end of a single
+visualization using Matplotlib. We'll begin with some raw data and
+end by saving a figure of a customized visualization. Along the way we'll try
+to highlight some neat features and best-practices using Matplotlib.
+
+::: tip Note
+
+This tutorial is based on
+[this excellent blog post](http://pbpython.com/effective-matplotlib.html)
+by Chris Moffitt. It was transformed into this tutorial by Chris Holdgraf.
+
+:::
+
+## A note on the Object-Oriented API vs Pyplot
+
+Matplotlib has two interfaces. The first is an object-oriented (OO)
+interface. In this case, we utilize an instance of [``axes.Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes)
+in order to render visualizations on an instance of [``figure.Figure``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure).
+
+The second is based on MATLAB and uses a state-based interface. This is
+encapsulated in the ``pyplot`` module. See the [pyplot tutorials](pyplot.html) for a more in-depth look at the pyplot
+interface.
+
+Most of the terms are straightforward but the main thing to remember
+is that:
+
+- The Figure is the final image that may contain 1 or more Axes. 
+- The Axes represent an individual plot (don't confuse this with the word
+"axis", which refers to the x/y axis of a plot).
+
+We call methods that do the plotting directly from the Axes, which gives
+us much more flexibility and power in customizing our plot.
+
+::: tip Note
+
+In general, try to use the object-oriented interface over the pyplot
+interface.
+
+:::
+
+## Our data
+
+We'll use the data from the post from which this tutorial was derived.
+It contains sales information for a number of companies.
+
+``` python
+# sphinx_gallery_thumbnail_number = 10
+import numpy as np
+import matplotlib.pyplot as plt
+from matplotlib.ticker import FuncFormatter
+
+data = {'Barton LLC': 109438.50,
+        'Frami, Hills and Schmidt': 103569.59,
+        'Fritsch, Russel and Anderson': 112214.71,
+        'Jerde-Hilpert': 112591.43,
+        'Keeling LLC': 100934.30,
+        'Koepp Ltd': 103660.54,
+        'Kulas Inc': 137351.96,
+        'Trantow-Barrows': 123381.38,
+        'White-Trantow': 135841.99,
+        'Will LLC': 104437.60}
+group_data = list(data.values())
+group_names = list(data.keys())
+group_mean = np.mean(group_data)
+```
+
+## Getting started
+
+This data is naturally visualized as a barplot, with one bar per
+group. To do this with the object-oriented approach, we'll first generate
+an instance of [``figure.Figure``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure) and
+[``axes.Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes). The Figure is like a canvas, and the Axes
+is a part of that canvas on which we will make a particular visualization.
+
+::: tip Note
+
+Figures can have multiple axes on them. For information on how to do this,
+see the [Tight Layout tutorial](https://matplotlib.org/intermediate/tight_layout_guide.html).
+
+:::
+
+``` python
+fig, ax = plt.subplots()
+```
+
+![sphx_glr_lifecycle_001](https://matplotlib.org/_images/sphx_glr_lifecycle_001.png)
+
+Now that we have an Axes instance, we can plot on top of it. 
+ +``` python +fig, ax = plt.subplots() +ax.barh(group_names, group_data) +``` + +![sphx_glr_lifecycle_002](https://matplotlib.org/_images/sphx_glr_lifecycle_002.png) + +## Controlling the style + +There are many styles available in Matplotlib in order to let you tailor +your visualization to your needs. To see a list of styles, we can use +``pyplot.style``. + +``` python +print(plt.style.available) +``` + +Out: + +``` +['seaborn-dark', 'dark_background', 'seaborn-pastel', 'seaborn-colorblind', 'tableau-colorblind10', 'seaborn-notebook', 'seaborn-dark-palette', 'grayscale', 'seaborn-poster', 'seaborn', 'bmh', 'seaborn-talk', 'seaborn-ticks', '_classic_test', 'ggplot', 'seaborn-white', 'classic', 'Solarize_Light2', 'seaborn-paper', 'fast', 'fivethirtyeight', 'seaborn-muted', 'seaborn-whitegrid', 'seaborn-darkgrid', 'seaborn-bright', 'seaborn-deep'] +``` + +You can activate a style with the following: + +``` python +plt.style.use('fivethirtyeight') +``` + +Now let's remake the above plot to see how it looks: + +``` python +fig, ax = plt.subplots() +ax.barh(group_names, group_data) +``` + +![sphx_glr_lifecycle_003](https://matplotlib.org/_images/sphx_glr_lifecycle_003.png) + +The style controls many things, such as color, linewidths, backgrounds, +etc. + +## Customizing the plot + +Now we've got a plot with the general look that we want, so let's fine-tune +it so that it's ready for print. First let's rotate the labels on the x-axis +so that they show up more clearly. 
We can gain access to these labels
+with the [``axes.Axes.get_xticklabels()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.get_xticklabels.html#matplotlib.axes.Axes.get_xticklabels) method:
+
+``` python
+fig, ax = plt.subplots()
+ax.barh(group_names, group_data)
+labels = ax.get_xticklabels()
+```
+
+![sphx_glr_lifecycle_004](https://matplotlib.org/_images/sphx_glr_lifecycle_004.png)
+
+If we'd like to set the property of many items at once, it's useful to use
+the [``pyplot.setp()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.setp.html#matplotlib.pyplot.setp) function. This will take a list (or many lists) of
+Matplotlib objects, and attempt to set some style element of each one.
+
+``` python
+fig, ax = plt.subplots()
+ax.barh(group_names, group_data)
+labels = ax.get_xticklabels()
+plt.setp(labels, rotation=45, horizontalalignment='right')
+```
+
+![sphx_glr_lifecycle_005](https://matplotlib.org/_images/sphx_glr_lifecycle_005.png)
+
+It looks like this cut off some of the labels on the bottom. We can
+tell Matplotlib to automatically make room for elements in the figures
+that we create. To do this we'll set the ``autolayout`` value of our
+rcParams. For more information on controlling the style, layout, and
+other features of plots with rcParams, see
+[Customizing Matplotlib with style sheets and rcParams](customizing.html).
+
+``` python
+plt.rcParams.update({'figure.autolayout': True})
+
+fig, ax = plt.subplots()
+ax.barh(group_names, group_data)
+labels = ax.get_xticklabels()
+plt.setp(labels, rotation=45, horizontalalignment='right')
+```
+
+![sphx_glr_lifecycle_006](https://matplotlib.org/_images/sphx_glr_lifecycle_006.png)
+
+Next, we'll add labels to the plot. To do this with the OO interface,
+we can use the [``axes.Axes.set()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set.html#matplotlib.axes.Axes.set) method to set properties of this
+Axes object. 
+ +``` python +fig, ax = plt.subplots() +ax.barh(group_names, group_data) +labels = ax.get_xticklabels() +plt.setp(labels, rotation=45, horizontalalignment='right') +ax.set(xlim=[-10000, 140000], xlabel='Total Revenue', ylabel='Company', + title='Company Revenue') +``` + +![sphx_glr_lifecycle_007](https://matplotlib.org/_images/sphx_glr_lifecycle_007.png) + +We can also adjust the size of this plot using the [``pyplot.subplots()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.subplots.html#matplotlib.pyplot.subplots) +function. We can do this with the ``figsize`` kwarg. + +::: tip Note + +While indexing in NumPy follows the form (row, column), the figsize +kwarg follows the form (width, height). This follows conventions in +visualization, which unfortunately are different from those of linear +algebra. + +::: + +``` python +fig, ax = plt.subplots(figsize=(8, 4)) +ax.barh(group_names, group_data) +labels = ax.get_xticklabels() +plt.setp(labels, rotation=45, horizontalalignment='right') +ax.set(xlim=[-10000, 140000], xlabel='Total Revenue', ylabel='Company', + title='Company Revenue') +``` + +![sphx_glr_lifecycle_008](https://matplotlib.org/_images/sphx_glr_lifecycle_008.png) + +For labels, we can specify custom formatting guidelines in the form of +functions by using the [``ticker.FuncFormatter``](https://matplotlib.orgapi/ticker_api.html#matplotlib.ticker.FuncFormatter) class. Below we'll +define a function that takes an integer as input, and returns a string +as an output. + +``` python +def currency(x, pos): + """The two args are the value and tick position""" + if x >= 1e6: + s = '${:1.1f}M'.format(x*1e-6) + else: + s = '${:1.0f}K'.format(x*1e-3) + return s + +formatter = FuncFormatter(currency) +``` + +We can then apply this formatter to the labels on our plot. To do this, +we'll use the ``xaxis`` attribute of our axis. This lets you perform +actions on a specific axis on our plot. 
+ +``` python +fig, ax = plt.subplots(figsize=(6, 8)) +ax.barh(group_names, group_data) +labels = ax.get_xticklabels() +plt.setp(labels, rotation=45, horizontalalignment='right') + +ax.set(xlim=[-10000, 140000], xlabel='Total Revenue', ylabel='Company', + title='Company Revenue') +ax.xaxis.set_major_formatter(formatter) +``` + +![sphx_glr_lifecycle_009](https://matplotlib.org/_images/sphx_glr_lifecycle_009.png) + +## Combining multiple visualizations + +It is possible to draw multiple plot elements on the same instance of +[``axes.Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes). To do this we simply need to call another one of +the plot methods on that axes object. + +``` python +fig, ax = plt.subplots(figsize=(8, 8)) +ax.barh(group_names, group_data) +labels = ax.get_xticklabels() +plt.setp(labels, rotation=45, horizontalalignment='right') + +# Add a vertical line, here we set the style in the function call +ax.axvline(group_mean, ls='--', color='r') + +# Annotate new companies +for group in [3, 5, 8]: + ax.text(145000, group, "New Company", fontsize=10, + verticalalignment="center") + +# Now we'll move our title up since it's getting a little cramped +ax.title.set(y=1.05) + +ax.set(xlim=[-10000, 140000], xlabel='Total Revenue', ylabel='Company', + title='Company Revenue') +ax.xaxis.set_major_formatter(formatter) +ax.set_xticks([0, 25e3, 50e3, 75e3, 100e3, 125e3]) +fig.subplots_adjust(right=.1) + +plt.show() +``` + +![sphx_glr_lifecycle_010](https://matplotlib.org/_images/sphx_glr_lifecycle_010.png) + +## Saving our plot + +Now that we're happy with the outcome of our plot, we want to save it to +disk. There are many file formats we can save to in Matplotlib. 
To see a list of available options, use:

``` python
print(fig.canvas.get_supported_filetypes())
```

Out:

```
{'ps': 'Postscript', 'eps': 'Encapsulated Postscript', 'pdf': 'Portable Document Format', 'pgf': 'PGF code for LaTeX', 'png': 'Portable Network Graphics', 'raw': 'Raw RGBA bitmap', 'rgba': 'Raw RGBA bitmap', 'svg': 'Scalable Vector Graphics', 'svgz': 'Scalable Vector Graphics', 'jpg': 'Joint Photographic Experts Group', 'jpeg': 'Joint Photographic Experts Group', 'tif': 'Tagged Image File Format', 'tiff': 'Tagged Image File Format'}
```

We can then use [``figure.Figure.savefig()``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.savefig) to save the figure
to disk. Note the several useful flags shown below:

- ``transparent=True`` makes the background of the saved figure transparent
  if the format supports it.
- ``dpi=80`` controls the resolution (dots per inch) of the output.
- ``bbox_inches="tight"`` fits the bounds of the figure to our plot.

``` python
# Uncomment this line to save the figure.
# fig.savefig('sales.png', transparent=False, dpi=80, bbox_inches="tight")
```

**Total running time of the script:** ( 0 minutes 1.566 seconds)

## Download

- [Download Python source code: lifecycle.py](https://matplotlib.org/_downloads/9f5af95225ff5984a6cc962463c43459/lifecycle.py)
- [Download Jupyter notebook: lifecycle.ipynb](https://matplotlib.org/_downloads/db19d93870c5df97263c5f5a2e835466/lifecycle.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/introductory/pyplot.md b/Python/matplotlab/introductory/pyplot.md
new file mode 100644
index 00000000..4c0a3ab3
--- /dev/null
+++ b/Python/matplotlab/introductory/pyplot.md
@@ -0,0 +1,568 @@
---
sidebarDepth: 3
sidebar: auto
---

# Pyplot tutorial

An introduction to the pyplot interface.
+ +## Intro to pyplot + +[``matplotlib.pyplot``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.html#module-matplotlib.pyplot) is a collection of command style functions +that make matplotlib work like MATLAB. +Each ``pyplot`` function makes +some change to a figure: e.g., creates a figure, creates a plotting area +in a figure, plots some lines in a plotting area, decorates the plot +with labels, etc. + +In [``matplotlib.pyplot``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.html#module-matplotlib.pyplot) various states are preserved +across function calls, so that it keeps track of things like +the current figure and plotting area, and the plotting +functions are directed to the current axes (please note that "axes" here +and in most places in the documentation refers to the *axes* +[part of a figure](usage.html#figure-parts) +and not the strict mathematical term for more than one axis). + +::: tip Note + +the pyplot API is generally less-flexible than the object-oriented API. +Most of the function calls you see here can also be called as methods +from an ``Axes`` object. We recommend browsing the tutorials and +examples to see how this works. + +::: + +Generating visualizations with pyplot is very quick: + +``` python +import matplotlib.pyplot as plt +plt.plot([1, 2, 3, 4]) +plt.ylabel('some numbers') +plt.show() +``` + +![sphx_glr_pyplot_001](https://matplotlib.org/_images/sphx_glr_pyplot_001.png) + +You may be wondering why the x-axis ranges from 0-3 and the y-axis +from 1-4. If you provide a single list or array to the +[``plot()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.plot.html#matplotlib.pyplot.plot) command, matplotlib assumes it is a +sequence of y values, and automatically generates the x values for +you. Since python ranges start with 0, the default x vector has the +same length as y but starts with 0. Hence the x data are +``[0,1,2,3]``. 
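
You can check this yourself by reading the x data back off the returned ``Line2D`` object. A minimal sketch (using the non-interactive Agg backend so it runs headless):

``` python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; nothing is displayed
import matplotlib.pyplot as plt

# Only y values are given, so matplotlib generates x = 0, 1, 2, 3
line, = plt.plot([1, 2, 3, 4])
print(line.get_xdata())  # [0 1 2 3]
print(line.get_ydata())  # [1 2 3 4]
```
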

[``plot()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html#matplotlib.pyplot.plot) is a versatile command that takes
an arbitrary number of arguments. For example, to plot x versus y,
you can issue the command:

``` python
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
```

![sphx_glr_pyplot_002](https://matplotlib.org/_images/sphx_glr_pyplot_002.png)

### Formatting the style of your plot

For every x, y pair of arguments, there is an optional third argument
which is the format string that indicates the color and line type of
the plot. The letters and symbols of the format string are from
MATLAB, and you concatenate a color string with a line style string.
The default format string is ``'b-'``, which is a solid blue line. For
example, to plot the above with red circles, you would issue

``` python
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], 'ro')
plt.axis([0, 6, 0, 20])
plt.show()
```

![sphx_glr_pyplot_003](https://matplotlib.org/_images/sphx_glr_pyplot_003.png)

See the [``plot()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html#matplotlib.pyplot.plot) documentation for a complete
list of line styles and format strings. The
[``axis()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.axis.html#matplotlib.pyplot.axis) command in the example above takes a
list of ``[xmin, xmax, ymin, ymax]`` and specifies the viewport of the
axes.

If matplotlib were limited to working with lists, it would be fairly
useless for numeric processing. Generally, you will use [numpy](http://www.numpy.org) arrays. In fact, all sequences are
converted to numpy arrays internally. The example below illustrates
plotting several lines with different format styles in one command
using arrays.
+ +``` python +import numpy as np + +# evenly sampled time at 200ms intervals +t = np.arange(0., 5., 0.2) + +# red dashes, blue squares and green triangles +plt.plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^') +plt.show() +``` + +![sphx_glr_pyplot_004](https://matplotlib.org/_images/sphx_glr_pyplot_004.png) + +## Plotting with keyword strings + +There are some instances where you have data in a format that lets you +access particular variables with strings. For example, with +[``numpy.recarray``](https://docs.scipy.org/doc/numpy/reference/generated/numpy.recarray.html#numpy.recarray) or [``pandas.DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame). + +Matplotlib allows you provide such an object with +the ``data`` keyword argument. If provided, then you may generate plots with +the strings corresponding to these variables. + +``` python +data = {'a': np.arange(50), + 'c': np.random.randint(0, 50, 50), + 'd': np.random.randn(50)} +data['b'] = data['a'] + 10 * np.random.randn(50) +data['d'] = np.abs(data['d']) * 100 + +plt.scatter('a', 'b', c='c', s='d', data=data) +plt.xlabel('entry a') +plt.ylabel('entry b') +plt.show() +``` + +![sphx_glr_pyplot_005](https://matplotlib.org/_images/sphx_glr_pyplot_005.png) + +## Plotting with categorical variables + +It is also possible to create a plot using categorical variables. +Matplotlib allows you to pass categorical variables directly to +many plotting functions. 
For example: + +``` python +names = ['group_a', 'group_b', 'group_c'] +values = [1, 10, 100] + +plt.figure(figsize=(9, 3)) + +plt.subplot(131) +plt.bar(names, values) +plt.subplot(132) +plt.scatter(names, values) +plt.subplot(133) +plt.plot(names, values) +plt.suptitle('Categorical Plotting') +plt.show() +``` + +![sphx_glr_pyplot_006](https://matplotlib.org/_images/sphx_glr_pyplot_006.png) + +## Controlling line properties + +Lines have many attributes that you can set: linewidth, dash style, +antialiased, etc; see [``matplotlib.lines.Line2D``](https://matplotlib.orgapi/_as_gen/matplotlib.lines.Line2D.html#matplotlib.lines.Line2D). There are +several ways to set line properties + +- Use keyword args: + +``` python +plt.plot(x, y, linewidth=2.0) +``` +- Use the setter methods of a ``Line2D`` instance. ``plot`` returns a list +of ``Line2D`` objects; e.g., ``line1, line2 = plot(x1, y1, x2, y2)``. In the code +below we will suppose that we have only +one line so that the list returned is of length 1. We use tuple unpacking with +``line,`` to get the first element of that list: + +``` python +line, = plt.plot(x, y, '-') +line.set_antialiased(False) # turn off antialiasing +``` +- Use the [``setp()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.setp.html#matplotlib.pyplot.setp) command. The example below +uses a MATLAB-style command to set multiple properties +on a list of lines. ``setp`` works transparently with a list of objects +or a single object. You can either use python keyword arguments or +MATLAB-style string/value pairs: + +``` python +lines = plt.plot(x1, y1, x2, y2) +# use keyword args +plt.setp(lines, color='r', linewidth=2.0) +# or MATLAB style string value pairs +plt.setp(lines, 'color', 'r', 'linewidth', 2.0) +``` + +Here are the available [``Line2D``](https://matplotlib.orgapi/_as_gen/matplotlib.lines.Line2D.html#matplotlib.lines.Line2D) properties. 

| Property | Value Type |
| --- | --- |
| alpha | float |
| animated | [True \| False] |
| antialiased or aa | [True \| False] |
| clip_box | a matplotlib.transforms.Bbox instance |
| clip_on | [True \| False] |
| clip_path | a Path instance and a Transform instance, a Patch |
| color or c | any matplotlib color |
| contains | the hit testing function |
| dash_capstyle | ['butt' \| 'round' \| 'projecting'] |
| dash_joinstyle | ['miter' \| 'round' \| 'bevel'] |
| dashes | sequence of on/off ink in points |
| data | (np.array xdata, np.array ydata) |
| figure | a matplotlib.figure.Figure instance |
| label | any string |
| linestyle or ls | [ '-' \| '--' \| '-.' \| ':' \| 'steps' \| ...] |
| linewidth or lw | float value in points |
| marker | [ '+' \| ',' \| '.' \| '1' \| '2' \| '3' \| '4' ] |
| markeredgecolor or mec | any matplotlib color |
| markeredgewidth or mew | float value in points |
| markerfacecolor or mfc | any matplotlib color |
| markersize or ms | float |
| markevery | [ None \| integer \| (startind, stride) ] |
| picker | used in interactive line selection |
| pickradius | the line pick selection radius |
| solid_capstyle | ['butt' \| 'round' \| 'projecting'] |
| solid_joinstyle | ['miter' \| 'round' \| 'bevel'] |
| transform | a matplotlib.transforms.Transform instance |
| visible | [True \| False] |
| xdata | np.array |
| ydata | np.array |
| zorder | any number |

To get a list of settable line properties, call the
[``setp()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.setp.html#matplotlib.pyplot.setp) function with a line or lines
as argument:

``` python
In [69]: lines = plt.plot([1, 2, 3])

In [70]: plt.setp(lines)
  alpha: float
  animated: [True | False]
  antialiased or aa: [True | False]
  ...snip
```

## Working with multiple figures and axes

MATLAB, and [``pyplot``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.html#module-matplotlib.pyplot), have the concept of the current
figure and the current axes. All plotting commands apply to the
current axes.
The function [``gca()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.gca.html#matplotlib.pyplot.gca) returns the +current axes (a [``matplotlib.axes.Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes) instance), and +[``gcf()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.gcf.html#matplotlib.pyplot.gcf) returns the current figure +([``matplotlib.figure.Figure``](https://matplotlib.orgapi/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure) instance). Normally, you don't have +to worry about this, because it is all taken care of behind the +scenes. Below is a script to create two subplots. + +``` python +def f(t): + return np.exp(-t) * np.cos(2*np.pi*t) + +t1 = np.arange(0.0, 5.0, 0.1) +t2 = np.arange(0.0, 5.0, 0.02) + +plt.figure() +plt.subplot(211) +plt.plot(t1, f(t1), 'bo', t2, f(t2), 'k') + +plt.subplot(212) +plt.plot(t2, np.cos(2*np.pi*t2), 'r--') +plt.show() +``` + +![sphx_glr_pyplot_007](https://matplotlib.org/_images/sphx_glr_pyplot_007.png) + +The [``figure()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.figure.html#matplotlib.pyplot.figure) command here is optional because +``figure(1)`` will be created by default, just as a ``subplot(111)`` +will be created by default if you don't manually specify any axes. The +[``subplot()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.subplot.html#matplotlib.pyplot.subplot) command specifies ``numrows, +numcols, plot_number`` where ``plot_number`` ranges from 1 to +``numrows*numcols``. The commas in the ``subplot`` command are +optional if ``numrows*numcols<10``. So ``subplot(211)`` is identical +to ``subplot(2, 1, 1)``. + +You can create an arbitrary number of subplots +and axes. 
If you want to place an axes manually, i.e., not on a +rectangular grid, use the [``axes()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.axes.html#matplotlib.pyplot.axes) command, +which allows you to specify the location as ``axes([left, bottom, +width, height])`` where all values are in fractional (0 to 1) +coordinates. See [Axes Demo](https://matplotlib.orggallery/subplots_axes_and_figures/axes_demo.html) for an example of +placing axes manually and [Basic Subplot Demo](https://matplotlib.orggallery/subplots_axes_and_figures/subplot_demo.html) for an +example with lots of subplots. + +You can create multiple figures by using multiple +[``figure()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.figure.html#matplotlib.pyplot.figure) calls with an increasing figure +number. Of course, each figure can contain as many axes and subplots +as your heart desires: + +``` python +import matplotlib.pyplot as plt +plt.figure(1) # the first figure +plt.subplot(211) # the first subplot in the first figure +plt.plot([1, 2, 3]) +plt.subplot(212) # the second subplot in the first figure +plt.plot([4, 5, 6]) + + +plt.figure(2) # a second figure +plt.plot([4, 5, 6]) # creates a subplot(111) by default + +plt.figure(1) # figure 1 current; subplot(212) still current +plt.subplot(211) # make subplot(211) in figure1 current +plt.title('Easy as 1, 2, 3') # subplot 211 title +``` + +You can clear the current figure with [``clf()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.clf.html#matplotlib.pyplot.clf) +and the current axes with [``cla()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.cla.html#matplotlib.pyplot.cla). 
If you find +it annoying that states (specifically the current image, figure and axes) +are being maintained for you behind the scenes, don't despair: this is just a thin +stateful wrapper around an object oriented API, which you can use +instead (see [Artist tutorial](https://matplotlib.org/intermediate/artists.html)) + +If you are making lots of figures, you need to be aware of one +more thing: the memory required for a figure is not completely +released until the figure is explicitly closed with +[``close()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.close.html#matplotlib.pyplot.close). Deleting all references to the +figure, and/or using the window manager to kill the window in which +the figure appears on the screen, is not enough, because pyplot +maintains internal references until [``close()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.close.html#matplotlib.pyplot.close) +is called. + +## Working with text + +The [``text()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.text.html#matplotlib.pyplot.text) command can be used to add text in +an arbitrary location, and the [``xlabel()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.xlabel.html#matplotlib.pyplot.xlabel), +[``ylabel()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.ylabel.html#matplotlib.pyplot.ylabel) and [``title()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.title.html#matplotlib.pyplot.title) +are used to add text in the indicated locations (see [Text in Matplotlib Plots](https://matplotlib.org/text/text_intro.html) +for a more detailed example) + +``` python +mu, sigma = 100, 15 +x = mu + sigma * np.random.randn(10000) + +# the histogram of the data +n, bins, patches = plt.hist(x, 50, density=1, facecolor='g', alpha=0.75) + + +plt.xlabel('Smarts') +plt.ylabel('Probability') +plt.title('Histogram of IQ') +plt.text(60, .025, r'$\mu=100,\ \sigma=15$') +plt.axis([40, 160, 0, 0.03]) +plt.grid(True) +plt.show() +``` + 
![sphx_glr_pyplot_008](https://matplotlib.org/_images/sphx_glr_pyplot_008.png)

All of the [``text()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.text.html#matplotlib.pyplot.text) commands return a
[``matplotlib.text.Text``](https://matplotlib.org/api/text_api.html#matplotlib.text.Text) instance. Just as with lines
above, you can customize the properties by passing keyword arguments
into the text functions or using [``setp()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.setp.html#matplotlib.pyplot.setp):

``` python
t = plt.xlabel('my data', fontsize=14, color='red')
```

These properties are covered in more detail in [Text properties and layout](https://matplotlib.org/text/text_props.html).

### Using mathematical expressions in text

matplotlib accepts TeX equation expressions in any text expression.
For example, to write the expression \(\sigma_i=15\) in the title,
you can write a TeX expression surrounded by dollar signs:

``` python
plt.title(r'$\sigma_i=15$')
```

The ``r`` preceding the title string is important -- it signifies
that the string is a *raw* string, so backslashes are not treated as
python escapes. matplotlib has a built-in TeX expression parser and
layout engine, and ships its own math fonts -- for details see
[Writing mathematical expressions](https://matplotlib.org/text/mathtext.html). Thus you can use mathematical text across platforms
without requiring a TeX installation. For those who have LaTeX and
dvipng installed, you can also use LaTeX to format your text and
incorporate the output directly into your display figures or saved
postscript -- see [Text rendering With LaTeX](https://matplotlib.org/text/usetex.html).

### Annotating text

The uses of the basic [``text()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.text.html#matplotlib.pyplot.text) command above
place text at an arbitrary position on the Axes.
A common use for +text is to annotate some feature of the plot, and the +[``annotate()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.annotate.html#matplotlib.pyplot.annotate) method provides helper +functionality to make annotations easy. In an annotation, there are +two points to consider: the location being annotated represented by +the argument ``xy`` and the location of the text ``xytext``. Both of +these arguments are ``(x,y)`` tuples. + +``` python +ax = plt.subplot(111) + +t = np.arange(0.0, 5.0, 0.01) +s = np.cos(2*np.pi*t) +line, = plt.plot(t, s, lw=2) + +plt.annotate('local max', xy=(2, 1), xytext=(3, 1.5), + arrowprops=dict(facecolor='black', shrink=0.05), + ) + +plt.ylim(-2, 2) +plt.show() +``` + +![sphx_glr_pyplot_009](https://matplotlib.org/_images/sphx_glr_pyplot_009.png) + +In this basic example, both the ``xy`` (arrow tip) and ``xytext`` +locations (text location) are in data coordinates. There are a +variety of other coordinate systems one can choose -- see +[Basic annotation](https://matplotlib.org/text/annotations.html#annotations-tutorial) and [Advanced Annotation](https://matplotlib.org/text/annotations.html#plotting-guide-annotation) for +details. More examples can be found in +[Annotating Plots](https://matplotlib.orggallery/text_labels_and_annotations/annotation_demo.html). + +## Logarithmic and other nonlinear axes + +[``matplotlib.pyplot``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.html#module-matplotlib.pyplot) supports not only linear axis scales, but also +logarithmic and logit scales. This is commonly used if data spans many orders +of magnitude. Changing the scale of an axis is easy: + +An example of four plots with the same data and different scales for the y axis +is shown below. 
+ +``` python +from matplotlib.ticker import NullFormatter # useful for `logit` scale + +# Fixing random state for reproducibility +np.random.seed(19680801) + +# make up some data in the interval ]0, 1[ +y = np.random.normal(loc=0.5, scale=0.4, size=1000) +y = y[(y > 0) & (y < 1)] +y.sort() +x = np.arange(len(y)) + +# plot with various axes scales +plt.figure() + +# linear +plt.subplot(221) +plt.plot(x, y) +plt.yscale('linear') +plt.title('linear') +plt.grid(True) + + +# log +plt.subplot(222) +plt.plot(x, y) +plt.yscale('log') +plt.title('log') +plt.grid(True) + + +# symmetric log +plt.subplot(223) +plt.plot(x, y - y.mean()) +plt.yscale('symlog', linthreshy=0.01) +plt.title('symlog') +plt.grid(True) + +# logit +plt.subplot(224) +plt.plot(x, y) +plt.yscale('logit') +plt.title('logit') +plt.grid(True) +# Format the minor tick labels of the y-axis into empty strings with +# `NullFormatter`, to avoid cumbering the axis with too many labels. +plt.gca().yaxis.set_minor_formatter(NullFormatter()) +# Adjust the subplot layout, because the logit one may take more space +# than usual, due to y-tick labels like "1 - 10^{-3}" +plt.subplots_adjust(top=0.92, bottom=0.08, left=0.10, right=0.95, hspace=0.25, + wspace=0.35) + +plt.show() +``` + +![sphx_glr_pyplot_010](https://matplotlib.org/_images/sphx_glr_pyplot_010.png) + +It is also possible to add your own scale, see [Developer's guide for creating scales and transformations](https://matplotlib.orgdevel/add_new_projection.html#adding-new-scales) for +details. 
+ +**Total running time of the script:** ( 0 minutes 1.262 seconds) + +## Download + +- [Download Python source code: pyplot.py](https://matplotlib.org/_downloads/2dc0b600c5a44dd0a9ee2d1b44a67235/pyplot.py) +- [Download Jupyter notebook: pyplot.ipynb](https://matplotlib.org/_downloads/a5d09473b82821f8a7203a3d071d953a/pyplot.ipynb) + \ No newline at end of file diff --git a/Python/matplotlab/introductory/sample_plots.md b/Python/matplotlab/introductory/sample_plots.md new file mode 100644 index 00000000..436daf25 --- /dev/null +++ b/Python/matplotlab/introductory/sample_plots.md @@ -0,0 +1,411 @@ +--- +sidebarDepth: 3 +sidebar: auto +--- + +# Sample plots in Matplotlib + +Here you'll find a host of example plots with the code that +generated them. + +## Line Plot + +Here's how to create a line plot with text labels using +[``plot()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html#matplotlib.pyplot.plot). + +
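
A minimal sketch in that spirit, using made-up data:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)  # made-up signal

fig, ax = plt.subplots()
ax.plot(t, s)
ax.set(xlabel='time (s)', ylabel='voltage (mV)', title='Simple Plot')
ax.grid()
```
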

*(figure: Simple Plot)*
+ +## Multiple subplots in one figure + +Multiple axes (i.e. subplots) are created with the +[``subplot()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.subplot.html#matplotlib.pyplot.subplot) function: + +
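
For instance, a two-panel sketch with made-up curves:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

plt.figure()
ax1 = plt.subplot(2, 1, 1)  # 2 rows, 1 column, first panel
ax1.plot(x, np.sin(x))
ax2 = plt.subplot(2, 1, 2)  # second panel
ax2.plot(x, np.cos(x), 'r--')
```
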

*(figure: Subplot)*
+ +## Images + +Matplotlib can display images (assuming equally spaced +horizontal dimensions) using the [``imshow()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html#matplotlib.pyplot.imshow) function. + +
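
A small sketch with a synthetic array standing in for real image data:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

# Synthetic 2D data: a Gaussian bump on a grid
x, y = np.meshgrid(np.linspace(-3, 3, 128), np.linspace(-3, 3, 128))
z = np.exp(-(x**2 + y**2))

fig, ax = plt.subplots()
im = ax.imshow(z, origin='lower', extent=(-3, 3, -3, 3), cmap='viridis')
fig.colorbar(im, ax=ax)
```
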
+ +**Example of using [``imshow()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html#matplotlib.pyplot.imshow) to display a CT scan** + +## Contouring and pseudocolor + +The [``pcolormesh()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.pcolormesh.html#matplotlib.pyplot.pcolormesh) function can make a colored +representation of a two-dimensional array, even if the horizontal dimensions +are unevenly spaced. The +[``contour()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.contour.html#matplotlib.pyplot.contour) function is another way to represent +the same data: + +
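
A side-by-side sketch of both functions on the same synthetic data:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

x, y = np.meshgrid(np.linspace(-2, 2, 60), np.linspace(-2, 2, 60))
z = (1 - x / 2 + x**5 + y**3) * np.exp(-x**2 - y**2)  # made-up surface

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.pcolormesh(x, y, z, shading='auto')
cs = ax2.contour(x, y, z)
ax2.clabel(cs, inline=True, fontsize=8)  # label the contour lines
```
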
+ +**Example comparing [``pcolormesh()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.pcolormesh.html#matplotlib.pyplot.pcolormesh) and [``contour()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.contour.html#matplotlib.pyplot.contour) for plotting two-dimensional data** + +## Histograms + +The [``hist()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.hist.html#matplotlib.pyplot.hist) function automatically generates +histograms and returns the bin counts or probabilities: + +
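
For example, with 1000 normally distributed samples (seeded so the sketch is reproducible):

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(19680801)
samples = rng.normal(loc=100, scale=15, size=1000)

fig, ax = plt.subplots()
n, bins, patches = ax.hist(samples, bins=30)
# n holds the per-bin counts; every sample lands in some bin
```
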

*(figure: Histogram Features)*
+ +## Paths + +You can add arbitrary paths in Matplotlib using the +[``matplotlib.path``](https://matplotlib.org/api/path_api.html#module-matplotlib.path) module: + +
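
A compact sketch drawing a triangle from moveto/lineto/closepoly codes:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
from matplotlib.path import Path
import matplotlib.patches as patches

verts = [(0., 0.), (1., 0.), (0.5, 1.), (0., 0.)]  # last vertex is ignored by CLOSEPOLY
codes = [Path.MOVETO, Path.LINETO, Path.LINETO, Path.CLOSEPOLY]

fig, ax = plt.subplots()
ax.add_patch(patches.PathPatch(Path(verts, codes), facecolor='green', alpha=0.5))
ax.set_xlim(-0.5, 1.5)
ax.set_ylim(-0.5, 1.5)
```
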

*(figure: Path Patch)*
+ +## Three-dimensional plotting + +The mplot3d toolkit (see [Getting started](https://matplotlib.org//toolkits/mplot3d.html#toolkit-mplot3d-tutorial) and +[3D plotting](https://matplotlib.org/gallery/index.html#mplot3d-examples-index)) has support for simple 3d graphs +including surface, wireframe, scatter, and bar charts. + +
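
A minimal surface-plot sketch over a synthetic grid:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

x, y = np.meshgrid(np.linspace(-2, 2, 50), np.linspace(-2, 2, 50))
z = np.sin(np.sqrt(x**2 + y**2))

fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # provided by the mplot3d toolkit
ax.plot_surface(x, y, z, cmap='viridis')
```
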

*(figure: Surface3d)*
+ +Thanks to John Porter, Jonathon Taylor, Reinier Heeres, and Ben Root for +the ``mplot3d`` toolkit. This toolkit is included with all standard Matplotlib +installs. + +## Streamplot + +The [``streamplot()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.streamplot.html#matplotlib.pyplot.streamplot) function plots the streamlines of +a vector field. In addition to simply plotting the streamlines, it allows you +to map the colors and/or line widths of streamlines to a separate parameter, +such as the speed or local intensity of the vector field. + +
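
A sketch of a simple rotational field, with streamline colors mapped to speed:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

y, x = np.mgrid[-3:3:100j, -3:3:100j]
u, v = -y, x  # a simple rotational vector field
speed = np.sqrt(u**2 + v**2)

fig, ax = plt.subplots()
sp = ax.streamplot(x, y, u, v, color=speed, linewidth=1, cmap='plasma')
```
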

*(figure: Streamplot with various plotting options)*
+ +This feature complements the [``quiver()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.quiver.html#matplotlib.pyplot.quiver) function for +plotting vector fields. Thanks to Tom Flannaghan and Tony Yu for adding the +streamplot function. + +## Ellipses + +In support of the [Phoenix](http://www.jpl.nasa.gov/news/phoenix/main.php) +mission to Mars (which used Matplotlib to display ground tracking of +spacecraft), Michael Droettboom built on work by Charlie Moad to provide +an extremely accurate 8-spline approximation to elliptical arcs (see +[``Arc``](https://matplotlib.org/api/_as_gen/matplotlib.patches.Arc.html#matplotlib.patches.Arc)), which are insensitive to zoom level. + +
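
A sketch drawing a few rotated ellipse outlines:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

fig, ax = plt.subplots()
for angle in range(0, 180, 30):
    ax.add_patch(Ellipse((0, 0), 4, 2, angle=angle, fill=False))
ax.set_xlim(-3, 3)
ax.set_ylim(-3, 3)
ax.set_aspect('equal')
```
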

*(figure: Ellipse Demo)*
+ +## Bar charts + +Use the [``bar()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.bar.html#matplotlib.pyplot.bar) function to make bar charts, which +includes customizations such as error bars: + +
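
For example, with hypothetical category means and error bars:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

labels = ['A', 'B', 'C']  # hypothetical categories
means = [20, 35, 30]
errors = [2, 3, 4]

fig, ax = plt.subplots()
bars = ax.bar(labels, means, yerr=errors, capsize=4)
```
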

*(figure: Barchart Demo)*
+ +You can also create stacked bars +([bar_stacked.py](https://matplotlib.org/gallery/lines_bars_and_markers/bar_stacked.html)), +or horizontal bar charts +([barh.py](https://matplotlib.org/gallery/lines_bars_and_markers/barh.html)). + +## Pie charts + +The [``pie()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.pie.html#matplotlib.pyplot.pie) function allows you to create pie +charts. Optional features include auto-labeling the percentage of area, +exploding one or more wedges from the center of the pie, and a shadow effect. +Take a close look at the attached code, which generates this figure in just +a few lines of code. + +
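
A sketch with made-up proportions, exploding one wedge and auto-labeling percentages:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

sizes = [15, 30, 45, 10]                    # made-up shares
labels = ['Frogs', 'Hogs', 'Dogs', 'Logs']  # hypothetical categories
explode = (0, 0.1, 0, 0)                    # pull the second wedge out

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(sizes, explode=explode, labels=labels,
                                  autopct='%1.1f%%', startangle=90)
ax.axis('equal')  # keep the pie circular
```
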

*(figure: Pie Features)*
+ +## Tables + +The [``table()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.table.html#matplotlib.pyplot.table) function adds a text table +to an axes. + +
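
A small sketch with made-up cell values:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.axis('off')  # show only the table
tbl = ax.table(cellText=[['1.1', '2.2'], ['3.3', '4.4']],
               rowLabels=['row 1', 'row 2'],
               colLabels=['col A', 'col B'],
               loc='center')
```
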

*(figure: Table Demo)*
+ +## Scatter plots + +The [``scatter()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.scatter.html#matplotlib.pyplot.scatter) function makes a scatter plot +with (optional) size and color arguments. This example plots changes +in Google's stock price, with marker sizes reflecting the +trading volume and colors varying with time. Here, the +alpha attribute is used to make semitransparent circle markers. + +
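
A sketch with random data standing in for the stock series:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n = 50
x, y = rng.standard_normal((2, n))
sizes = 400 * rng.random(n)  # marker area in points^2
colors = rng.random(n)

fig, ax = plt.subplots()
sc = ax.scatter(x, y, s=sizes, c=colors, alpha=0.5)
```
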

*(figure: Scatter Demo2)*
+ +## GUI widgets + +Matplotlib has basic GUI widgets that are independent of the graphical +user interface you are using, allowing you to write cross GUI figures +and widgets. See [``matplotlib.widgets``](https://matplotlib.org/api/widgets_api.html#module-matplotlib.widgets) and the +[widget examples](https://matplotlib.org/gallery/index.html). + +
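
Constructing a widget needs no event loop; a minimal ``Slider`` sketch:

``` python
import matplotlib
matplotlib.use("Agg")  # no interaction here; this only builds the widget
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider

fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.25)  # make room below the axes
ax_freq = fig.add_axes([0.2, 0.1, 0.6, 0.04])
freq = Slider(ax_freq, 'freq', 0.1, 10.0, valinit=1.0)
```
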

*(figure: Slider and radio-button GUI)*
+ +## Filled curves + +The [``fill()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.fill.html#matplotlib.pyplot.fill) function lets you +plot filled curves and polygons: + +
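
For instance, filling one arch of a sine curve:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, np.pi, 200)

fig, ax = plt.subplots()
polys = ax.fill(x, np.sin(x), 'b', alpha=0.3)
```
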

*(figure: Fill)*
+ +Thanks to Andrew Straw for adding this function. + +## Date handling + +You can plot timeseries data with major and minor ticks and custom +tick formatters for both. + +
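
A sketch with a hypothetical daily series, using monthly major ticks and weekly minor ticks:

``` python
import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

start = datetime.date(2020, 1, 1)  # hypothetical start date
dates = [start + datetime.timedelta(days=i) for i in range(90)]
values = np.cumsum(np.random.default_rng(1).standard_normal(90))

fig, ax = plt.subplots()
ax.plot(dates, values)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
ax.xaxis.set_minor_locator(mdates.DayLocator(interval=7))
fig.autofmt_xdate()  # rotate the date labels
```
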

*(figure: Date)*
+ +See [``matplotlib.ticker``](https://matplotlib.org/api/ticker_api.html#module-matplotlib.ticker) and [``matplotlib.dates``](https://matplotlib.org/api/dates_api.html#module-matplotlib.dates) for details and usage. + +## Log plots + +The [``semilogx()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.semilogx.html#matplotlib.pyplot.semilogx), +[``semilogy()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.semilogy.html#matplotlib.pyplot.semilogy) and +[``loglog()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.loglog.html#matplotlib.pyplot.loglog) functions simplify the creation of +logarithmic plots. + +
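
All three in one sketch:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.1, 10, 100)

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(9, 3))
ax1.semilogx(x, np.sin(x))  # logarithmic x-axis
ax2.semilogy(x, np.exp(x))  # logarithmic y-axis
ax3.loglog(x, x**3)         # both axes logarithmic
```
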
+
+*(figure: Log Demo)*
+
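The three functions side by side, as a minimal sketch:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend so this sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.1, 10, 100)

fig, axs = plt.subplots(1, 3, figsize=(9, 3))
axs[0].semilogy(x, np.exp(x))   # log scale on y only
axs[1].semilogx(x, np.log(x))   # log scale on x only
axs[2].loglog(x, x ** 3)        # log scale on both axes
plt.show()
```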
+
+Thanks to Andrew Straw, Darren Dale and Gregory Lielens for contributions
+to the log-scaling infrastructure.
+
+## Polar plots
+
+The [``polar()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.polar.html#matplotlib.pyplot.polar) function generates polar plots.
+
+
+*(figure: Polar Demo)*
+
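In the object-oriented style, the equivalent is to request a polar projection when creating the axes — a minimal sketch:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend so this sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

theta = np.linspace(0, 2 * np.pi, 200)
r = 1 + np.sin(4 * theta) / 4   # a simple flower-like curve

fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.plot(theta, r)
plt.show()
```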
+ +## Legends + +The [``legend()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html#matplotlib.pyplot.legend) function automatically +generates figure legends, with MATLAB-compatible legend-placement +functions. + +
+
+*(figure: Legend)*
+
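The usual pattern is to give each artist a ``label`` and then call ``legend()`` — a minimal sketch:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend so this sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, 50)
fig, ax = plt.subplots()
ax.plot(x, x, label='linear')
ax.plot(x, x ** 2, label='quadratic')
leg = ax.legend(loc='upper left')  # or loc='best' to let Matplotlib choose
plt.show()
```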
+ +Thanks to Charles Twardy for input on the legend function. + +## TeX-notation for text objects + +Below is a sampling of the many TeX expressions now supported by Matplotlib's +internal mathtext engine. The mathtext module provides TeX style mathematical +expressions using [FreeType](https://www.freetype.org/) +and the DejaVu, BaKoMa computer modern, or [STIX](http://www.stixfonts.org) +fonts. See the [``matplotlib.mathtext``](https://matplotlib.org/api/mathtext_api.html#module-matplotlib.mathtext) module for additional details. + +
+
+*(figure: Mathtext Examples)*
+
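A couple of mathtext expressions in action; raw strings (``r'...'``) avoid having to escape the backslashes in the TeX markup, and the ``Agg`` line only keeps the sketch headless:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend so this sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# Anything between $...$ is parsed by the mathtext engine
ax.set_title(r'$\sum_{n=1}^\infty \frac{-e^{i\pi}}{2^n}$')
ax.set_xlabel(r'$\theta$ [rad]')
plt.show()
```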
+
+Matplotlib's mathtext infrastructure is an independent implementation and
+does not require TeX or any external packages installed on your computer. See
+the tutorial at [Writing mathematical expressions](https://matplotlib.org/tutorials/text/mathtext.html).
+
+## Native TeX rendering
+
+Although Matplotlib's internal math rendering engine is quite
+powerful, sometimes you need TeX. Matplotlib supports external TeX
+rendering of strings with the *usetex* option.
+
+
+*(figure: Tex Demo)*
+
+ +## EEG GUI + +You can embed Matplotlib into pygtk, wx, Tk, or Qt applications. +Here is a screenshot of an EEG viewer called [pbrain](https://github.com/nipy/pbrain). + +![eeg_small](https://matplotlib.org/_images/eeg_small.png) + +The lower axes uses [``specgram()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.specgram.html#matplotlib.pyplot.specgram) +to plot the spectrogram of one of the EEG channels. + +For examples of how to embed Matplotlib in different toolkits, see: + +- [Embedding in GTK3](https://matplotlib.org/gallery/user_interfaces/embedding_in_gtk3_sgskip.html) +- [Embedding in wx #2](https://matplotlib.org/gallery/user_interfaces/embedding_in_wx2_sgskip.html) +- [Matplotlib With Glade 3](https://matplotlib.org/gallery/user_interfaces/mpl_with_glade3_sgskip.html) +- [Embedding in Qt](https://matplotlib.org/gallery/user_interfaces/embedding_in_qt_sgskip.html) +- [Embedding in Tk](https://matplotlib.org/gallery/user_interfaces/embedding_in_tk_sgskip.html) + +## XKCD-style sketch plots + +Just for fun, Matplotlib supports plotting in the style of ``xkcd``. + +
+
+*(figure: xkcd)*
+
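``plt.xkcd()`` can be used as a context manager, so the sketchy style applies only to figures created inside the ``with`` block — a minimal sketch:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend so this sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Only figures created inside the with-block get the hand-drawn look
with plt.xkcd():
    fig, ax = plt.subplots()
    ax.plot(np.arange(10), np.arange(10) ** 2)
    ax.set_title('sketchy parabola')
plt.show()
```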
+ +## Subplot example + +Many plot types can be combined in one figure to create +powerful and flexible representations of data. + +
+ +
+ +``` python +import matplotlib.pyplot as plt +import numpy as np + +np.random.seed(19680801) +data = np.random.randn(2, 100) + +fig, axs = plt.subplots(2, 2, figsize=(5, 5)) +axs[0, 0].hist(data[0]) +axs[1, 0].scatter(data[0], data[1]) +axs[0, 1].plot(data[0], data[1]) +axs[1, 1].hist2d(data[0], data[1]) + +plt.show() +``` + +## Download + +- [Download Python source code: sample_plots.py](https://matplotlib.org/_downloads/6b0f2d1b3dc8d0e75eaa96feb738e947/sample_plots.py) +- [Download Jupyter notebook: sample_plots.ipynb](https://matplotlib.org/_downloads/dcfd63fc031d50e9c085f5dc4aa458b1/sample_plots.ipynb) diff --git a/Python/matplotlab/introductory/usage.md b/Python/matplotlab/introductory/usage.md new file mode 100644 index 00000000..8feee2c6 --- /dev/null +++ b/Python/matplotlab/introductory/usage.md @@ -0,0 +1,812 @@ +--- +sidebarDepth: 3 +sidebar: auto +--- + +# Usage Guide + +This tutorial covers some basic usage patterns and best-practices to +help you get started with Matplotlib. + +## General Concepts + +``matplotlib`` has an extensive codebase that can be daunting to many +new users. However, most of matplotlib can be understood with a fairly +simple conceptual framework and knowledge of a few important points. + +Plotting requires action on a range of levels, from the most general +(e.g., 'contour this 2-D array') to the most specific (e.g., 'color +this screen pixel red'). The purpose of a plotting package is to assist +you in visualizing your data as easily as possible, with all the necessary +control -- that is, by using relatively high-level commands most of +the time, and still have the ability to use the low-level commands when +needed. + +Therefore, everything in matplotlib is organized in a hierarchy. At the top +of the hierarchy is the matplotlib "state-machine environment" which is +provided by the [``matplotlib.pyplot``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.html#module-matplotlib.pyplot) module. 
At this level, simple +functions are used to add plot elements (lines, images, text, etc.) to +the current axes in the current figure. + +::: tip Note + +Pyplot's state-machine environment behaves similarly to MATLAB and +should be most familiar to users with MATLAB experience. + +::: + +The next level down in the hierarchy is the first level of the object-oriented +interface, in which pyplot is used only for a few functions such as figure +creation, and the user explicitly creates and keeps track of the figure +and axes objects. At this level, the user uses pyplot to create figures, +and through those figures, one or more axes objects can be created. These +axes objects are then used for most plotting actions. + +For even more control -- which is essential for things like embedding +matplotlib plots in GUI applications -- the pyplot level may be dropped +completely, leaving a purely object-oriented approach. + +``` python +# sphinx_gallery_thumbnail_number = 3 +import matplotlib.pyplot as plt +import numpy as np +``` + +## Parts of a Figure + +![anatomy](https://matplotlib.org/_images/anatomy.png) + +### ``Figure`` + +The **whole** figure. The figure keeps +track of all the child [``Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes), a smattering of +'special' artists (titles, figure legends, etc), and the **canvas**. +(Don't worry too much about the canvas, it is crucial as it is the +object that actually does the drawing to get you your plot, but as the +user it is more-or-less invisible to you). A figure can have any +number of [``Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes), but to be useful should have +at least one. 
+
+The easiest way to create a new figure is with pyplot:
+
+``` python
+fig = plt.figure()  # an empty figure with no axes
+fig.suptitle('No axes on this figure')  # Add a title so we know which it is
+
+fig, ax_lst = plt.subplots(2, 2)  # a figure with a 2x2 grid of Axes
+```
+
+- ![sphx_glr_usage_001](https://matplotlib.org/_images/sphx_glr_usage_001.png)
+- ![sphx_glr_usage_002](https://matplotlib.org/_images/sphx_glr_usage_002.png)
+
+### ``Axes``
+
+This is what you think of as 'a plot'; it is the region of the image
+with the data space. A given figure
+can contain many Axes, but a given [``Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes)
+object can only be in one [``Figure``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure). The
+Axes contains two (or three in the case of 3D)
+[``Axis``](https://matplotlib.org/api/axis_api.html#matplotlib.axis.Axis) objects (be aware of the difference
+between **Axes** and **Axis**) which take care of the data limits (the
+data limits can also be controlled via the
+[``set_xlim()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_xlim.html#matplotlib.axes.Axes.set_xlim) and
+[``set_ylim()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_ylim.html#matplotlib.axes.Axes.set_ylim) ``Axes`` methods). Each
+``Axes`` has a title (set via
+[``set_title()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_title.html#matplotlib.axes.Axes.set_title)), an x-label (set via
+[``set_xlabel()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_xlabel.html#matplotlib.axes.Axes.set_xlabel)), and a y-label (set via
+[``set_ylabel()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_ylabel.html#matplotlib.axes.Axes.set_ylabel)).
+
+The ``Axes`` class and its member functions are the primary entry
+point to working with the OO interface.
+
+### ``Axis``
+
+These are the number-line-like objects. 
They take
+care of setting the graph limits and generating the ticks (the marks
+on the axis) and ticklabels (strings labeling the ticks). The
+location of the ticks is determined by a
+[``Locator``](https://matplotlib.org/api/ticker_api.html#matplotlib.ticker.Locator) object and the ticklabel strings
+are formatted by a [``Formatter``](https://matplotlib.org/api/ticker_api.html#matplotlib.ticker.Formatter). The
+combination of the correct ``Locator`` and ``Formatter`` gives
+very fine control over the tick locations and labels.
+
+### ``Artist``
+
+Basically everything you can see on the figure is an artist (even the
+``Figure``, ``Axes``, and ``Axis`` objects). This
+includes ``Text`` objects, ``Line2D`` objects,
+``collection`` objects, ``Patch`` objects ... (you get the
+idea). When the figure is rendered, all of the artists are drawn to
+the **canvas**. Most Artists are tied to an Axes; such an Artist
+cannot be shared by multiple Axes, or moved from one to another.
+
+## Types of inputs to plotting functions
+
+All of the plotting functions expect ``np.array`` or ``np.ma.masked_array`` as
+input. Classes that are 'array-like' such as [``pandas``](https://pandas.pydata.org/pandas-docs/stable/index.html#module-pandas) data objects
+and ``np.matrix`` may or may not work as intended. It is best to
+convert these to ``np.array`` objects prior to plotting.
+
+For example, to convert a [``pandas.DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame)
+
+``` python
+import pandas
+
+a = pandas.DataFrame(np.random.rand(4, 5), columns=list('abcde'))
+a_asarray = a.values
+```
+
+and to convert a ``np.matrix``
+
+``` python
+b = np.matrix([[1, 2], [3, 4]])
+b_asarray = np.asarray(b)
+```
+
+## Matplotlib, pyplot and pylab: how are they related?
+
+Matplotlib is the whole package and [``matplotlib.pyplot``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.html#module-matplotlib.pyplot) is a module in
+Matplotlib. 
+ +For functions in the pyplot module, there is always a "current" figure and +axes (which is created automatically on request). For example, in the +following example, the first call to ``plt.plot`` creates the axes, then +subsequent calls to ``plt.plot`` add additional lines on the same axes, and +``plt.xlabel``, ``plt.ylabel``, ``plt.title`` and ``plt.legend`` set the +axes labels and title and add a legend. + +``` python +x = np.linspace(0, 2, 100) + +plt.plot(x, x, label='linear') +plt.plot(x, x**2, label='quadratic') +plt.plot(x, x**3, label='cubic') + +plt.xlabel('x label') +plt.ylabel('y label') + +plt.title("Simple Plot") + +plt.legend() + +plt.show() +``` + +![sphx_glr_usage_003](https://matplotlib.org/_images/sphx_glr_usage_003.png) + +``pylab`` is a convenience module that bulk imports +[``matplotlib.pyplot``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.html#module-matplotlib.pyplot) (for plotting) and [``numpy``](https://docs.scipy.org/doc/numpy/reference/index.html#module-numpy) +(for mathematics and working with arrays) in a single namespace. +pylab is deprecated and its use is strongly discouraged because +of namespace pollution. Use pyplot instead. + +For non-interactive plotting it is suggested +to use pyplot to create the figures and then the OO interface for +plotting. + +## Coding Styles + +When viewing this documentation and examples, you will find different +coding styles and usage patterns. These styles are perfectly valid +and have their pros and cons. Just about all of the examples can be +converted into another style and achieve the same results. +The only caveat is to avoid mixing the coding styles for your own code. + +::: tip Note + +Developers for matplotlib have to follow a specific style and guidelines. +See [The Matplotlib Developers' Guide](https://matplotlib.orgdevel/index.html#developers-guide-index). + +::: + +Of the different styles, there are two that are officially supported. 
+Therefore, these are the preferred ways to use matplotlib. + +For the pyplot style, the imports at the top of your +scripts will typically be: + +``` python +import matplotlib.pyplot as plt +import numpy as np +``` + +Then one calls, for example, np.arange, np.zeros, np.pi, plt.figure, +plt.plot, plt.show, etc. Use the pyplot interface +for creating figures, and then use the object methods for the rest: + +``` python +x = np.arange(0, 10, 0.2) +y = np.sin(x) +fig, ax = plt.subplots() +ax.plot(x, y) +plt.show() +``` + +![sphx_glr_usage_004](https://matplotlib.org/_images/sphx_glr_usage_004.png) + +So, why all the extra typing instead of the MATLAB-style (which relies +on global state and a flat namespace)? For very simple things like +this example, the only advantage is academic: the wordier styles are +more explicit, more clear as to where things come from and what is +going on. For more complicated applications, this explicitness and +clarity becomes increasingly valuable, and the richer and more +complete object-oriented interface will likely make the program easier +to write and maintain. + +Typically one finds oneself making the same plots over and over +again, but with different data sets, which leads to needing to write +specialized functions to do the plotting. 
The recommended function +signature is something like: + +``` python +def my_plotter(ax, data1, data2, param_dict): + """ + A helper function to make a graph + + Parameters + ---------- + ax : Axes + The axes to draw to + + data1 : array + The x data + + data2 : array + The y data + + param_dict : dict + Dictionary of kwargs to pass to ax.plot + + Returns + ------- + out : list + list of artists added + """ + out = ax.plot(data1, data2, **param_dict) + return out + +# which you would then use as: + +data1, data2, data3, data4 = np.random.randn(4, 100) +fig, ax = plt.subplots(1, 1) +my_plotter(ax, data1, data2, {'marker': 'x'}) +``` + +![sphx_glr_usage_005](https://matplotlib.org/_images/sphx_glr_usage_005.png) + +or if you wanted to have 2 sub-plots: + +``` python +fig, (ax1, ax2) = plt.subplots(1, 2) +my_plotter(ax1, data1, data2, {'marker': 'x'}) +my_plotter(ax2, data3, data4, {'marker': 'o'}) +``` + +![sphx_glr_usage_006](https://matplotlib.org/_images/sphx_glr_usage_006.png) + +Again, for these simple examples this style seems like overkill, however +once the graphs get slightly more complex it pays off. + +## Backends + +### What is a backend? + +A lot of documentation on the website and in the mailing lists refers +to the "backend" and many new users are confused by this term. +matplotlib targets many different use cases and output formats. Some +people use matplotlib interactively from the python shell and have +plotting windows pop up when they type commands. Some people run +[Jupyter](https://jupyter.org) notebooks and draw inline plots for +quick data analysis. Others embed matplotlib into graphical user +interfaces like wxpython or pygtk to build rich applications. Some +people use matplotlib in batch scripts to generate postscript images +from numerical simulations, and still others run web application +servers to dynamically serve up graphs. 
+
+To support all of these use cases, matplotlib can target different
+outputs, and each of these capabilities is called a backend; the
+"frontend" is the user facing code, i.e., the plotting code, whereas the
+"backend" does all the hard work behind-the-scenes to make the figure.
+There are two types of backends: user interface backends (for use in
+pygtk, wxpython, tkinter, qt4, or macosx; also referred to as
+"interactive backends") and hardcopy backends to make image files
+(PNG, SVG, PDF, PS; also referred to as "non-interactive backends").
+
+There are several ways to configure your backend. If they conflict with
+each other, the method invoked last takes precedence; e.g. calling
+[``use()``](https://matplotlib.org/api/matplotlib_configuration_api.html#matplotlib.use) will override the setting in your ``matplotlibrc``.
+
+::: tip Note
+
+Backend name specifications are not case-sensitive; e.g., 'GTK3Agg'
+and 'gtk3agg' are equivalent.
+
+:::
+
+With a typical installation of matplotlib, such as from a
+binary installer or a linux distribution package, a good default
+backend will already be set, allowing both interactive work and
+plotting from scripts, with output to the screen and/or to
+a file, so at least initially you will not need to use any of the
+methods given above.
+
+If, however, you want to write graphical user interfaces, or a web
+application server ([How to use Matplotlib in a web application server](https://matplotlib.org/faq/howto_faq.html#howto-webapp)), or need a better
+understanding of what is going on, read on. To make things a little
+more customizable for graphical user interfaces, matplotlib separates
+the concept of the renderer (the thing that actually does the drawing)
+from the canvas (the place where the drawing goes). The canonical
+renderer for user interfaces is ``Agg`` which uses the [Anti-Grain
+Geometry](http://antigrain.com/) C++ library to make a raster (pixel) image of the figure. 
+All of the user interfaces except ``macosx`` can be used with
+agg rendering, e.g., ``WXAgg``, ``GTK3Agg``, ``QT4Agg``, ``QT5Agg``,
+``TkAgg``. In addition, some of the user interfaces support other rendering
+engines. For example, with GTK+ 3, you can also select Cairo rendering
+(backend ``GTK3Cairo``).
+
+For the rendering engines, one can also distinguish between [vector](https://en.wikipedia.org/wiki/Vector_graphics) and [raster](https://en.wikipedia.org/wiki/Raster_graphics) renderers. Vector
+graphics languages issue drawing commands like "draw a line from this
+point to this point" and hence are scale free, while raster backends
+generate a pixel representation of the line whose accuracy depends on a
+DPI setting.
+
+Here is a summary of the matplotlib renderers (there is an eponymous
+backend for each; these are *non-interactive backends*, capable of
+writing to a file):
+
+Renderer | Filetypes | Description
+----|----|----
+[AGG](https://matplotlib.org/glossary/index.html#term-agg) | [png](https://matplotlib.org/glossary/index.html#term-png) | [raster graphics](https://matplotlib.org/glossary/index.html#term-raster-graphics) -- high quality images using the [Anti-Grain Geometry](http://antigrain.com/) engine
+PS | [ps](https://matplotlib.org/glossary/index.html#term-ps), [eps](https://matplotlib.org/glossary/index.html#term-eps) | [vector graphics](https://matplotlib.org/glossary/index.html#term-vector-graphics) -- [Postscript](https://en.wikipedia.org/wiki/PostScript) output
+PDF | [pdf](https://matplotlib.org/glossary/index.html#term-pdf) | vector graphics -- [Portable Document Format](https://en.wikipedia.org/wiki/Portable_Document_Format)
+SVG | [svg](https://matplotlib.org/glossary/index.html#term-svg) | vector graphics -- [Scalable Vector Graphics](https://en.wikipedia.org/wiki/Scalable_Vector_Graphics)
+[Cairo](https://matplotlib.org/glossary/index.html#term-cairo) | png, ps, pdf, svg | raster graphics and vector graphics -- using the [Cairo graphics](https://www.cairographics.org) library
+
+And here are the user interfaces and renderer combinations supported;
+these are *interactive backends*, capable of displaying to the screen
+and of using appropriate renderers from the table above to write to
+a file:
+
+Backend | Description
+----|----
+[Qt5](https://matplotlib.org/glossary/index.html#term-qt5)Agg | Agg rendering in a Qt5 canvas (requires [PyQt5](https://riverbankcomputing.com/software/pyqt/intro)). This backend can be activated in IPython with ``%matplotlib qt5``.
+ipympl | Agg rendering embedded in a Jupyter widget (requires ipympl). This backend can be enabled in a Jupyter notebook with ``%matplotlib ipympl``.
+[GTK](https://matplotlib.org/glossary/index.html#term-gtk)3Agg | Agg rendering to a GTK 3.x canvas (requires [PyGObject](https://wiki.gnome.org/action/show/Projects/PyGObject), and [pycairo](https://www.cairographics.org/pycairo/) or [cairocffi](https://pythonhosted.org/cairocffi/)). This backend can be activated in IPython with ``%matplotlib gtk3``.
+macosx | Agg rendering into a Cocoa canvas in OSX. This backend can be activated in IPython with ``%matplotlib osx``.
+[Tk](https://matplotlib.org/glossary/index.html#term-tk)Agg | Agg rendering to a Tk canvas (requires [TkInter](https://wiki.python.org/moin/TkInter)). This backend can be activated in IPython with ``%matplotlib tk``.
+nbAgg | Embed an interactive figure in a Jupyter classic notebook. This backend can be enabled in Jupyter notebooks via ``%matplotlib notebook``.
+WebAgg | On ``show()`` will start a tornado server with an interactive figure.
+GTK3Cairo | Cairo rendering to a GTK 3.x canvas (requires PyGObject, and pycairo or cairocffi).
+[Qt4](https://matplotlib.org/glossary/index.html#term-qt4)Agg | Agg rendering to a Qt4 canvas (requires [PyQt4](https://riverbankcomputing.com/software/pyqt/intro) or pyside). This backend can be activated in IPython with ``%matplotlib qt4``.
+WXAgg | Agg rendering to a [wxWidgets](https://matplotlib.org/glossary/index.html#term-wxwidgets) canvas (requires [wxPython](https://www.wxpython.org/) 4). This backend can be activated in IPython with ``%matplotlib wx``.
+
+### ipympl
+
+The Jupyter widget ecosystem is moving too fast to support directly in
+Matplotlib. To install ipympl:
+
+``` bash
+pip install ipympl
+jupyter nbextension enable --py --sys-prefix ipympl
+```
+
+or
+
+``` bash
+conda install ipympl -c conda-forge
+```
+
+See [jupyter-matplotlib](https://github.com/matplotlib/jupyter-matplotlib)
+for more details.
+
+### GTK and Cairo
+
+``GTK3`` backends (*both* ``GTK3Agg`` and ``GTK3Cairo``) depend on Cairo
+(pycairo>=1.11.0 or cairocffi).
+
+### How do I select PyQt4 or PySide?
+
+The ``QT_API`` environment variable can be set to either ``pyqt`` or ``pyside``
+to use ``PyQt4`` or ``PySide``, respectively.
+
+Since the default value for the bindings to be used is ``PyQt4``,
+``matplotlib`` first tries to import it; if the import fails, it tries to
+import ``PySide``.
+
+## What is interactive mode? 
+ +Use of an interactive backend (see [What is a backend?](#what-is-a-backend)) +permits--but does not by itself require or ensure--plotting +to the screen. Whether and when plotting to the screen occurs, +and whether a script or shell session continues after a plot +is drawn on the screen, depends on the functions and methods +that are called, and on a state variable that determines whether +matplotlib is in "interactive mode". The default Boolean value is set +by the ``matplotlibrc`` file, and may be customized like any other +configuration parameter (see [Customizing Matplotlib with style sheets and rcParams](customizing.html)). It +may also be set via [``matplotlib.interactive()``](https://matplotlib.orgapi/matplotlib_configuration_api.html#matplotlib.interactive), and its +value may be queried via [``matplotlib.is_interactive()``](https://matplotlib.orgapi/matplotlib_configuration_api.html#matplotlib.is_interactive). Turning +interactive mode on and off in the middle of a stream of plotting +commands, whether in a script or in a shell, is rarely needed +and potentially confusing, so in the following we will assume all +plotting is done with interactive mode either on or off. + +::: tip Note + +Major changes related to interactivity, and in particular the +role and behavior of [``show()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.show.html#matplotlib.pyplot.show), were made in the +transition to matplotlib version 1.0, and bugs were fixed in +1.0.1. Here we describe the version 1.0.1 behavior for the +primary interactive backends, with the partial exception of +*macosx*. + +::: + +Interactive mode may also be turned on via [``matplotlib.pyplot.ion()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.ion.html#matplotlib.pyplot.ion), +and turned off via [``matplotlib.pyplot.ioff()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.ioff.html#matplotlib.pyplot.ioff). 
+ +::: tip Note + +Interactive mode works with suitable backends in ipython and in +the ordinary python shell, but it does *not* work in the IDLE IDE. +If the default backend does not support interactivity, an interactive +backend can be explicitly activated using any of the methods discussed in [What is a backend?](#id4). + +::: + +### Interactive example + +From an ordinary python prompt, or after invoking ipython with no options, +try this: + +``` python +import matplotlib.pyplot as plt +plt.ion() +plt.plot([1.6, 2.7]) +``` + +Assuming you are running version 1.0.1 or higher, and you have +an interactive backend installed and selected by default, you should +see a plot, and your terminal prompt should also be active; you +can type additional commands such as: + +``` python +plt.title("interactive test") +plt.xlabel("index") +``` + +and you will see the plot being updated after each line. Since version 1.5, +modifying the plot by other means *should* also automatically +update the display on most backends. Get a reference to the [``Axes``](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes) instance, +and call a method of that instance: + +``` python +ax = plt.gca() +ax.plot([3.1, 2.2]) +``` + +If you are using certain backends (like ``macosx``), or an older version +of matplotlib, you may not see the new line added to the plot immediately. +In this case, you need to explicitly call [``draw()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.draw.html#matplotlib.pyplot.draw) +in order to update the plot: + +``` python +plt.draw() +``` + +### Non-interactive example + +Start a fresh session as in the previous example, but now +turn interactive mode off: + +``` python +import matplotlib.pyplot as plt +plt.ioff() +plt.plot([1.6, 2.7]) +``` + +Nothing happened--or at least nothing has shown up on the +screen (unless you are using *macosx* backend, which is +anomalous). 
To make the plot appear, you need to do this: + +``` python +plt.show() +``` + +Now you see the plot, but your terminal command line is +unresponsive; the ``show()`` command *blocks* the input +of additional commands until you manually kill the plot +window. + +What good is this--being forced to use a blocking function? +Suppose you need a script that plots the contents of a file +to the screen. You want to look at that plot, and then end +the script. Without some blocking command such as show(), the +script would flash up the plot and then end immediately, +leaving nothing on the screen. + +In addition, non-interactive mode delays all drawing until +show() is called; this is more efficient than redrawing +the plot each time a line in the script adds a new feature. + +Prior to version 1.0, show() generally could not be called +more than once in a single script (although sometimes one +could get away with it); for version 1.0.1 and above, this +restriction is lifted, so one can write a script like this: + +``` python +import numpy as np +import matplotlib.pyplot as plt + +plt.ioff() +for i in range(3): + plt.plot(np.random.rand(10)) + plt.show() +``` + +which makes three plots, one at a time. I.e. the second plot will show up, +once the first plot is closed. + +### Summary + +In interactive mode, pyplot functions automatically draw +to the screen. + +When plotting interactively, if using +object method calls in addition to pyplot functions, then +call [``draw()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.draw.html#matplotlib.pyplot.draw) whenever you want to +refresh the plot. + +Use non-interactive mode in scripts in which you want to +generate one or more figures and display them before ending +or generating a new set of figures. In that case, use +[``show()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.show.html#matplotlib.pyplot.show) to display the figure(s) and +to block execution until you have manually destroyed them. 
+ +## Performance + +Whether exploring data in interactive mode or programmatically +saving lots of plots, rendering performance can be a painful +bottleneck in your pipeline. Matplotlib provides a couple +ways to greatly reduce rendering time at the cost of a slight +change (to a settable tolerance) in your plot's appearance. +The methods available to reduce rendering time depend on the +type of plot that is being created. + +### Line segment simplification + +For plots that have line segments (e.g. typical line plots, +outlines of polygons, etc.), rendering performance can be +controlled by the ``path.simplify`` and +``path.simplify_threshold`` parameters in your +``matplotlibrc`` file (see +[Customizing Matplotlib with style sheets and rcParams](customizing.html) for +more information about the ``matplotlibrc`` file). +The ``path.simplify`` parameter is a boolean indicating whether +or not line segments are simplified at all. The +``path.simplify_threshold`` parameter controls how much line +segments are simplified; higher thresholds result in quicker +rendering. + +The following script will first display the data without any +simplification, and then display the same data with simplification. +Try interacting with both of them: + +``` python +import numpy as np +import matplotlib.pyplot as plt +import matplotlib as mpl + +# Setup, and create the data to plot +y = np.random.rand(100000) +y[50000:] *= 2 +y[np.logspace(1, np.log10(50000), 400).astype(int)] = -1 +mpl.rcParams['path.simplify'] = True + +mpl.rcParams['path.simplify_threshold'] = 0.0 +plt.plot(y) +plt.show() + +mpl.rcParams['path.simplify_threshold'] = 1.0 +plt.plot(y) +plt.show() +``` + +Matplotlib currently defaults to a conservative simplification +threshold of ``1/9``. If you want to change your default settings +to use a different value, you can change your ``matplotlibrc`` +file. 
Alternatively, you could create a new style for +interactive plotting (with maximal simplification) and another +style for publication quality plotting (with minimal +simplification) and activate them as necessary. See +[Customizing Matplotlib with style sheets and rcParams](customizing.html) for +instructions on how to perform these actions. + +The simplification works by iteratively merging line segments +into a single vector until the next line segment's perpendicular +distance to the vector (measured in display-coordinate space) +is greater than the ``path.simplify_threshold`` parameter. + +::: tip Note + +Changes related to how line segments are simplified were made +in version 2.1. Rendering time will still be improved by these +parameters prior to 2.1, but rendering time for some kinds of +data will be vastly improved in versions 2.1 and greater. + +::: + +### Marker simplification + +Markers can also be simplified, albeit less robustly than +line segments. Marker simplification is only available +to [``Line2D``](https://matplotlib.orgapi/_as_gen/matplotlib.lines.Line2D.html#matplotlib.lines.Line2D) objects (through the +``markevery`` property). Wherever +[``Line2D``](https://matplotlib.orgapi/_as_gen/matplotlib.lines.Line2D.html#matplotlib.lines.Line2D) construction parameters +are passed through, such as +[``matplotlib.pyplot.plot()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.plot.html#matplotlib.pyplot.plot) and +[``matplotlib.axes.Axes.plot()``](https://matplotlib.orgapi/_as_gen/matplotlib.axes.Axes.plot.html#matplotlib.axes.Axes.plot), the ``markevery`` +parameter can be used: + +``` python +plt.plot(x, y, markevery=10) +``` + +The markevery argument allows for naive subsampling, or an +attempt at evenly spaced (along the *x* axis) sampling. See the +[Markevery Demo](https://matplotlib.orggallery/lines_bars_and_markers/markevery_demo.html) +for more information. 
+
+### Splitting lines into smaller chunks
+
+If you are using the Agg backend (see [What is a backend?](#what-is-a-backend)),
+then you can make use of the ``agg.path.chunksize`` rc parameter.
+This allows you to specify a chunk size, and any line with
+more than that many vertices will be split into multiple
+lines, each of which has no more than ``agg.path.chunksize``
+many vertices. (Unless ``agg.path.chunksize`` is zero, in
+which case there is no chunking.) For some kinds of data,
+chunking the line up into reasonable sizes can greatly
+decrease rendering time.
+
+The following script will first display the data without any
+chunk size restriction, and then display the same data with
+a chunk size of 10,000. The difference can best be seen when
+the figures are large; try maximizing the GUI and then
+interacting with them:
+
+``` python
+import numpy as np
+import matplotlib.pyplot as plt
+import matplotlib as mpl
+mpl.rcParams['path.simplify_threshold'] = 1.0
+
+# Setup, and create the data to plot
+y = np.random.rand(100000)
+y[50000:] *= 2
+y[np.logspace(1, np.log10(50000), 400).astype(int)] = -1
+mpl.rcParams['path.simplify'] = True
+
+mpl.rcParams['agg.path.chunksize'] = 0
+plt.plot(y)
+plt.show()
+
+mpl.rcParams['agg.path.chunksize'] = 10000
+plt.plot(y)
+plt.show()
+```
+
+### Legends
+
+The default legend behavior for axes attempts to find the location
+that covers the fewest data points (``loc='best'``). This can be a
+very expensive computation if there are lots of data points. In
+this case, you may want to provide a specific location.
+
+### Using the fast style
+
+The *fast* style can be used to automatically set
+simplification and chunking parameters to reasonable
+settings to speed up plotting large amounts of data.
+It can be used simply by running:
+
+``` python
+import matplotlib.style as mplstyle
+mplstyle.use('fast')
+```
+
+It is very lightweight, so it plays nicely with other
+styles; just make sure the fast style is applied last
+so that other styles do not overwrite the settings:
+
+``` python
+mplstyle.use(['dark_background', 'ggplot', 'fast'])
+```
+
+## Download
+
+- [Download Python source code: usage.py](https://matplotlib.org/_downloads/841a514c2538fd0de68b22f22b25f56d/usage.py)
+- [Download Jupyter notebook: usage.ipynb](https://matplotlib.org/_downloads/16d604c55fb650c0dce205aa67def02b/usage.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/pyplot_attr.md b/Python/matplotlab/pyplot_attr.md
new file mode 100644
index 00000000..959cf746
--- /dev/null
+++ b/Python/matplotlab/pyplot_attr.md
@@ -0,0 +1,279 @@
+# Object-oriented plotting
+
+
+## Configuration parameters
+
+* axes: colors of the axis boundaries and faces, tick label size, and grid visibility
+* figure: dpi, edge color, figure size, and subplot settings
+* font: font family, font size, and style
+* grid: grid color and line style
+* legend: the legend and the text displayed in it
+* lines: line properties (color, style, width, etc.) and markers
+* patch: patches are 2D shapes that fill a region, such as polygons and circles; controls line width, color, antialiasing, etc.
+* savefig: separate settings for saved figures, e.g. a white background for rendered files
+* verbose: how much information matplotlib prints while running: silent, helpful, debug, or debug-annoying
+* xticks and yticks: color, size, direction, and label size of the major and minor ticks on the x and y axes
+
+
+## Line styles
+
+linestyle (or ls) | Description
+----|---
+'-' | solid line
+':' | dotted line
+'--' | dashed line
+'None', ' ', '' | draw nothing
+'-.' | dash-dot line
+
+## Markers
+
+marker | Description
+----|----
+'o' | circle
+'.' | point
+'D' | diamond
+'s' | square
+'h' | hexagon1
+'*' | star
+'H' | hexagon2
+'d' | thin diamond
+'_' | horizontal line
+'v' | downward-pointing triangle
+'8' | octagon
+'<' | left-pointing triangle
+'p' | pentagon
+'>' | right-pointing triangle
+',' | pixel
+'^' | upward-pointing triangle
+'+' | plus
+'\|' | vertical line
+'None', '', ' ' | nothing
+'x' | x
+
+
+## Colors
+
+Alias | Color
+---|---
+b | blue
+g | green
+r | red
+y | yellow
+c | cyan
+k | black
+m | magenta
+w | white
+
+
+## Plotting, step by step
+
+```py
+import numpy as np
+import matplotlib.pyplot as plt
+from matplotlib.ticker import MultipleLocator
+
+# Generate the data with numpy
+x=np.arange(-5,5,0.1)
+y=x*3
+
+# Create the figure window and subplots
+# Method 1: create the figure first, then add subplots. (Always drawn.)
+fig = plt.figure(num=1, figsize=(15, 8),dpi=80) # open a figure window, setting its size and resolution
+ax1 = fig.add_subplot(2,1,1) # add a subplot through fig; arguments: rows, columns, index
+ax2 = fig.add_subplot(2,1,2)
# add a subplot through fig; arguments: rows, columns, index
+print(fig,ax1,ax2)
+# Method 2: create the figure and several subplots in one call. (Blank subplots are not drawn.)
+fig,axarr = plt.subplots(4,1) # open a new figure with 4 subplots; returns an array of Axes
+ax1 = axarr[0] # take one subplot out of the array
+print(fig,ax1)
+# Method 3: create the figure and a single subplot in one call. (Blank subplots are not drawn.)
+ax1 = plt.subplot(1,1,1,facecolor='white') # open a new figure with one subplot; facecolor sets the background color
+print(ax1)
+# Getting a reference to the figure works with all three methods:
+# fig = plt.gcf() # get the current figure
+# fig=ax1.figure # get the figure that a given subplot belongs to
+
+# fig.subplots_adjust(left=0) # set the figure's left padding to 0, i.e. no blank margin on the left
+
+# Basic elements of a subplot
+ax1.set_title('python-drawing') # set the title (cf. plt.title)
+ax1.set_xlabel('x-name') # set the x-axis label (cf. plt.xlabel)
+ax1.set_ylabel('y-name') # set the y-axis label (cf. plt.ylabel)
+plt.axis([-6,6,-10,10]) # set both axis ranges; on an Axes this is split into the two calls below
+ax1.set_xlim(-5,5) # set the x range, overriding the call above (cf. plt.xlim)
+ax1.set_ylim(-10,10) # set the y range, overriding the call above (cf. plt.ylim)
+
+xmajorLocator = MultipleLocator(2) # put an x major tick (and label) at every multiple of 2
+ymajorLocator = MultipleLocator(3) # put a y major tick (and label) at every multiple of 3
+
+ax1.xaxis.set_major_locator(xmajorLocator) # apply the locator to the x axis; otherwise the default ticks are used
+ax1.yaxis.set_major_locator(ymajorLocator) # apply the locator to the y axis; otherwise the default ticks are used
+
+ax1.xaxis.grid(True, which='major') # draw the x grid at the major ticks
+ax1.yaxis.grid(True, which='major') # draw the y grid at the major ticks
+
+ax1.set_xticks([]) # remove the x ticks
+ax1.set_xticks((-5,-3,-1,1,3,5)) # set the x ticks explicitly
+ax1.set_xticklabels(labels=['x1','x2','x3','x4','x5','x6'],rotation=-30,fontsize='small') # one label per tick; rotation is the angle, fontsize the text size
+
+plot1=ax1.plot(x,y,marker='o',color='g',label='legend1') # marker plot: marker sets the symbol
+plot2=ax1.plot(x,y,linestyle='--',alpha=0.5,color='r',label='legend2') # line plot: linestyle, alpha (transparency), color, label (legend text)
+
+ax1.legend(loc='upper left') # show the legend (cf. plt.legend())
+ax1.text(2.8, 7, r'y=3*x') # draw text at the given position (cf. plt.text())
+ax1.annotate('important point', xy=(2, 6), xytext=(3, 1.5), # add an annotation; arguments: text, annotated point, text position, arrow properties
+             arrowprops=dict(facecolor='black', shrink=0.05),
+             )
+# Show the grid. which may be 'major' (major ticks only), 'minor' (minor ticks only) or 'both'; axis may be 'x', 'y' or 'both'
+ax1.grid(b=True,which='major',axis='both',alpha= 0.5,color='skyblue',linestyle='--',linewidth=2)
+
+axes1 = plt.axes([.2, .3, .1, .1], facecolor='y')
# add an inset axes to the current figure; rect=[left, bottom, width, height] in absolute figure coordinates, so it does not take space from existing axes
+axes1.plot(x,y) # plot on the inset axes
+plt.savefig('aa.jpg',dpi=400,bbox_inches='tight') # save the figure; dpi is the resolution, bbox_inches trims the white space around the figure
+plt.show() # open the windows; a figure from method 1 is always drawn, while figures from methods 2 and 3 are skipped if all of their axes are blank
+
+```
+
+## plot properties
+
+| Property | Value type |
+|---|---|
+| alpha | float |
+| animated | [True / False] |
+| antialiased or aa | [True / False] |
+| clip_box | a matplotlib.transform.Bbox instance |
+| clip_on | [True / False] |
+| clip_path | a Path instance, a Transform, or a Patch instance |
+| color or c | any matplotlib color |
+| contains | a hit-testing function |
+| dash_capstyle | ['butt' / 'round' / 'projecting'] |
+| dash_joinstyle | ['miter' / 'round' / 'bevel'] |
+| dashes | on/off ink sequence in points |
+| data | (np.array xdata, np.array ydata) |
+| figure | a matplotlib.figure.Figure instance |
+| label | any string |
+| linestyle or ls | [ '-' / '--' / '-.' / ':' / 'steps' / ...] |
+| linewidth or lw | float, in points |
+| lod | [True / False] |
+| marker | [ '+' / ',' / '.' / '1' / '2' / '3' / '4' ] |
+| markeredgecolor or mec | any matplotlib color |
+| markeredgewidth or mew | float, in points |
+| markerfacecolor or mfc | any matplotlib color |
+| markersize or ms | float |
+| markevery | [ None / integer / (startind, stride) ] |
+| picker | used for interactive line selection |
+| pickradius | the pick-selection radius of the line |
+| solid_capstyle | ['butt' / 'round' / 'projecting'] |
+| solid_joinstyle | ['miter' / 'round' / 'bevel'] |
+| transform | a matplotlib.transforms.Transform instance |
+| visible | [True / False] |
+| xdata | np.array |
+| ydata | np.array |
+| zorder | any number |
+
+## Multiple plots
+
+```py
+# One figure, several subplots, several data sets
+sub1=plt.subplot(211,facecolor=(0.1843,0.3098,0.3098)) # split the figure into 2 rows x 1 column and draw in slot 1; also set the background color
+sub2=plt.subplot(212) # split the figure into 2 rows x 1 column and draw in slot 2
+sub1.plot(x,y) # draw on the subplot
+sub2.plot(x,y) # draw on the subplot
+
+axes1 = plt.axes([.2, .3, .1, .1], facecolor='y') # add an inset axes; rect=[left, bottom, width, height]
+plt.plot(x,y) # draw on the inset axes
+axes2 = plt.axes([0.7, .2, .1, .1], facecolor='y') # add another inset axes; rect=[left, bottom, width, height]
+plt.plot(x,y)
+plt.show()
+```
+## Polar plots
+
+```py
+fig = plt.figure(2) # open a new figure
+ax1 = fig.add_subplot(1,2,1,polar=True) # start a polar subplot
+theta=np.arange(0,2*np.pi,0.02) # array of angles
+ax1.plot(theta,2*np.ones_like(theta),lw=2) # plot; arguments: angle, radius, lw line width
+ax1.plot(theta,theta/6,linestyle='--',lw=2) # plot; arguments: angle, radius, linestyle, lw line width
+
+ax2 =
fig.add_subplot(1,2,2,polar=True) # start a second polar subplot
+ax2.plot(theta,np.cos(5*theta),linestyle='--',lw=2)
+ax2.plot(theta,2*np.cos(4*theta),lw=2)
+
+ax2.set_rgrids(np.arange(0.2,2,0.2),angle=45) # radial grid: tick values and the angle at which their labels are drawn
+ax2.set_thetagrids([0,45,90]) # angular grid, in the range 0-360 degrees
+
+plt.show()
+```
+
+## Bar charts
+
+```py
+plt.figure(3)
+x_index = np.arange(5) # bar indices
+x_data = ('A', 'B', 'C', 'D', 'E')
+y1_data = (20, 35, 30, 35, 27)
+y2_data = (25, 32, 34, 20, 25)
+bar_width = 0.35 # width of each individual bar
+
+rects1 = plt.bar(x_index, y1_data, width=bar_width,alpha=0.4, color='b',label='legend1') # arguments: left offset, height, bar width, transparency, color, legend label
+rects2 = plt.bar(x_index + bar_width, y2_data, width=bar_width,alpha=0.5,color='r',label='legend2') # arguments: left offset, height, bar width, transparency, color, legend label
+# Don't worry about centering each bar on its offset; just place the tick marks in the middle of the bars
+plt.xticks(x_index + bar_width/2, x_data) # x-axis tick positions and labels
+plt.legend() # show the legend
+plt.tight_layout() # automatically trims the outer margins; it does not control the spacing between subplots well
+plt.show()
+```
+
+## Histograms
+
+```py
+fig,(ax0,ax1) = plt.subplots(nrows=2,figsize=(9,6)) # add 2 subplots to the figure
+sigma = 1 # standard deviation
+mean = 0 # mean
+x=mean+sigma*np.random.randn(10000) # normally distributed random numbers
+ax0.hist(x,bins=40,density=False,histtype='bar',facecolor='yellowgreen',alpha=0.75) # density: whether to normalize (replaces the removed `normed`); histtype: histogram type; facecolor: color; alpha: transparency
+ax1.hist(x,bins=20,density=True,histtype='bar',facecolor='pink',alpha=0.75,cumulative=True,rwidth=0.8) # bins: number of bars; cumulative: plot the cumulative distribution; rwidth: relative bar width
+plt.show() # show all figures
+```
+
+## Scatter plots
+
+```py
+fig = plt.figure(4) # open a figure
+ax =fig.add_subplot(1,1,1) # add a subplot
+x=np.random.random(100) # random array
+y=np.random.random(100) # random array
+ax.scatter(x,y,s=x*1000,c='y',marker=(5,1),alpha=0.5,lw=2,facecolors='none') # x, y positions; s marker size; c color; marker symbol; lw marker edge width
+plt.show() # show all figures
+```
+
+## 3D plots
+
+```py
+fig = plt.figure(5)
+ax=fig.add_subplot(1,1,1,projection='3d') # create a 3D subplot
+
+x,y=np.mgrid[-2:2:20j,-2:2:20j] # x-axis and y-axis data
+z=x*np.exp(-x**2-y**2) # z-axis data
+
+ax.plot_surface(x,y,z,rstride=2,cstride=1,cmap=plt.cm.coolwarm,alpha=0.8) # draw the 3D surface
+ax.set_xlabel('x-name') # x-axis label
+ax.set_ylabel('y-name') # y-axis label
+ax.set_zlabel('z-name') # z-axis label
+
+plt.show()
+```
+
+## Composing shapes
+
+```py
+fig = plt.figure(6) # create a figure
+ax=fig.add_subplot(1,1,1) # add a subplot
+rect1 = plt.Rectangle((0.1,0.2),0.2,0.3,color='r') # create a rectangle; arguments: (x,y), width, height
+circ1 = plt.Circle((0.7,0.2),0.15,color='r',alpha=0.3) # create a circle; arguments: center, radius; by default it is squashed along with the window's aspect ratio
+pgon1 = plt.Polygon([[0.45,0.45],[0.65,0.6],[0.2,0.6]]) # create a polygon; argument: the vertex coordinates
+
+ax.add_patch(rect1) # add the shape to the subplot
+ax.add_patch(circ1) # add the shape to the subplot
+ax.add_patch(pgon1) # add the shape to the subplot
+
+fig.canvas.draw() # redraw the canvas
+plt.show()
+```
\ No newline at end of file
diff --git a/Python/matplotlab/pyplot_function.md b/Python/matplotlab/pyplot_function.md
new file mode 100644
index 00000000..1b8d4a16
--- /dev/null
+++ b/Python/matplotlab/pyplot_function.md
@@ -0,0 +1,184 @@
+
+
+## Module overview
+
+matplotlib.pyplot is a state-based interface to matplotlib. It provides a MATLAB-like way of plotting.
+
+
+## Usage
+pyplot is mainly intended for interactive plots and simple cases of programmatic plot generation:
+
+```py
+import numpy as np
+import matplotlib.pyplot as plt
+
+x = np.arange(0, 5, 0.1)
+y = np.sin(x)
+plt.plot(x, y)
+```
+
+
+
+## Functions
+
+| Name | Purpose |
+|------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------|
+| acorr\(x, \*\[, data\]\) | Plot the autocorrelation of x\. |
+| angle\_spectrum\(x\[, Fs, Fc, window, pad\_to, \.\.\.\]\) | Plot the angle spectrum\. |
+| annotate\(text, xy, \*args, \*\*kwargs\) | Annotate the point xy with text text\. |
+| arrow\(x, y, dx, dy, \*\*kwargs\) | Add an arrow to the axes\. |
+| autoscale\(\[enable, axis, tight\]\) | Autoscale the axis view to the data \(toggle\)\. |
+| autumn\(\) | Set the colormap to "autumn"\. |
+| axes\(\[arg\]\) | Add an axes to the current figure and make it the current axes\. |
+| axhline\(\[y, xmin, xmax\]\) | Add a horizontal line across the axis\.
| +| axhspan\(ymin, ymax\[, xmin, xmax\]\) | Add a horizontal span \(rectangle\) across the axis\. | +| axis\(\*args\[, emit\]\) | Convenience method to get or set some axis properties\. | +| axline\(xy1\[, xy2, slope\]\) | Add an infinitely long straight line\. | +| axvline\(\[x, ymin, ymax\]\) | Add a vertical line across the axes\. | +| axvspan\(xmin, xmax\[, ymin, ymax\]\) | Add a vertical span \(rectangle\) across the axes\. | +| bar\(x, height\[, width, bottom, align, data\]\) | Make a bar plot\. | +| barbs\(\*args\[, data\]\) | Plot a 2D field of barbs\. | +| barh\(y, width\[, height, left, align\]\) | Make a horizontal bar plot\. | +| bone\(\) | Set the colormap to "bone"\. | +| box\(\[on\]\) | Turn the axes box on or off on the current axes\. | +| boxplot\(x\[, notch, sym, vert, whis, \.\.\.\]\) | Make a box and whisker plot\. | +| broken\_barh\(xranges, yrange, \*\[, data\]\) | Plot a horizontal sequence of rectangles\. | +| cla\(\) | Clear the current axes\. | +| clabel\(CS\[, levels\]\) | Label a contour plot\. | +| clf\(\) | Clear the current figure\. | +| clim\(\[vmin, vmax\]\) | Set the color limits of the current image\. | +| close\(\[fig\]\) | Close a figure window\. | +| cohere\(x, y\[, NFFT, Fs, Fc, detrend, \.\.\.\]\) | Plot the coherence between x and y\. | +| colorbar\(\[mappable, cax, ax\]\) | Add a colorbar to a plot\. | +| connect\(s, func\) | Bind function func to event s\. | +| contour\(\*args\[, data\]\) | Plot contours\. | +| contourf\(\*args\[, data\]\) | Plot contours\. | +| cool\(\) | Set the colormap to "cool"\. | +| copper\(\) | Set the colormap to "copper"\. | +| csd\(x, y\[, NFFT, Fs, Fc, detrend, window, \.\.\.\]\) | Plot the cross\-spectral density\. | +| delaxes\(\[ax\]\) | Remove an Axes \(defaulting to the current axes\) from its figure\. | +| disconnect\(cid\) | Disconnect the callback with id cid\. | +| draw\(\) | Redraw the current figure\. 
| +| draw\_if\_interactive\(\) | | +| errorbar\(x, y\[, yerr, xerr, fmt, ecolor, \.\.\.\]\) | Plot y versus x as lines and/or markers with attached errorbars\. | +| eventplot\(positions\[, orientation, \.\.\.\]\) | Plot identical parallel lines at the given positions\. | +| figimage\(X\[, xo, yo, alpha, norm, cmap, \.\.\.\]\) | Add a non\-resampled image to the figure\. | +| figlegend\(\*args, \*\*kwargs\) | Place a legend on the figure\. | +| fignum\_exists\(num\) | Return whether the figure with the given id exists\. | +| figtext\(x, y, s\[, fontdict\]\) | Add text to figure\. | +| figure\(\[num, figsize, dpi, facecolor, \.\.\.\]\) | Create a new figure, or activate an existing figure\. | +| fill\(\*args\[, data\]\) | Plot filled polygons\. | +| fill\_between\(x, y1\[, y2, where, \.\.\.\]\) | Fill the area between two horizontal curves\. | +| fill\_betweenx\(y, x1\[, x2, where, step, \.\.\.\]\) | Fill the area between two vertical curves\. | +| findobj\(\[o, match, include\_self\]\) | Find artist objects\. | +| flag\(\) | Set the colormap to "flag"\. | +| gca\(\*\*kwargs\) | Get the current axes, creating one if necessary\. | +| gcf\(\) | Get the current figure\. | +| gci\(\) | Get the current colorable artist\. | +| get\(obj, \*args, \*\*kwargs\) | Return the value of an object's property, or print all of them\. | +| get\_current\_fig\_manager\(\) | Return the figure manager of the current figure\. | +| get\_figlabels\(\) | Return a list of existing figure labels\. | +| get\_fignums\(\) | Return a list of existing figure numbers\. | +| get\_plot\_commands\(\) | Get a sorted list of all of the plotting commands\. | +| getp\(obj, \*args, \*\*kwargs\) | Return the value of an object's property, or print all of them\. | +| ginput\(\[n, timeout, show\_clicks, mouse\_add, \.\.\.\]\) | Blocking call to interact with a figure\. | +| gray\(\) | Set the colormap to "gray"\. | +| grid\(\[b, which, axis\]\) | Configure the grid lines\. 
| +| hexbin\(x, y\[, C, gridsize, bins, xscale, \.\.\.\]\) | Make a 2D hexagonal binning plot of points x, y\. | +| hist\(x\[, bins, range, density, weights, \.\.\.\]\) | Plot a histogram\. | +| hist2d\(x, y\[, bins, range, density, \.\.\.\]\) | Make a 2D histogram plot\. | +| hlines\(y, xmin, xmax\[, colors, linestyles, \.\.\.\]\) | Plot horizontal lines at each y from xmin to xmax\. | +| hot\(\) | Set the colormap to "hot"\. | +| hsv\(\) | Set the colormap to "hsv"\. | +| imread\(fname\[, format\]\) | Read an image from a file into an array\. | +| imsave\(fname, arr, \*\*kwargs\) | Save an array as an image file\. | +| imshow\(X\[, cmap, norm, aspect, \.\.\.\]\) | Display data as an image, i\.e\., on a 2D regular raster\. | +| inferno\(\) | Set the colormap to "inferno"\. | +| install\_repl\_displayhook\(\) | Install a repl display hook so that any stale figure are automatically redrawn when control is returned to the repl\. | +| ioff\(\) | Turn the interactive mode off\. | +| ion\(\) | Turn the interactive mode on\. | +| isinteractive\(\) | Return if pyplot is in "interactive mode" or not\. | +| jet\(\) | Set the colormap to "jet"\. | +| legend\(\*args, \*\*kwargs\) | Place a legend on the axes\. | +| locator\_params\(\[axis, tight\]\) | Control behavior of major tick locators\. | +| loglog\(\*args, \*\*kwargs\) | Make a plot with log scaling on both the x and y axis\. | +| magma\(\) | Set the colormap to "magma"\. | +| magnitude\_spectrum\(x\[, Fs, Fc, window, \.\.\.\]\) | Plot the magnitude spectrum\. | +| margins\(\*margins\[, x, y, tight\]\) | Set or retrieve autoscaling margins\. | +| matshow\(A\[, fignum\]\) | Display an array as a matrix in a new figure window\. | +| minorticks\_off\(\) | Remove minor ticks from the axes\. | +| minorticks\_on\(\) | Display minor ticks on the axes\. | +| new\_figure\_manager\(num, \*args, \*\*kwargs\) | Create a new figure manager instance\. | +| nipy\_spectral\(\) | Set the colormap to "nipy\_spectral"\. 
 |
+| pause\(interval\) | Run the GUI event loop for interval seconds\. |
+| pcolor\(\*args\[, shading, alpha, norm, cmap, \.\.\.\]\) | Create a pseudocolor plot with a non\-regular rectangular grid\. |
+| pcolormesh\(\*args\[, alpha, norm, cmap, vmin, \.\.\.\]\) | Create a pseudocolor plot with a non\-regular rectangular grid\. |
+| phase\_spectrum\(x\[, Fs, Fc, window, pad\_to, \.\.\.\]\) | Plot the phase spectrum\. |
+| pie\(x\[, explode, labels, colors, autopct, \.\.\.\]\) | Plot a pie chart\. |
+| pink\(\) | Set the colormap to "pink"\. |
+| plasma\(\) | Set the colormap to "plasma"\. |
+| plot\(\*args\[, scalex, scaley, data\]\) | Plot y versus x as lines and/or markers\. |
+| plot\_date\(x, y\[, fmt, tz, xdate, ydate, data\]\) | Plot data that contains dates\. |
+| polar\(\*args, \*\*kwargs\) | Make a polar plot\. |
+| prism\(\) | Set the colormap to "prism"\. |
+| psd\(x\[, NFFT, Fs, Fc, detrend, window, \.\.\.\]\) | Plot the power spectral density\. |
+| quiver\(\*args\[, data\]\) | Plot a 2D field of arrows\. |
+| quiverkey\(Q, X, Y, U, label, \*\*kw\) | Add a key to a quiver plot\. |
+| rc\(group, \*\*kwargs\) | Set the current rcParams\. |
+| rc\_context\(\[rc, fname\]\) | Return a context manager for temporarily changing rcParams\. |
+| rcdefaults\(\) | Restore the rcParams from Matplotlib's internal default style\. |
+| rgrids\(\[radii, labels, angle, fmt\]\) | Get or set the radial gridlines on the current polar plot\. |
+| savefig\(\*args, \*\*kwargs\) | Save the current figure\. |
+| sca\(ax\) | Set the current Axes to ax and the current Figure to the parent of ax\. |
+| scatter\(x, y\[, s, c, marker, cmap, norm, \.\.\.\]\) | A scatter plot of y vs\. x with varying marker size and/or color\. |
+| sci\(im\) | Set the current image\. |
+| semilogx\(\*args, \*\*kwargs\) | Make a plot with log scaling on the x axis\. |
+| semilogy\(\*args, \*\*kwargs\) | Make a plot with log scaling on the y axis\. |
+| set\_cmap\(cmap\) | Set the default colormap, and apply it to the current image if any\.
| +| setp\(obj, \*args, \*\*kwargs\) | Set a property on an artist object\. | +| show\(\*\[, block\]\) | Display all open figures\. | +| specgram\(x\[, NFFT, Fs, Fc, detrend, window, \.\.\.\]\) | Plot a spectrogram\. | +| spring\(\) | Set the colormap to "spring"\. | +| spy\(Z\[, precision, marker, markersize, \.\.\.\]\) | Plot the sparsity pattern of a 2D array\. | +| stackplot\(x, \*args\[, labels, colors, \.\.\.\]\) | Draw a stacked area plot\. | +| stem\(\*args\[, linefmt, markerfmt, basefmt, \.\.\.\]\) | Create a stem plot\. | +| step\(x, y, \*args\[, where, data\]\) | Make a step plot\. | +| streamplot\(x, y, u, v\[, density, linewidth, \.\.\.\]\) | Draw streamlines of a vector flow\. | +| subplot\(\*args, \*\*kwargs\) | Add a subplot to the current figure\. | +| subplot2grid\(shape, loc\[, rowspan, colspan, fig\]\) | Create a subplot at a specific location inside a regular grid\. | +| subplot\_mosaic\(layout, \*\[, subplot\_kw, \.\.\.\]\) | Build a layout of Axes based on ASCII art or nested lists\. | +| subplot\_tool\(\[targetfig\]\) | Launch a subplot tool window for a figure\. | +| subplots\(\[nrows, ncols, sharex, sharey, \.\.\.\]\) | Create a figure and a set of subplots\. | +| subplots\_adjust\(\[left, bottom, right, top, \.\.\.\]\) | Adjust the subplot layout parameters\. | +| summer\(\) | Set the colormap to "summer"\. | +| suptitle\(t, \*\*kwargs\) | Add a centered title to the figure\. | +| switch\_backend\(newbackend\) | Close all open figures and set the Matplotlib backend\. | +| table\(\[cellText, cellColours, cellLoc, \.\.\.\]\) | Add a table to an Axes\. | +| text\(x, y, s\[, fontdict\]\) | Add text to the axes\. | +| thetagrids\(\[angles, labels, fmt\]\) | Get or set the theta gridlines on the current polar plot\. | +| tick\_params\(\[axis\]\) | Change the appearance of ticks, tick labels, and gridlines\. | +| ticklabel\_format\(\*\[, axis, style, \.\.\.\]\) | Configure the ScalarFormatter used by default for linear axes\. 
 |
+| tight\_layout\(\*\[, pad, h\_pad, w\_pad, rect\]\) | Adjust the padding between and around subplots\. |
+| title\(label\[, fontdict, loc, pad, y\]\) | Set a title for the axes\. |
+| tricontour\(\*args, \*\*kwargs\) | Draw contour lines on an unstructured triangular grid\. |
+| tricontourf\(\*args, \*\*kwargs\) | Draw contour regions on an unstructured triangular grid\. |
+| tripcolor\(\*args\[, alpha, norm, cmap, vmin, \.\.\.\]\) | Create a pseudocolor plot of an unstructured triangular grid\. |
+| triplot\(\*args, \*\*kwargs\) | Draw an unstructured triangular grid as lines and/or markers\. |
+| twinx\(\[ax\]\) | Make and return a second axes that shares the x\-axis\. |
+| twiny\(\[ax\]\) | Make and return a second axes that shares the y\-axis\. |
+| uninstall\_repl\_displayhook\(\) | Uninstall the matplotlib display hook\. |
+| violinplot\(dataset\[, positions, vert, \.\.\.\]\) | Make a violin plot\. |
+| viridis\(\) | Set the colormap to "viridis"\. |
+| vlines\(x, ymin, ymax\[, colors, linestyles, \.\.\.\]\) | Plot vertical lines\. |
+| waitforbuttonpress\(\[timeout\]\) | Blocking call to interact with the figure\. |
+| winter\(\) | Set the colormap to "winter"\. |
+| xcorr\(x, y\[, normed, detrend, usevlines, \.\.\.\]\) | Plot the cross correlation between x and y\. |
+| xkcd\(\[scale, length, randomness\]\) | Turn on xkcd sketch\-style drawing mode\. This only affects things drawn after this function is called\. |
+| xlabel\(xlabel\[, fontdict, labelpad, loc\]\) | Set the label for the x\-axis\. |
+| xlim\(\*args, \*\*kwargs\) | Get or set the x limits of the current axes\. |
+| xscale\(value, \*\*kwargs\) | Set the x\-axis scale\. |
+| xticks\(\[ticks, labels\]\) | Get or set the current tick locations and labels of the x\-axis\. |
+| ylabel\(ylabel\[, fontdict, labelpad, loc\]\) | Set the label for the y\-axis\. |
+| ylim\(\*args, \*\*kwargs\) | Get or set the y\-limits of the current axes\.
 |
+| yscale\(value, \*\*kwargs\) | Set the y\-axis scale\. |
+| yticks\(\[ticks, labels\]\) | Get or set the current tick locations and labels of the y\-axis\. |
diff --git a/Python/matplotlab/text/annotations.md b/Python/matplotlab/text/annotations.md
new file mode 100644
index 00000000..54cd3b0d
--- /dev/null
+++ b/Python/matplotlab/text/annotations.md
@@ -0,0 +1,580 @@
+---
+sidebarDepth: 3
+sidebar: auto
+---
+
+# Annotations
+
+Annotating text with Matplotlib.
+
+Table of Contents
+
+- [Basic annotation](#basic-annotation)
+- [Advanced Annotation](#advanced-annotation)
+  - [Annotating with Text with Box](#annotating-with-text-with-box)
+  - [Annotating with Arrow](#annotating-with-arrow)
+  - [Placing Artist at the anchored location of the Axes](#placing-artist-at-the-anchored-location-of-the-axes)
+  - [Using Complex Coordinates with Annotations](#using-complex-coordinates-with-annotations)
+  - [Using ConnectionPatch](#using-connectionpatch)
+  - [Advanced Topics](#advanced-topics)
+    - [Zoom effect between Axes](#zoom-effect-between-axes)
+    - [Define Custom BoxStyle](#define-custom-boxstyle)
+
+# Basic annotation
+
+The basic [``text()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.text.html#matplotlib.pyplot.text) function places text
+at an arbitrary position on the Axes.
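For instance, a minimal sketch of plain text placement (the Agg backend is assumed so nothing is shown on screen; the coordinates are in data units):

``` python
import matplotlib
matplotlib.use('Agg')  # render off-screen; no GUI required
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
# Put a string at the data coordinates (2, 5).
t = ax.text(2, 5, 'an annotation', fontsize=12)
```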
A common use case of text is to
+annotate some feature of the plot, and the
+``annotate()`` method provides helper functionality
+to make annotations easy. In an annotation, there are two points to
+consider: the location being annotated, represented by the argument
+``xy``, and the location of the text, ``xytext``. Both of these
+arguments are ``(x, y)`` tuples.
+
+Annotation Basic
+
+In this example, both the ``xy`` (arrow tip) and ``xytext`` locations
+(text location) are in data coordinates. There are a variety of other
+coordinate systems one can choose -- you can specify the coordinate
+system of ``xy`` and ``xytext`` with one of the following strings for
+``xycoords`` and ``textcoords`` (default is 'data'):
+
+| argument | coordinate system |
+|---|---|
+| 'figure points' | points from the lower left corner of the figure |
+| 'figure pixels' | pixels from the lower left corner of the figure |
+| 'figure fraction' | 0,0 is lower left of figure and 1,1 is upper right |
+| 'axes points' | points from lower left corner of axes |
+| 'axes pixels' | pixels from lower left corner of axes |
+| 'axes fraction' | 0,0 is lower left of axes and 1,1 is upper right |
+| 'data' | use the axes data coordinate system |
+
+For example, to place the text coordinates in fractional axes
+coordinates, one could do:
+
+``` python
+ax.annotate('local max', xy=(3, 1), xycoords='data',
+            xytext=(0.8, 0.95), textcoords='axes fraction',
+            arrowprops=dict(facecolor='black', shrink=0.05),
+            horizontalalignment='right', verticalalignment='top',
+            )
+```
+
+For physical coordinate systems (points or pixels) the origin is the
+bottom-left of the figure or axes.
+
+Optionally, you can enable drawing of an arrow from the text to the annotated
+point by giving a dictionary of arrow properties in the optional keyword
+argument ``arrowprops``.
+
+| arrowprops key | description |
+|---|---|
+| width | the width of the arrow in points |
+| frac | the fraction of the arrow length occupied by the head |
+| headwidth | the width of the base of the arrow head in points |
+| shrink | move the tip and base some percent away from the annotated point and text |
+| \*\*kwargs | any key for [matplotlib.patches.Polygon](https://matplotlib.org/api/_as_gen/matplotlib.patches.Polygon.html#matplotlib.patches.Polygon), e.g., facecolor |
+
+In the example below, the ``xy`` point is in native coordinates
+(``xycoords`` defaults to 'data'). For a polar axes, this is in
+(theta, radius) space. The text in this example is placed in the
+fractional figure coordinate system. [``matplotlib.text.Text``](https://matplotlib.org/api/text_api.html#matplotlib.text.Text)
+keyword args like ``horizontalalignment``, ``verticalalignment`` and
+``fontsize`` are passed from ``annotate`` to the
+``Text`` instance.
+
+Annotation Polar
+
+For more on all the wild and wonderful things you can do with
+annotations, including fancy arrows, see [Advanced Annotation](#plotting-guide-annotation)
+and [Annotating Plots](https://matplotlib.org/gallery/text_labels_and_annotations/annotation_demo.html).
+
+::: tip Note
+
+Do not proceed unless you have already read [Basic annotation](#annotations-tutorial),
+[``text()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.text.html#matplotlib.pyplot.text) and [``annotate()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.annotate.html#matplotlib.pyplot.annotate)!
+
+:::
+
+# Advanced Annotation
+
+## Annotating with Text with Box
+
+Let's start with a simple example.
+
+Annotate Text Arrow
+
+The [``text()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.text.html#matplotlib.pyplot.text) function in the pyplot module (or
+the text method of the Axes class) takes a *bbox* keyword argument; when
+given, a box is drawn around the text.
+
+``` python
+bbox_props = dict(boxstyle="rarrow,pad=0.3", fc="cyan", ec="b", lw=2)
+t = ax.text(0, 0, "Direction", ha="center", va="center", rotation=45,
+            size=15,
+            bbox=bbox_props)
+```
+
+The patch object associated with the text can be accessed by:
+
+``` python
+bb = t.get_bbox_patch()
+```
+
+The return value is an instance of FancyBboxPatch, and the patch
+properties like facecolor, edgewidth, etc. can be accessed and
+modified as usual. To change the shape of the box, use the *set_boxstyle*
+method:
+
+``` python
+bb.set_boxstyle("rarrow", pad=0.6)
+```
+
+The arguments are the name of the box style with its attributes as
+keyword arguments. Currently, the following box styles are implemented:
+
+| Class | Name | Attrs |
+|---|---|---|
+| Circle | circle | pad=0.3 |
+| DArrow | darrow | pad=0.3 |
+| LArrow | larrow | pad=0.3 |
+| RArrow | rarrow | pad=0.3 |
+| Round | round | pad=0.3, rounding_size=None |
+| Round4 | round4 | pad=0.3, rounding_size=None |
+| Roundtooth | roundtooth | pad=0.3, tooth_size=None |
+| Sawtooth | sawtooth | pad=0.3, tooth_size=None |
+| Square | square | pad=0.3 |
+
+Fancybox Demo
+
+Note that the attribute arguments can be specified within the style
+name, separated by commas (this form can be used as the "boxstyle" value
+of the bbox argument when initializing the text instance):
+
+``` python
+bb.set_boxstyle("rarrow,pad=0.6")
+```
+
+## Annotating with Arrow
+
+The [``annotate()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.annotate.html#matplotlib.pyplot.annotate) function in the pyplot module
+(or the annotate method of the Axes class) is used to draw an arrow
+connecting two points on the plot:
+
+``` python
+ax.annotate("Annotation",
+            xy=(x1, y1), xycoords='data',
+            xytext=(x2, y2), textcoords='offset points',
+            )
+```
+
+This annotates a point at ``xy`` in the given coordinate system (``xycoords``)
+with the text at ``xytext`` given in ``textcoords``. Often, the
+annotated point is specified in *data* coordinates and the annotating
+text in *offset points*.
+See [``annotate()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.annotate.html#matplotlib.pyplot.annotate) for available coordinate systems.
+
+An arrow connecting two points (xy & xytext) can be optionally drawn by
+specifying the ``arrowprops`` argument. To draw only an arrow, use an
+empty string as the first argument:
+
+``` python
+ax.annotate("",
+            xy=(0.2, 0.2), xycoords='data',
+            xytext=(0.8, 0.8), textcoords='data',
+            arrowprops=dict(arrowstyle="->",
+                            connectionstyle="arc3"),
+            )
+```
+
+Annotate Simple01
+
+The arrow drawing takes a few steps.
+
+1. A connecting path between the two points is created. This is
+controlled by the ``connectionstyle`` key value.
+1. If a patch object is given (*patchA* & *patchB*), the path is clipped to
+avoid the patch.
+1. The path is further shrunk by a given amount of pixels (*shrinkA*
+& *shrinkB*).
+1. The path is transmuted to an arrow patch, which is controlled by the
+``arrowstyle`` key value.
+
+Annotate Explain
+
+The creation of the connecting path between the two points is controlled by
+the ``connectionstyle`` key, and the following styles are available:
+
+| Name | Attrs |
+|---|---|
+| angle | angleA=90, angleB=0, rad=0.0 |
+| angle3 | angleA=90, angleB=0 |
+| arc | angleA=0, angleB=0, armA=None, armB=None, rad=0.0 |
+| arc3 | rad=0.0 |
+| bar | armA=0.0, armB=0.0, fraction=0.3, angle=None |
+
+Note that the "3" in ``angle3`` and ``arc3`` is meant to indicate that the
+resulting path is a quadratic spline segment (three control
+points). As will be discussed below, some arrow style options can only
+be used when the connecting path is a quadratic spline.
+
+The behavior of each connection style is (limitedly) demonstrated in the
+example below. (Warning: the behavior of the ``bar`` style is currently not
+well defined; it may be changed in the future.)
+
+Connectionstyle Demo
+
+The connecting path (after clipping and shrinking) is then mutated to
+an arrow patch, according to the given ``arrowstyle``:
+
+Name | Attrs
+---|---
+- | None
+-> | head_length=0.4,head_width=0.2
+-[ | widthB=1.0,lengthB=0.2,angleB=None
+\|-\| | widthA=1.0,widthB=1.0
+-\|> | head_length=0.4,head_width=0.2
+<- | head_length=0.4,head_width=0.2
+<-> | head_length=0.4,head_width=0.2
+<\|- | head_length=0.4,head_width=0.2
+<\|-\|> | head_length=0.4,head_width=0.2
+fancy | head_length=0.4,head_width=0.4,tail_width=0.4
+simple | head_length=0.5,head_width=0.5,tail_width=0.2
+wedge | tail_width=0.3,shrink_factor=0.5
+
+Fancyarrow Demo
+
+Some arrow styles only work with connection styles that generate a
+quadratic-spline segment. They are ``fancy``, ``simple``, and ``wedge``.
+For these arrow styles, you must use the "angle3" or "arc3" connection
+style.
+
+If the annotation string is given, *patchA* is set to the bbox patch
+of the text by default.
+
+Annotate Simple02
+
+As in the text command, a box around the text can be drawn using
+the ``bbox`` argument.
+
+Annotate Simple03
+
+By default, the starting point is set to the center of the text
+extent. This can be adjusted with the ``relpos`` key value. The values
+are normalized to the extent of the text. For example, (0, 0) means
+the lower-left corner and (1, 1) means the top-right.
+
+Annotate Simple04
+
+## Placing Artist at the anchored location of the Axes
+
+There are classes of artists that can be placed at an anchored location
+in the Axes. A common example is the legend. This type of artist can
+be created by using the OffsetBox class. A few predefined classes are
+available in ``mpl_toolkits.axes_grid1.anchored_artists``; others are in
+``matplotlib.offsetbox``.
+
+``` python
+from matplotlib.offsetbox import AnchoredText
+at = AnchoredText("Figure 1a",
+                  prop=dict(size=15), frameon=True,
+                  loc='upper left',
+                  )
+at.patch.set_boxstyle("round,pad=0.,rounding_size=0.2")
+ax.add_artist(at)
+```
+
+Anchored Box01
+
+The *loc* keyword has the same meaning as in the legend command.
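The ``AnchoredText`` snippet above assumes an existing ``ax``; here is a self-contained version (the figure creation and the Agg backend are assumptions added for the sketch):

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
from matplotlib.offsetbox import AnchoredText

fig, ax = plt.subplots()
at = AnchoredText("Figure 1a",
                  prop=dict(size=15), frameon=True,
                  loc='upper left')
# The frame patch accepts the same box styles as text bboxes.
at.patch.set_boxstyle("round,pad=0.,rounding_size=0.2")
ax.add_artist(at)
fig.canvas.draw()  # lay the anchored box out once
```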
+
+A simple application is when the size of the artist (or collection of
+artists) is known in pixels at creation time. For
+example, if you want to draw a circle with a fixed size of 20 pixels x 20
+pixels (radius = 10 pixels), you can utilize
+``AnchoredDrawingArea``. The instance is created with the size of the
+drawing area (in pixels), and arbitrary artists can be added to the
+drawing area. Note that the extents of the artists that are added to
+the drawing area are not related to the placement of the drawing
+area itself. Only the initial size matters.
+
+``` python
+from matplotlib.patches import Circle
+from mpl_toolkits.axes_grid1.anchored_artists import AnchoredDrawingArea
+
+ada = AnchoredDrawingArea(20, 20, 0, 0,
+                          loc='upper right', pad=0., frameon=False)
+p1 = Circle((10, 10), 10)
+ada.drawing_area.add_artist(p1)
+p2 = Circle((30, 10), 5, fc="r")
+ada.drawing_area.add_artist(p2)
+```
+
+The artists that are added to the drawing area should not have a
+transform set (it will be overridden), and the dimensions of those
+artists are interpreted as pixel coordinates, i.e., the radii of the
+circles in the above example are 10 pixels and 5 pixels, respectively.
+
+Anchored Box02
+
+Sometimes, you want your artists to scale with the data coordinates (or
+coordinates other than canvas pixels). You can use the
+``AnchoredAuxTransformBox`` class. This is similar to
+``AnchoredDrawingArea`` except that the extent of the artist is
+determined at draw time, respecting the specified transform.
+
+``` python
+from matplotlib.patches import Ellipse
+from mpl_toolkits.axes_grid1.anchored_artists import AnchoredAuxTransformBox
+
+box = AnchoredAuxTransformBox(ax.transData, loc='upper left')
+el = Ellipse((0, 0), width=0.1, height=0.4, angle=30)  # in data coordinates!
+box.drawing_area.add_artist(el)
+```
+
+The ellipse in the above example will have a width and height
+corresponding to 0.1 and 0.4 in data coordinates and will be
+automatically scaled when the view limits of the axes change.
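A complete version of the ``AnchoredAuxTransformBox`` fragment above, with the imports and figure that it leaves implicit (assumptions added for the sketch):

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
from mpl_toolkits.axes_grid1.anchored_artists import AnchoredAuxTransformBox

fig, ax = plt.subplots()
box = AnchoredAuxTransformBox(ax.transData, loc='upper left')
# Width and height are interpreted through ax.transData, i.e. in data units.
el = Ellipse((0, 0), width=0.1, height=0.4, angle=30)
box.drawing_area.add_artist(el)
ax.add_artist(box)
fig.canvas.draw()  # the box extent is computed at draw time
```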
+
+Anchored Box03
+
+As in the legend, the bbox_to_anchor argument can be set. Using
+HPacker and VPacker, you can arrange artists as in the
+legend (as a matter of fact, this is how the legend is created).
+
+Anchored Box04
+
+Note that unlike the legend, the ``bbox_transform`` is set
+to IdentityTransform by default.
+
+## Using Complex Coordinates with Annotations
+
+The Annotation in matplotlib supports several types of coordinates as
+described in [Basic annotation](#annotations-tutorial). For an advanced user who wants
+more control, it supports a few other options.
+
+### Using ConnectionPatch
+
+The ConnectionPatch is like an annotation without text. While the annotate
+function is recommended in most situations, the ConnectionPatch is useful when
+you want to connect points in different axes.
+
+``` python
+from matplotlib.patches import ConnectionPatch
+xy = (0.2, 0.2)
+con = ConnectionPatch(xyA=xy, xyB=xy, coordsA="data", coordsB="data",
+                      axesA=ax1, axesB=ax2)
+ax2.add_artist(con)
+```
+
+The above code connects point xy in the data coordinates of ``ax1`` to
+point xy in the data coordinates of ``ax2``. Here is a simple example.
+
+Connect Simple01
+
+While the ConnectionPatch instance can be added to any axes, you may want to
+add it to the axes that is drawn last, to prevent it from being covered by
+other axes.
+
+## Advanced Topics
+
+### Zoom effect between Axes
+
+``mpl_toolkits.axes_grid1.inset_locator`` defines some patch classes useful
+for interconnecting two axes. Understanding the code requires some
+knowledge of how Matplotlib's transforms work, but using it is
+straightforward.
+
+Axes Zoom Effect
+
+### Define Custom BoxStyle
+
+You can use a custom box style. The value for the ``boxstyle`` can be a
+callable object of the following form:
+
+``` python
+def __call__(self, x0, y0, width, height, mutation_size,
+             aspect_ratio=1.):
+    '''
+    Given the location and size of the box, return the path of
+    the box around it.
+
+    - *x0*, *y0*, *width*, *height* : location and size of the box
+    - *mutation_size* : a reference scale for the mutation.
+    - *aspect_ratio* : aspect ratio for the mutation.
+    '''
+    path = ...
+    return path
+```
+
+Here is a complete example.
+
+Custom Boxstyle01
+
+However, it is recommended that you derive from
+``matplotlib.patches.BoxStyle._Base``, as demonstrated below.
+
+Custom Boxstyle02
+
+Similarly, you can define a custom ConnectionStyle and a custom ArrowStyle.
+See the source code of ``lib/matplotlib/patches.py`` and check
+how each style class is defined.
+
+## Download
+
+- [Download Python source code: annotations.py](https://matplotlib.org/_downloads/e9b9ec3e7de47d2ccae486e437e86de2/annotations.py)
+- [Download Jupyter notebook: annotations.ipynb](https://matplotlib.org/_downloads/c4f2a18ccd63dc25619141aee3712b03/annotations.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/text/mathtext.md b/Python/matplotlab/text/mathtext.md
new file mode 100644
index 00000000..3292aff1
--- /dev/null
+++ b/Python/matplotlab/text/mathtext.md
@@ -0,0 +1,1186 @@
+---
+sidebarDepth: 3
+sidebar: auto
+---
+
+# Writing mathematical expressions
+
+An introduction to writing mathematical expressions in Matplotlib.
+
+You can use a subset of TeX markup in any matplotlib text string by placing it
+inside a pair of dollar signs ($).
+
+Note that you do not need to have TeX installed, since Matplotlib ships
+its own TeX expression parser, layout engine, and fonts. The layout engine
+is a fairly direct adaptation of the layout algorithms in Donald Knuth's
+TeX, so the quality is quite good. (Matplotlib also provides a ``usetex``
+option for those who do want to call out to TeX to generate their text; see
+[Text rendering With LaTeX](usetex.html).)
+
+Any text element can use math text. You should use raw strings (precede the
+quotes with an ``'r'``), and surround the math text with dollar signs ($), as
+in TeX.
Regular text and mathtext can be interleaved within the same string.
+Mathtext can use DejaVu Sans (default), DejaVu Serif, the Computer Modern fonts
+(from (La)TeX), [STIX](http://www.stixfonts.org/) fonts (which are designed
+to blend well with Times), or a Unicode font that you provide. The mathtext
+font can be selected with the customization variable ``mathtext.fontset`` (see
+[Customizing Matplotlib with style sheets and rcParams](https://matplotlib.org/introductory/customizing.html)).
+
+Here is a simple example:
+
+``` python
+# plain text
+plt.title('alpha > beta')
+```
+
+produces "alpha > beta".
+
+Whereas this:
+
+``` python
+# math text
+plt.title(r'$\alpha > \beta$')
+```
+
+produces "α > β", typeset as a mathematical expression.
+
+::: tip Note
+
+Mathtext should be placed between a pair of dollar signs ($). To make it
+easy to display monetary values, e.g., "$100.00", if a single dollar sign
+is present in the entire string, it will be displayed verbatim as a dollar
+sign. This is a small change from regular TeX, where the dollar sign in
+non-math text would have to be escaped ('\$').
+
+:::
+
+::: tip Note
+
+While the syntax inside the pair of dollar signs ($) aims to be TeX-like,
+the text outside does not. In particular, characters such as:
+
+``` python
+# $ % & ~ _ ^ \ { } \( \) \[ \]
+```
+
+have special meaning outside of math mode in TeX. Therefore, these
+characters will behave differently depending on the rcParam ``text.usetex``
+flag. See the [usetex tutorial](usetex.html) for more
+information.
+
+:::
+
+## Subscripts and superscripts
+
+To make subscripts and superscripts, use the ``'_'`` and ``'^'`` symbols:
+
+``` python
+r'$\alpha_i > \beta_i$'
+```
+
+*(rendered output of the expression above)*
+
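A minimal script exercising the expression above (the figure and the Agg backend are assumptions added for the sketch; any text element would do):

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Raw string, dollar signs, and '_' for the subscripts.
ax.set_title(r'$\alpha_i > \beta_i$')
fig.savefig("subscripts.png")  # rendering here would fail on bad mathtext
```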
+
+Some symbols automatically put their sub/superscripts under and over the
+operator. For example, to write the sum of *x*ᵢ from *i* = 0 to ∞, you
+could do:
+
+``` python
+r'$\sum_{i=0}^\infty x_i$'
+```
+
+*(rendered output of the expression above)*
+
+ +## Fractions, binomials, and stacked numbers + +Fractions, binomials, and stacked numbers can be created with the +``\frac{}{}``, ``\binom{}{}`` and ``\genfrac{}{}{}{}{}{}`` commands, +respectively: + +``` python +r'$\frac{3}{4} \binom{3}{4} \genfrac{}{}{0}{}{3}{4}$' +``` + +produces + +
+*(rendered output of the expression above)*
+
+ +Fractions can be arbitrarily nested: + +``` python +r'$\frac{5 - \frac{1}{x}}{4}$' +``` + +produces + +
+*(rendered output of the expression above)*
+
+ +Note that special care needs to be taken to place parentheses and brackets +around fractions. Doing things the obvious way produces brackets that are too +small: + +``` python +r'$(\frac{5 - \frac{1}{x}}{4})$' +``` + +
+*(rendered output of the expression above)*
+
+
+The solution is to precede the bracket with ``\left`` and ``\right`` to inform
+the parser that those brackets encompass the entire object:
+
+``` python
+r'$\left(\frac{5 - \frac{1}{x}}{4}\right)$'
+```
+
+*(rendered output of the expression above)*
+
+ +## Radicals + +Radicals can be produced with the ``\sqrt[]{}`` command. For example: + +``` python +r'$\sqrt{2}$' +``` + +
+*(rendered output of the expression above)*
+
+ +Any base can (optionally) be provided inside square brackets. Note that the +base must be a simple expression, and can not contain layout commands such as +fractions or sub/superscripts: + +``` python +r'$\sqrt[3]{x}$' +``` + +
+*(rendered output of the expression above)*
+
+
+## Fonts
+
+The default font for mathematical symbols is *italic*.
+
+::: tip Note
+
+This default can be changed using the ``mathtext.default`` rcParam. This is
+useful, for example, to use the same font as regular non-math text for math
+text, by setting it to ``regular``.
+
+:::
+
+To change fonts, e.g., to write "sin" in a Roman font, enclose the text in a
+font command:
+
+``` python
+r'$s(t) = \mathcal{A}\mathrm{sin}(2 \omega t)$'
+```
+
+*(rendered output of the expression above)*
+
+ +More conveniently, many commonly used function names that are typeset in +a Roman font have shortcuts. So the expression above could be written as +follows: + +``` python +r'$s(t) = \mathcal{A}\sin(2 \omega t)$' +``` + +
+*(rendered output of the expression above)*
+
+
+Here "s" and "t" are variables in italic font (default), "sin" is in Roman
+font, and the amplitude "A" is in calligraphy font. Note that in the example
+above the calligraphic ``A`` is squished into the ``sin``. You can use a
+spacing command to add a little whitespace between them:
+
+``` python
+r'$s(t) = \mathcal{A}\/\sin(2 \omega t)$'
+```
+
+*(rendered output of the expression above)*
+
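To see the font commands in action, here is a hedged sketch. It uses the thin-space command ``\,`` (another of mathtext's spacing commands) rather than ``\/``, and renders the title once so a parse error would surface; the figure setup is an assumption:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# \mathcal for the calligraphic amplitude, \sin for the Roman function
# name, and \, for a thin space between them.
ax.set_title(r'$s(t) = \mathcal{A}\,\sin(2 \omega t)$')
fig.canvas.draw()  # parses and lays out the mathtext
```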
+ +The choices available with all fonts are: + + +--- + + + + + +Command +Result + + + +\mathrm{Roman} + + +\mathit{Italic} + + +\mathtt{Typewriter} + + +\mathcal{CALLIGRAPHY} + + + + + +When using the [STIX](http://www.stixfonts.org/) fonts, you also have the +choice of: + + +--- + + + + + +Command +Result + + + +\mathbb{blackboard} + + +\mathrm{\mathbb{blackboard}} + + +\mathfrak{Fraktur} + + +\mathsf{sansserif} + + +\mathrm{\mathsf{sansserif}} + + + + + +There are also three global "font sets" to choose from, which are +selected using the ``mathtext.fontset`` parameter in [matplotlibrc](https://matplotlib.org/introductory/customizing.html#matplotlibrc-sample). + +``cm``: **Computer Modern (TeX)** + +![cm_fontset](https://matplotlib.org/_images/cm_fontset.png) + +``stix``: **STIX** (designed to blend well with Times) + +![stix_fontset](https://matplotlib.org/_images/stix_fontset.png) + +``stixsans``: **STIX sans-serif** + +![stixsans_fontset](https://matplotlib.org/_images/stixsans_fontset.png) + +Additionally, you can use ``\mathdefault{...}`` or its alias +``\mathregular{...}`` to use the font used for regular text outside of +mathtext. There are a number of limitations to this approach, most notably +that far fewer symbols will be available, but it can be useful to make math +expressions blend well with other text in the plot. + +### Custom fonts + +mathtext also provides a way to use custom fonts for math. This method is +fairly tricky to use, and should be considered an experimental feature for +patient users only. By setting the rcParam ``mathtext.fontset`` to ``custom``, +you can then set the following parameters, which control which font file to use +for a particular set of math characters. 
+ + +--- + + + + + +Parameter +Corresponds to + + + +mathtext.it +\mathit{} or default italic + +mathtext.rm +\mathrm{} Roman (upright) + +mathtext.tt +\mathtt{} Typewriter (monospace) + +mathtext.bf +\mathbf{} bold italic + +mathtext.cal +\mathcal{} calligraphic + +mathtext.sf +\mathsf{} sans-serif + + + + +Each parameter should be set to a fontconfig font descriptor (as defined in the +yet-to-be-written font chapter). + +The fonts used should have a Unicode mapping in order to find any +non-Latin characters, such as Greek. If you want to use a math symbol +that is not contained in your custom fonts, you can set the rcParam +``mathtext.fallback_to_cm`` to ``True`` which will cause the mathtext system +to use characters from the default Computer Modern fonts whenever a particular +character can not be found in the custom font. + +Note that the math glyphs specified in Unicode have evolved over time, and many +fonts may not have glyphs in the correct place for mathtext. + +## Accents + +An accent command may precede any symbol to add an accent above it. There are +long and short forms for some of them. + + +--- + + + + + +Command +Result + + + +\acute a or \'a + + +\bar a + + +\breve a + + +\ddot a or \''a + + +\dot a or \.a + + +\grave a or \`a + + +\hat a or \^a + + +\tilde a or \~a + + +\vec a + + +\overline{abc} + + + + + +In addition, there are two special accents that automatically adjust to the +width of the symbols below: + + +--- + + + + + +Command +Result + + + +\widehat{xyz} + + +\widetilde{xyz} + + + + + +Care should be taken when putting accents on lower-case i's and j's. Note that +in the following ``\imath`` is used to avoid the extra dot over the i: + +``` python +r"$\hat i\ \ \hat \imath$" +``` + +
+*(rendered output of the expression above)*
+
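A short sketch rendering a few of the accent commands, including the dotless ``\imath`` (the figure setup is an assumption added for the example):

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# \hat i keeps the dot; \hat \imath drops it so the accent sits cleanly.
ax.set_title(r'$\hat i\ \ \hat \imath\ \ \vec x\ \ \tilde a$')
fig.canvas.draw()  # a typo in an accent command would raise here
```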
+ +## Symbols + +You can also use a large number of the TeX symbols, as in ``\infty``, +``\leftarrow``, ``\sum``, ``\int``. + +**Lower-case Greek** + + +--- + + + + + + + + + +α \alpha +β \beta +χ \chi +δ \delta +ϝ \digamma +ε \epsilon + +η \eta +γ \gamma +ι \iota +κ \kappa +λ \lambda +μ \mu + +ν \nu +ω \omega +ϕ \phi +π \pi +ψ \psi +ρ \rho + +σ \sigma +τ \tau +θ \theta +υ \upsilon +ε \varepsilon +ϰ \varkappa + +φ \varphi +ϖ \varpi +ϱ \varrho +ς \varsigma +ϑ \vartheta +ξ \xi + +ζ \zeta +  +  +  +  +  + + + + +**Upper-case Greek** + + +--- + + + + + + + + + + + +Δ \Delta +Γ \Gamma +Λ \Lambda +Ω \Omega +Φ \Phi +Π \Pi +Ψ \Psi +Σ \Sigma + +Θ \Theta +Υ \Upsilon +Ξ \Xi +℧ \mho +∇ \nabla +  +  +  + + + + +**Hebrew** + + +--- + + + + + + + + + +ℵ \aleph +ℶ \beth +ℸ \daleth +ℷ \gimel +  +  + + + + +**Delimiters** + + +--- + + + + + + + + + +/ / +[ [ +⇓ \Downarrow +⇑ \Uparrow +‖ \Vert +\backslash + +↓ \downarrow +⟨ \langle +⌈ \lceil +⌊ \lfloor +⌞ \llcorner +⌟ \lrcorner + +⟩ \rangle +⌉ \rceil +⌋ \rfloor +⌜ \ulcorner +↑ \uparrow +⌝ \urcorner + + +\vert + + +{ \{ + +\| + + +} \} +] ] + +| + + + + + + +**Big symbols** + + +--- + + + + + + + + + +⋂ \bigcap +⋃ \bigcup +⨀ \bigodot +⨁ \bigoplus +⨂ \bigotimes +⨄ \biguplus + +⋁ \bigvee +⋀ \bigwedge +∐ \coprod +∫ \int +∮ \oint +∏ \prod + +∑ \sum +  +  +  +  +  + + + + +**Standard function names** + + +--- + + + + + + + + + +Pr \Pr +arccos \arccos +arcsin \arcsin +arctan \arctan +arg \arg +cos \cos + +cosh \cosh +cot \cot +coth \coth +csc \csc +deg \deg +det \det + +dim \dim +exp \exp +gcd \gcd +hom \hom +inf \inf +ker \ker + +lg \lg +lim \lim +liminf \liminf +limsup \limsup +ln \ln +log \log + +max \max +min \min +sec \sec +sin \sin +sinh \sinh +sup \sup + +tan \tan +tanh \tanh +  +  +  +  + + + + +**Binary operation and relation symbols** + + +--- + + + + + + + +≎ \Bumpeq +⋒ \Cap +⋓ \Cup +≑ \Doteq + +⨝ \Join +⋐ \Subset +⋑ \Supset +⊩ \Vdash + +⊪ \Vvdash +≈ \approx +≊ \approxeq +∗ \ast + +≍ \asymp +϶ \backepsilon +∽ \backsim +⋍ 
\backsimeq + +⊼ \barwedge +∵ \because +≬ \between +○ \bigcirc + +▽ \bigtriangledown +△ \bigtriangleup +◀ \blacktriangleleft +▶ \blacktriangleright + +⊥ \bot +⋈ \bowtie +⊡ \boxdot +⊟ \boxminus + +⊞ \boxplus +⊠ \boxtimes +∙ \bullet +≏ \bumpeq + +∩ \cap +⋅ \cdot +∘ \circ +≗ \circeq + +≔ \coloneq +≅ \cong +∪ \cup +⋞ \curlyeqprec + +⋟ \curlyeqsucc +⋎ \curlyvee +⋏ \curlywedge +† \dag + +⊣ \dashv +‡ \ddag +⋄ \diamond +÷ \div + +⋇ \divideontimes +≐ \doteq +≑ \doteqdot +∔ \dotplus + +⌆ \doublebarwedge +≖ \eqcirc +≕ \eqcolon +≂ \eqsim + +⪖ \eqslantgtr +⪕ \eqslantless +≡ \equiv +≒ \fallingdotseq + +⌢ \frown +≥ \geq +≧ \geqq +⩾ \geqslant + +≫ \gg +⋙ \ggg +⪺ \gnapprox +≩ \gneqq + +⋧ \gnsim +⪆ \gtrapprox +⋗ \gtrdot +⋛ \gtreqless + +⪌ \gtreqqless +≷ \gtrless +≳ \gtrsim +∈ \in + +⊺ \intercal +⋋ \leftthreetimes +≤ \leq +≦ \leqq + +⩽ \leqslant +⪅ \lessapprox +⋖ \lessdot +⋚ \lesseqgtr + +⪋ \lesseqqgtr +≶ \lessgtr +≲ \lesssim +≪ \ll + +⋘ \lll +⪹ \lnapprox +≨ \lneqq +⋦ \lnsim + +⋉ \ltimes +∣ \mid +⊧ \models +∓ \mp + +⊯ \nVDash +⊮ \nVdash +≉ \napprox +≇ \ncong + +≠ \ne +≠ \neq +≠ \neq +≢ \nequiv + +≱ \ngeq +≯ \ngtr +∋ \ni +≰ \nleq + +≮ \nless +∤ \nmid +∉ \notin +∦ \nparallel + +⊀ \nprec +≁ \nsim +⊄ \nsubset +⊈ \nsubseteq + +⊁ \nsucc +⊅ \nsupset +⊉ \nsupseteq +⋪ \ntriangleleft + +⋬ \ntrianglelefteq +⋫ \ntriangleright +⋭ \ntrianglerighteq +⊭ \nvDash + +⊬ \nvdash +⊙ \odot +⊖ \ominus +⊕ \oplus + +⊘ \oslash +⊗ \otimes +∥ \parallel +⟂ \perp + +⋔ \pitchfork +± \pm +≺ \prec +⪷ \precapprox + +≼ \preccurlyeq +≼ \preceq +⪹ \precnapprox +⋨ \precnsim + +≾ \precsim +∝ \propto +⋌ \rightthreetimes +≓ \risingdotseq + +⋊ \rtimes +∼ \sim +≃ \simeq +∕ \slash + +⌣ \smile +⊓ \sqcap +⊔ \sqcup +⊏ \sqsubset + +⊏ \sqsubset +⊑ \sqsubseteq +⊐ \sqsupset +⊐ \sqsupset + +⊒ \sqsupseteq +⋆ \star +⊂ \subset +⊆ \subseteq + +⫅ \subseteqq +⊊ \subsetneq +⫋ \subsetneqq +≻ \succ + +⪸ \succapprox +≽ \succcurlyeq +≽ \succeq +⪺ \succnapprox + +⋩ \succnsim +≿ \succsim +⊃ \supset +⊇ \supseteq + +⫆ \supseteqq +⊋ \supsetneq +⫌ 
\supsetneqq +∴ \therefore + +× \times +⊤ \top +◁ \triangleleft +⊴ \trianglelefteq + +≜ \triangleq +▷ \triangleright +⊵ \trianglerighteq +⊎ \uplus + +⊨ \vDash +∝ \varpropto +⊲ \vartriangleleft +⊳ \vartriangleright + +⊢ \vdash +∨ \vee +⊻ \veebar +∧ \wedge + +≀ \wr +  +  +  + + + + +**Arrow symbols** + + +--- + + + + + + + +⇓ \Downarrow +⇐ \Leftarrow +⇔ \Leftrightarrow +⇚ \Lleftarrow + +⟸ \Longleftarrow +⟺ \Longleftrightarrow +⟹ \Longrightarrow +↰ \Lsh + +⇗ \Nearrow +⇖ \Nwarrow +⇒ \Rightarrow +⇛ \Rrightarrow + +↱ \Rsh +⇘ \Searrow +⇙ \Swarrow +⇑ \Uparrow + +⇕ \Updownarrow +↺ \circlearrowleft +↻ \circlearrowright +↶ \curvearrowleft + +↷ \curvearrowright +⤎ \dashleftarrow +⤏ \dashrightarrow +↓ \downarrow + +⇊ \downdownarrows +⇃ \downharpoonleft +⇂ \downharpoonright +↩ \hookleftarrow + +↪ \hookrightarrow +⇝ \leadsto +← \leftarrow +↢ \leftarrowtail + +↽ \leftharpoondown +↼ \leftharpoonup +⇇ \leftleftarrows +↔ \leftrightarrow + +⇆ \leftrightarrows +⇋ \leftrightharpoons +↭ \leftrightsquigarrow +↜ \leftsquigarrow + +⟵ \longleftarrow +⟷ \longleftrightarrow +⟼ \longmapsto +⟶ \longrightarrow + +↫ \looparrowleft +↬ \looparrowright +↦ \mapsto +⊸ \multimap + +⇍ \nLeftarrow +⇎ \nLeftrightarrow +⇏ \nRightarrow +↗ \nearrow + +↚ \nleftarrow +↮ \nleftrightarrow +↛ \nrightarrow +↖ \nwarrow + +→ \rightarrow +↣ \rightarrowtail +⇁ \rightharpoondown +⇀ \rightharpoonup + +⇄ \rightleftarrows +⇄ \rightleftarrows +⇌ \rightleftharpoons +⇌ \rightleftharpoons + +⇉ \rightrightarrows +⇉ \rightrightarrows +↝ \rightsquigarrow +↘ \searrow + +↙ \swarrow +→ \to +↞ \twoheadleftarrow +↠ \twoheadrightarrow + +↑ \uparrow +↕ \updownarrow +↕ \updownarrow +↿ \upharpoonleft + +↾ \upharpoonright +⇈ \upuparrows +  +  + + + + +**Miscellaneous symbols** + + +--- + + + + + + + +$ \$ +Å \AA +Ⅎ \Finv +⅁ \Game + +ℑ \Im +¶ \P +ℜ \Re +§ \S + +∠ \angle +‵ \backprime +★ \bigstar +■ \blacksquare + +▴ \blacktriangle +▾ \blacktriangledown +⋯ \cdots +✓ \checkmark + +® \circledR +Ⓢ \circledS +♣ \clubsuit +∁ \complement + +© 
\copyright
+⋱ \ddots
+♢ \diamondsuit
+ℓ \ell
+
+∅ \emptyset
+ð \eth
+∃ \exists
+♭ \flat
+
+∀ \forall
+ħ \hbar
+♡ \heartsuit
+ℏ \hslash
+
+∭ \iiint
+∬ \iint
+ı \imath
+∞ \infty
+
+ȷ \jmath
+… \ldots
+∡ \measuredangle
+♮ \natural
+
+¬ \neg
+∄ \nexists
+∰ \oiiint
+∂ \partial
+
+′ \prime
+♯ \sharp
+♠ \spadesuit
+∢ \sphericalangle
+
+ß \ss
+▿ \triangledown
+∅ \varnothing
+▵ \vartriangle
+
+⋮ \vdots
+℘ \wp
+¥ \yen
+
+If a particular symbol does not have a name (as is true of many of the more
+obscure symbols in the STIX fonts), Unicode characters can also be used. In
+Python 3 the ``\u`` escape is interpreted in a regular (non-raw) string:
+
+``` python
+'$\u23ce$'
+```
+
+## Example
+
+Here is an example illustrating many of these features in context.
+
+Pyplot Mathtext
+
+## Download
+
+- [Download Python source code: mathtext.py](https://matplotlib.org/_downloads/5dc5da00d7d7311390ebcde7a47a6f38/mathtext.py)
+- [Download Jupyter notebook: mathtext.ipynb](https://matplotlib.org/_downloads/953adc52a638855f0815466d723fca0d/mathtext.ipynb)
\ No newline at end of file
diff --git a/Python/matplotlab/text/pgf.md b/Python/matplotlab/text/pgf.md
new file mode 100644
index 00000000..3bd80956
--- /dev/null
+++ b/Python/matplotlab/text/pgf.md
@@ -0,0 +1,260 @@
+---
+sidebarDepth: 3
+sidebar: auto
+---
+
+# Typesetting With XeLaTeX/LuaLaTeX
+
+How to typeset text with the ``pgf`` backend in Matplotlib.
+
+Using the ``pgf`` backend, matplotlib can export figures as pgf drawing commands
+that can be processed with pdflatex, xelatex or lualatex. XeLaTeX and LuaLaTeX
+have full unicode support and can use any font that is installed in the operating
+system, making use of advanced typographic features of OpenType, AAT and
+Graphite. Pgf pictures created by ``plt.savefig('figure.pgf')`` can be
+embedded as raw commands in LaTeX documents.
Figures can also be directly
+compiled and saved to PDF with ``plt.savefig('figure.pdf')`` by either
+switching to the backend
+
+``` python
+matplotlib.use('pgf')
+```
+
+or registering it for handling pdf output
+
+``` python
+from matplotlib.backends.backend_pgf import FigureCanvasPgf
+matplotlib.backend_bases.register_backend('pdf', FigureCanvasPgf)
+```
+
+The second method allows you to keep using regular interactive backends and to
+save xelatex-, lualatex- or pdflatex-compiled PDF files from the graphical
+user interface.
+
+Matplotlib's pgf support requires a recent [LaTeX](http://www.tug.org) installation that includes
+the TikZ/PGF packages (such as [TeXLive](http://www.tug.org/texlive/)), preferably with XeLaTeX or LuaLaTeX
+installed. If either pdftocairo or ghostscript is present on your system,
+figures can optionally be saved to PNG images as well. The executables
+for all applications must be located on your [``PATH``](https://matplotlib.org/faq/environment_variables_faq.html#envvar-PATH).
+
+Rc parameters that control the behavior of the pgf backend:
+
+Parameter | Documentation
+---|---
+pgf.preamble | Lines to be included in the LaTeX preamble
+pgf.rcfonts | Setup fonts from rc params using the fontspec package
+pgf.texsystem | Either "xelatex" (default), "lualatex" or "pdflatex"
+
+::: tip Note
+
+TeX defines a set of special characters, such as:
+
+``` python
+# $ % & ~ _ ^ \ { }
+```
+
+Generally, these characters must be escaped correctly. For convenience,
+some characters (_, ^, %) are automatically escaped outside of math
+environments.
+ +::: + +## Multi-Page PDF Files + +The pgf backend also supports multipage pdf files using ``PdfPages`` + +``` python +from matplotlib.backends.backend_pgf import PdfPages +import matplotlib.pyplot as plt + +with PdfPages('multipage.pdf', metadata={'author': 'Me'}) as pdf: + + fig1, ax1 = plt.subplots() + ax1.plot([1, 5, 3]) + pdf.savefig(fig1) + + fig2, ax2 = plt.subplots() + ax2.plot([1, 5, 3]) + pdf.savefig(fig2) +``` + +## Font specification + +The fonts used for obtaining the size of text elements or when compiling +figures to PDF are usually defined in the matplotlib rc parameters. You can +also use the LaTeX default Computer Modern fonts by clearing the lists for +``font.serif``, ``font.sans-serif`` or ``font.monospace``. Please note that +the glyph coverage of these fonts is very limited. If you want to keep the +Computer Modern font face but require extended unicode support, consider +installing the [Computer Modern Unicode](https://sourceforge.net/projects/cm-unicode/) +fonts *CMU Serif*, *CMU Sans Serif*, etc. + +When saving to ``.pgf``, the font configuration matplotlib used for the +layout of the figure is included in the header of the text file. + +``` python +""" +========= +Pgf Fonts +========= + +""" + +import matplotlib.pyplot as plt +plt.rcParams.update({ + "font.family": "serif", + "font.serif": [], # use latex default serif font + "font.sans-serif": ["DejaVu Sans"], # use a specific sans-serif font +}) + +plt.figure(figsize=(4.5, 2.5)) +plt.plot(range(5)) +plt.text(0.5, 3., "serif") +plt.text(0.5, 2., "monospace", family="monospace") +plt.text(2.5, 2., "sans-serif", family="sans-serif") +plt.text(2.5, 1., "comic sans", family="Comic Sans MS") +plt.xlabel("µ is not $\\mu$") +plt.tight_layout(.5) +``` + +## Custom preamble + +Full customization is possible by adding your own commands to the preamble. 
+Use the ``pgf.preamble`` parameter if you want to configure the math fonts, +using ``unicode-math`` for example, or for loading additional packages. Also, +if you want to do the font configuration yourself instead of using the fonts +specified in the rc parameters, make sure to disable ``pgf.rcfonts``. + +``` python +""" +============ +Pgf Preamble +============ + +""" + +import matplotlib as mpl +mpl.use("pgf") +import matplotlib.pyplot as plt +plt.rcParams.update({ + "font.family": "serif", # use serif/main font for text elements + "text.usetex": True, # use inline math for ticks + "pgf.rcfonts": False, # don't setup fonts from rc parameters + "pgf.preamble": [ + "\\usepackage{units}", # load additional packages + "\\usepackage{metalogo}", + "\\usepackage{unicode-math}", # unicode math setup + r"\setmathfont{xits-math.otf}", + r"\setmainfont{DejaVu Serif}", # serif font via preamble + ] +}) + +plt.figure(figsize=(4.5, 2.5)) +plt.plot(range(5)) +plt.xlabel("unicode text: я, ψ, €, ü, \\unitfrac[10]{°}{µm}") +plt.ylabel("\\XeLaTeX") +plt.legend(["unicode math: $λ=∑_i^∞ μ_i^2$"]) +plt.tight_layout(.5) +``` + +## Choosing the TeX system + +The TeX system to be used by matplotlib is chosen by the ``pgf.texsystem`` +parameter. Possible values are ``'xelatex'`` (default), ``'lualatex'`` and +``'pdflatex'``. Please note that when selecting pdflatex the fonts and +unicode handling must be configured in the preamble. 
+ +``` python +""" +============= +Pgf Texsystem +============= + +""" + +import matplotlib.pyplot as plt +plt.rcParams.update({ + "pgf.texsystem": "pdflatex", + "pgf.preamble": [ + r"\usepackage[utf8x]{inputenc}", + r"\usepackage[T1]{fontenc}", + r"\usepackage{cmbright}", + ] +}) + +plt.figure(figsize=(4.5, 2.5)) +plt.plot(range(5)) +plt.text(0.5, 3., "serif", family="serif") +plt.text(0.5, 2., "monospace", family="monospace") +plt.text(2.5, 2., "sans-serif", family="sans-serif") +plt.xlabel(r"µ is not $\mu$") +plt.tight_layout(.5) +``` + +## Troubleshooting + +- Please note that the TeX packages found in some Linux distributions and +MiKTeX installations are dramatically outdated. Make sure to update your +package catalog and upgrade or install a recent TeX distribution. +- On Windows, the [``PATH``](https://matplotlib.orgfaq/environment_variables_faq.html#envvar-PATH) environment variable may need to be modified +to include the directories containing the latex, dvipng and ghostscript +executables. See [Environment Variables](https://matplotlib.orgfaq/environment_variables_faq.html#environment-variables) and +[Setting environment variables in windows](https://matplotlib.orgfaq/environment_variables_faq.html#setting-windows-environment-variables) for details. +- A limitation on Windows causes the backend to keep file handles that have +been opened by your application open. As a result, it may not be possible +to delete the corresponding files until the application closes (see +[#1324](https://github.com/matplotlib/matplotlib/issues/1324)). +- Sometimes the font rendering in figures that are saved to png images is +very bad. This happens when the pdftocairo tool is not available and +ghostscript is used for the pdf to png conversion. +- Make sure what you are trying to do is possible in a LaTeX document, +that your LaTeX syntax is valid and that you are using raw strings +if necessary to avoid unintended escape sequences. 
+
+- The ``pgf.preamble`` rc setting provides lots of flexibility, and lots of
+ways to cause problems. When experiencing problems, try to minimize or
+disable the custom preamble.
+- Configuring a ``unicode-math`` environment can be a bit tricky. The
+TeXLive distribution, for example, provides a set of math fonts which are
+usually not installed system-wide. XeTeX, unlike LuaLaTeX, cannot find
+these fonts by their name, which is why you might have to specify
+``\setmathfont{xits-math.otf}`` instead of ``\setmathfont{XITS Math}``, or
+alternatively make the fonts available to your OS. See this
+[tex.stackexchange.com question](http://tex.stackexchange.com/questions/43642)
+for more details.
+- If the font configuration used by matplotlib differs from the font setting
+in your LaTeX document, the alignment of text elements in imported figures
+may be off. Check the header of your ``.pgf`` file if you are unsure about
+the fonts matplotlib used for the layout.
+- Vector images, and hence ``.pgf`` files, can become bloated if there are a
+lot of objects in the graph. This can be the case for image processing or
+very big scatter graphs. In an extreme case this can cause TeX to run out of
+memory ("TeX capacity exceeded, sorry"). You can configure latex to increase
+the amount of memory available to generate the ``.pdf`` image, as discussed on
+[tex.stackexchange.com](http://tex.stackexchange.com/questions/7953).
+Another way would be to "rasterize" the parts of the graph causing problems,
+using either the ``rasterized=True`` keyword or ``.set_rasterized(True)``, as per
+[this example](https://matplotlib.org/gallery/misc/rasterization_demo.html).
+- If you still need help, please see [Getting help](https://matplotlib.orgfaq/troubleshooting_faq.html#reporting-problems) + +## Download + +- [Download Python source code: pgf.py](https://matplotlib.org/_downloads/33c57cdb935b0436624e8a3471dedc5e/pgf.py) +- [Download Jupyter notebook: pgf.ipynb](https://matplotlib.org/_downloads/216a4d9bdb6721e8ad7fda0b85a793ae/pgf.ipynb) + \ No newline at end of file diff --git a/Python/matplotlab/text/text_intro.md b/Python/matplotlab/text/text_intro.md new file mode 100644 index 00000000..7ccbb1ce --- /dev/null +++ b/Python/matplotlab/text/text_intro.md @@ -0,0 +1,494 @@ +--- +sidebarDepth: 3 +sidebar: auto +--- + +# Text in Matplotlib Plots + +Introduction to plotting and working with text in Matplotlib. + +Matplotlib has extensive text support, including support for +mathematical expressions, truetype support for raster and +vector outputs, newline separated text with arbitrary +rotations, and unicode support. + +Because it embeds fonts directly in output documents, e.g., for postscript +or PDF, what you see on the screen is what you get in the hardcopy. +[FreeType](https://www.freetype.org/) support +produces very nice, antialiased fonts, that look good even at small +raster sizes. Matplotlib includes its own +[``matplotlib.font_manager``](https://matplotlib.orgapi/font_manager_api.html#module-matplotlib.font_manager) (thanks to Paul Barrett), which +implements a cross platform, ``W3C`` +compliant font finding algorithm. + +The user has a great deal of control over text properties (font size, font +weight, text location and color, etc.) with sensible defaults set in +the [rc file](https://matplotlib.org/introductory/customizing.html). +And significantly, for those interested in mathematical +or scientific figures, Matplotlib implements a large number of TeX +math symbols and commands, supporting [mathematical expressions](mathtext.html) anywhere in your figure. 
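As a quick illustration of the mathtext support mentioned above, TeX-style math can be embedded in any text string by enclosing it in dollar signs inside a raw string. This is a minimal sketch; the labels and output filename are illustrative, and the explicit ``Agg`` backend selection is only there so the snippet runs headless:

``` python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
# Anything between $...$ in a raw string is rendered by the mathtext engine.
ax.set_title(r'$\sigma_i = 15,\ \alpha > \beta$')
ax.set_xlabel(r'time $t$ [s]')
fig.savefig('mathtext_demo.png')
```

The same markup works in titles, tick labels, annotations, and any other text object.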
+
## Basic text commands

The following commands are used to create text in the pyplot
interface and the object-oriented API:

| [pyplot](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.html#module-matplotlib.pyplot) API | OO API | description |
| --- | --- | --- |
| [text](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.text.html#matplotlib.pyplot.text) | [text](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.text.html#matplotlib.axes.Axes.text) | Add text at an arbitrary location of the [Axes](https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes). |
| [annotate](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.annotate.html#matplotlib.pyplot.annotate) | [annotate](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.annotate.html#matplotlib.axes.Axes.annotate) | Add an annotation, with an optional arrow, at an arbitrary location of the Axes. |
| [xlabel](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.xlabel.html#matplotlib.pyplot.xlabel) | [set_xlabel](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_xlabel.html#matplotlib.axes.Axes.set_xlabel) | Add a label to the Axes's x-axis. |
| [ylabel](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.ylabel.html#matplotlib.pyplot.ylabel) | [set_ylabel](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_ylabel.html#matplotlib.axes.Axes.set_ylabel) | Add a label to the Axes's y-axis. |
| [title](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.title.html#matplotlib.pyplot.title) | [set_title](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_title.html#matplotlib.axes.Axes.set_title) | Add a title to the Axes. |
| [figtext](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.figtext.html#matplotlib.pyplot.figtext) | [text](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.text) | Add text at an arbitrary location of the [Figure](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure). |
| [suptitle](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.suptitle.html#matplotlib.pyplot.suptitle) | [suptitle](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.suptitle) | Add a title to the Figure. |

All of these functions create and return a [``Text``](https://matplotlib.org/api/text_api.html#matplotlib.text.Text) instance, which can be
configured with a variety of font and other properties. The example below
shows all of these commands in action, and more detail is provided in the
sections that follow.
+ +``` python +import matplotlib +import matplotlib.pyplot as plt + +fig = plt.figure() +fig.suptitle('bold figure suptitle', fontsize=14, fontweight='bold') + +ax = fig.add_subplot(111) +fig.subplots_adjust(top=0.85) +ax.set_title('axes title') + +ax.set_xlabel('xlabel') +ax.set_ylabel('ylabel') + +ax.text(3, 8, 'boxed italics text in data coords', style='italic', + bbox={'facecolor': 'red', 'alpha': 0.5, 'pad': 10}) + +ax.text(2, 6, r'an equation: $E=mc^2$', fontsize=15) + +ax.text(3, 2, 'unicode: Institut für Festkörperphysik') + +ax.text(0.95, 0.01, 'colored text in axes coords', + verticalalignment='bottom', horizontalalignment='right', + transform=ax.transAxes, + color='green', fontsize=15) + + +ax.plot([2], [1], 'o') +ax.annotate('annotate', xy=(2, 1), xytext=(3, 4), + arrowprops=dict(facecolor='black', shrink=0.05)) + +ax.axis([0, 10, 0, 10]) + +plt.show() +``` + +![sphx_glr_text_intro_001](https://matplotlib.org/_images/sphx_glr_text_intro_001.png) + +## Labels for x- and y-axis + +Specifying the labels for the x- and y-axis is straightforward, via the +[``set_xlabel``](https://matplotlib.orgapi/_as_gen/matplotlib.axes.Axes.set_xlabel.html#matplotlib.axes.Axes.set_xlabel) and [``set_ylabel``](https://matplotlib.orgapi/_as_gen/matplotlib.axes.Axes.set_ylabel.html#matplotlib.axes.Axes.set_ylabel) +methods. + +``` python +import matplotlib.pyplot as plt +import numpy as np + +x1 = np.linspace(0.0, 5.0, 100) +y1 = np.cos(2 * np.pi * x1) * np.exp(-x1) + +fig, ax = plt.subplots(figsize=(5, 3)) +fig.subplots_adjust(bottom=0.15, left=0.2) +ax.plot(x1, y1) +ax.set_xlabel('time [s]') +ax.set_ylabel('Damped oscillation [V]') + +plt.show() +``` + +![sphx_glr_text_intro_002](https://matplotlib.org/_images/sphx_glr_text_intro_002.png) + +The x- and y-labels are automatically placed so that they clear the x- and +y-ticklabels. Compare the plot below with that above, and note the y-label +is to the left of the one above. 
+ +``` python +fig, ax = plt.subplots(figsize=(5, 3)) +fig.subplots_adjust(bottom=0.15, left=0.2) +ax.plot(x1, y1*10000) +ax.set_xlabel('time [s]') +ax.set_ylabel('Damped oscillation [V]') + +plt.show() +``` + +![sphx_glr_text_intro_003](https://matplotlib.org/_images/sphx_glr_text_intro_003.png) + +If you want to move the labels, you can specify the *labelpad* keyword +argument, where the value is points (1/72", the same unit used to specify +fontsizes). + +``` python +fig, ax = plt.subplots(figsize=(5, 3)) +fig.subplots_adjust(bottom=0.15, left=0.2) +ax.plot(x1, y1*10000) +ax.set_xlabel('time [s]') +ax.set_ylabel('Damped oscillation [V]', labelpad=18) + +plt.show() +``` + +![sphx_glr_text_intro_004](https://matplotlib.org/_images/sphx_glr_text_intro_004.png) + +Or, the labels accept all the [``Text``](https://matplotlib.orgapi/text_api.html#matplotlib.text.Text) keyword arguments, including +*position*, via which we can manually specify the label positions. Here we +put the xlabel to the far left of the axis. Note, that the y-coordinate of +this position has no effect - to adjust the y-position we need to use the +*labelpad* kwarg. 
+
``` python
fig, ax = plt.subplots(figsize=(5, 3))
fig.subplots_adjust(bottom=0.15, left=0.2)
ax.plot(x1, y1)
ax.set_xlabel('time [s]', position=(0., 1e6),
              horizontalalignment='left')
ax.set_ylabel('Damped oscillation [V]')

plt.show()
```

![sphx_glr_text_intro_005](https://matplotlib.org/_images/sphx_glr_text_intro_005.png)

All the labelling in this tutorial can be changed by supplying a
[``matplotlib.font_manager.FontProperties``](https://matplotlib.org/api/font_manager_api.html#matplotlib.font_manager.FontProperties) instance, or by named kwargs to
[``set_xlabel``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.set_xlabel.html#matplotlib.axes.Axes.set_xlabel)

``` python
from matplotlib.font_manager import FontProperties

font = FontProperties()
font.set_family('serif')
font.set_name('Times New Roman')
font.set_style('italic')

fig, ax = plt.subplots(figsize=(5, 3))
fig.subplots_adjust(bottom=0.15, left=0.2)
ax.plot(x1, y1)
ax.set_xlabel('time [s]', fontsize='large', fontweight='bold')
ax.set_ylabel('Damped oscillation [V]', fontproperties=font)

plt.show()
```

![sphx_glr_text_intro_006](https://matplotlib.org/_images/sphx_glr_text_intro_006.png)

Finally, we can use native TeX rendering in all text objects and have
multiple lines:

``` python
fig, ax = plt.subplots(figsize=(5, 3))
fig.subplots_adjust(bottom=0.2, left=0.2)
ax.plot(x1, np.cumsum(y1**2))
ax.set_xlabel('time [s] \n This was a long experiment')
ax.set_ylabel(r'$\int\ Y^2\ dt\ \ [V^2 s]$')
plt.show()
```

![sphx_glr_text_intro_007](https://matplotlib.org/_images/sphx_glr_text_intro_007.png)

## Titles

Subplot titles are set in much the same way as labels, but there is
the *loc* keyword argument that can change the position and justification
from the default value of ``loc='center'``.
+
``` python
fig, axs = plt.subplots(3, 1, figsize=(5, 6), tight_layout=True)
locs = ['center', 'left', 'right']
for ax, loc in zip(axs, locs):
    ax.plot(x1, y1)
    ax.set_title('Title with loc at '+loc, loc=loc)
plt.show()
```

![sphx_glr_text_intro_008](https://matplotlib.org/_images/sphx_glr_text_intro_008.png)

Vertical spacing for titles is controlled via ``rcParams["axes.titlepad"]``, which
defaults to 5 points. Setting it to a different value moves the title.

``` python
fig, ax = plt.subplots(figsize=(5, 3))
fig.subplots_adjust(top=0.8)
ax.plot(x1, y1)
ax.set_title('Vertically offset title', pad=30)
plt.show()
```

![sphx_glr_text_intro_009](https://matplotlib.org/_images/sphx_glr_text_intro_009.png)

## Ticks and ticklabels

Placing ticks and ticklabels is a very tricky aspect of making a figure.
Matplotlib does the best it can automatically, but it also offers a very
flexible framework for determining the choices for tick locations, and
how they are labelled.

### Terminology

*Axes* have a [``matplotlib.axis``](https://matplotlib.org/api/axis_api.html#module-matplotlib.axis) object for the ``ax.xaxis``
and ``ax.yaxis`` that
contains the information about how the labels in the axis are laid out.

The axis API is explained in detail in the documentation to
[``axis``](https://matplotlib.org/api/axis_api.html#module-matplotlib.axis).

An Axis object has major and minor ticks. The Axis has
``Axis.set_major_locator`` and
``Axis.set_minor_locator`` methods that use the data being plotted
to determine
the location of major and minor ticks. There are also
``Axis.set_major_formatter`` and
``Axis.set_minor_formatter`` methods that format the tick labels.

### Simple ticks

It often is convenient to simply define the
tick values, and sometimes the tick labels, overriding the default
locators and formatters.
This is discouraged because it breaks interactive
navigation of the plot. It also can reset the axis limits: note that
the second plot has the ticks we asked for, including ones that are
well outside the automatic view limits.

``` python
fig, axs = plt.subplots(2, 1, figsize=(5, 3), tight_layout=True)
axs[0].plot(x1, y1)
axs[1].plot(x1, y1)
axs[1].xaxis.set_ticks(np.arange(0., 8.1, 2.))
plt.show()
```

![sphx_glr_text_intro_010](https://matplotlib.org/_images/sphx_glr_text_intro_010.png)

We can of course fix this after the fact, but it does highlight a
weakness of hard-coding the ticks. This example also changes the format
of the ticks:

``` python
fig, axs = plt.subplots(2, 1, figsize=(5, 3), tight_layout=True)
axs[0].plot(x1, y1)
axs[1].plot(x1, y1)
ticks = np.arange(0., 8.1, 2.)
# list comprehension to get all tick labels...
tickla = ['%1.2f' % tick for tick in ticks]
axs[1].xaxis.set_ticks(ticks)
axs[1].xaxis.set_ticklabels(tickla)
axs[1].set_xlim(axs[0].get_xlim())
plt.show()
```

![sphx_glr_text_intro_011](https://matplotlib.org/_images/sphx_glr_text_intro_011.png)

### Tick Locators and Formatters

Instead of making a list of all the ticklabels, we could have
used a [``matplotlib.ticker.StrMethodFormatter``](https://matplotlib.org/api/ticker_api.html#matplotlib.ticker.StrMethodFormatter) and passed it to the
``ax.xaxis``

``` python
fig, axs = plt.subplots(2, 1, figsize=(5, 3), tight_layout=True)
axs[0].plot(x1, y1)
axs[1].plot(x1, y1)
ticks = np.arange(0., 8.1, 2.)
formatter = matplotlib.ticker.StrMethodFormatter('{x:1.1f}')
axs[1].xaxis.set_ticks(ticks)
axs[1].xaxis.set_major_formatter(formatter)
axs[1].set_xlim(axs[0].get_xlim())
plt.show()
```

![sphx_glr_text_intro_012](https://matplotlib.org/_images/sphx_glr_text_intro_012.png)

And of course we could have used a non-default locator to set the
tick locations.
Note we still pass in the tick values, but the
x-limit fix used above is *not* needed.

``` python
fig, axs = plt.subplots(2, 1, figsize=(5, 3), tight_layout=True)
axs[0].plot(x1, y1)
axs[1].plot(x1, y1)
formatter = matplotlib.ticker.FormatStrFormatter('%1.1f')
locator = matplotlib.ticker.FixedLocator(ticks)
axs[1].xaxis.set_major_locator(locator)
axs[1].xaxis.set_major_formatter(formatter)
plt.show()
```

![sphx_glr_text_intro_013](https://matplotlib.org/_images/sphx_glr_text_intro_013.png)

The default locator is the [``matplotlib.ticker.MaxNLocator``](https://matplotlib.org/api/ticker_api.html#matplotlib.ticker.MaxNLocator) called as
``ticker.MaxNLocator(self, nbins='auto', steps=[1, 2, 2.5, 5, 10])``.
The *steps* keyword contains a list of multiples that can be used for
tick values. I.e., in this case, 2, 4, 6 would be acceptable ticks,
as would 20, 40, 60 or 0.2, 0.4, 0.6. However, 3, 6, 9 would not be
acceptable because 3 doesn't appear in the list of steps.

``nbins='auto'`` uses an algorithm to determine how many ticks will
be acceptable based on how long the axis is. The fontsize of the
ticklabel is taken into account, but the length of the tick string
is not (because it's not yet known.) In the bottom row, the
ticklabels are quite large, so we set ``nbins=4`` to make the
labels fit in the right-hand plot.
+ +``` python +fig, axs = plt.subplots(2, 2, figsize=(8, 5), tight_layout=True) +for n, ax in enumerate(axs.flat): + ax.plot(x1*10., y1) + +formatter = matplotlib.ticker.FormatStrFormatter('%1.1f') +locator = matplotlib.ticker.MaxNLocator(nbins='auto', steps=[1, 4, 10]) +axs[0, 1].xaxis.set_major_locator(locator) +axs[0, 1].xaxis.set_major_formatter(formatter) + +formatter = matplotlib.ticker.FormatStrFormatter('%1.5f') +locator = matplotlib.ticker.AutoLocator() +axs[1, 0].xaxis.set_major_formatter(formatter) +axs[1, 0].xaxis.set_major_locator(locator) + +formatter = matplotlib.ticker.FormatStrFormatter('%1.5f') +locator = matplotlib.ticker.MaxNLocator(nbins=4) +axs[1, 1].xaxis.set_major_formatter(formatter) +axs[1, 1].xaxis.set_major_locator(locator) + +plt.show() +``` + +![sphx_glr_text_intro_014](https://matplotlib.org/_images/sphx_glr_text_intro_014.png) + +Finally, we can specify functions for the formatter using +[``matplotlib.ticker.FuncFormatter``](https://matplotlib.orgapi/ticker_api.html#matplotlib.ticker.FuncFormatter). + +``` python +def formatoddticks(x, pos): + """Format odd tick positions + """ + if x % 2: + return '%1.2f' % x + else: + return '' + +fig, ax = plt.subplots(figsize=(5, 3), tight_layout=True) +ax.plot(x1, y1) +formatter = matplotlib.ticker.FuncFormatter(formatoddticks) +locator = matplotlib.ticker.MaxNLocator(nbins=6) +ax.xaxis.set_major_formatter(formatter) +ax.xaxis.set_major_locator(locator) + +plt.show() +``` + +![sphx_glr_text_intro_015](https://matplotlib.org/_images/sphx_glr_text_intro_015.png) + +### Dateticks + +Matplotlib can accept [``datetime.datetime``](https://docs.python.org/3/library/datetime.html#datetime.datetime) and ``numpy.datetime64`` +objects as plotting arguments. Dates and times require special +formatting, which can often benefit from manual intervention. 
In +order to help, dates have special Locators and Formatters, +defined in the [``matplotlib.dates``](https://matplotlib.orgapi/dates_api.html#module-matplotlib.dates) module. + +A simple example is as follows. Note how we have to rotate the +tick labels so that they don't over-run each other. + +``` python +import datetime + +fig, ax = plt.subplots(figsize=(5, 3), tight_layout=True) +base = datetime.datetime(2017, 1, 1, 0, 0, 1) +time = [base + datetime.timedelta(days=x) for x in range(len(y1))] + +ax.plot(time, y1) +ax.tick_params(axis='x', rotation=70) +plt.show() +``` + +![sphx_glr_text_intro_016](https://matplotlib.org/_images/sphx_glr_text_intro_016.png) + +We can pass a format +to [``matplotlib.dates.DateFormatter``](https://matplotlib.orgapi/dates_api.html#matplotlib.dates.DateFormatter). Also note that the 29th and the +next month are very close together. We can fix this by using the +``dates.DayLocator`` class, which allows us to specify a list of days of the +month to use. Similar formatters are listed in the [``matplotlib.dates``](https://matplotlib.orgapi/dates_api.html#module-matplotlib.dates) module. 
+ +``` python +import matplotlib.dates as mdates + +locator = mdates.DayLocator(bymonthday=[1, 15]) +formatter = mdates.DateFormatter('%b %d') + +fig, ax = plt.subplots(figsize=(5, 3), tight_layout=True) +ax.xaxis.set_major_locator(locator) +ax.xaxis.set_major_formatter(formatter) +ax.plot(time, y1) +ax.tick_params(axis='x', rotation=70) +plt.show() +``` + +![sphx_glr_text_intro_017](https://matplotlib.org/_images/sphx_glr_text_intro_017.png) + +## Legends and Annotations + +- Legends: [Legend guide](https://matplotlib.org/intermediate/legend_guide.html) +- Annotations: [Annotations](annotations.html) + +**Total running time of the script:** ( 0 minutes 2.549 seconds) + +## Download + +- [Download Python source code: text_intro.py](https://matplotlib.org/_downloads/8eeacd0a48953caa8d20de07b4c4ef50/text_intro.py) +- [Download Jupyter notebook: text_intro.ipynb](https://matplotlib.org/_downloads/b1f39cf888a0a639ac54bae2e28dfe44/text_intro.ipynb) + \ No newline at end of file diff --git a/Python/matplotlab/text/text_props.md b/Python/matplotlab/text/text_props.md new file mode 100644 index 00000000..204c0e35 --- /dev/null +++ b/Python/matplotlab/text/text_props.md @@ -0,0 +1,336 @@ +--- +sidebarDepth: 3 +sidebar: auto +--- + +# Text properties and layout + +Controlling properties of text and its layout with Matplotlib. + +The [``matplotlib.text.Text``](https://matplotlib.orgapi/text_api.html#matplotlib.text.Text) instances have a variety of +properties which can be configured via keyword arguments to the text +commands (e.g., [``title()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.title.html#matplotlib.pyplot.title), +[``xlabel()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.xlabel.html#matplotlib.pyplot.xlabel) and [``text()``](https://matplotlib.orgapi/_as_gen/matplotlib.pyplot.text.html#matplotlib.pyplot.text)). 
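For example, several of these properties can be passed directly as keyword arguments, or changed afterwards with the corresponding ``set_*`` methods on the returned ``Text`` instance. This is a minimal sketch; the coordinates, string, and property values are illustrative, and the ``Agg`` backend is selected only so the snippet runs headless:

``` python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Any Text property can be given as a keyword argument to the text commands...
t = ax.text(0.5, 0.5, 'sample text',
            color='green', alpha=0.7, fontsize=14,
            rotation=45, ha='center', va='center')
# ...or changed later through set_* methods on the returned Text instance.
t.set_fontstyle('italic')
fig.savefig('text_props_demo.png')
```

The same keyword arguments work with ``title()``, ``xlabel()``, ``annotate()``, and the other text commands.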
+

| Property | Value Type |
| --- | --- |
| alpha | float |
| backgroundcolor | any matplotlib [color](https://matplotlib.org/colors/colors.html) |
| bbox | [Rectangle](https://matplotlib.org/api/_as_gen/matplotlib.patches.Rectangle.html#matplotlib.patches.Rectangle) prop dict plus key 'pad' which is a pad in points |
| clip_box | a matplotlib.transform.Bbox instance |
| clip_on | bool |
| clip_path | a [Path](https://matplotlib.org/api/path_api.html#matplotlib.path.Path) instance and a [Transform](https://matplotlib.org/api/transformations.html#matplotlib.transforms.Transform) instance, or a [Patch](https://matplotlib.org/api/_as_gen/matplotlib.patches.Patch.html#matplotlib.patches.Patch) |
| color | any matplotlib color |
| family | [ 'serif' \| 'sans-serif' \| 'cursive' \| 'fantasy' \| 'monospace' ] |
| fontproperties | a [FontProperties](https://matplotlib.org/api/font_manager_api.html#matplotlib.font_manager.FontProperties) instance |
| horizontalalignment or ha | [ 'center' \| 'right' \| 'left' ] |
| label | any string |
| linespacing | float |
| multialignment | [ 'left' \| 'right' \| 'center' ] |
| name or fontname | string, e.g., [ 'Sans' \| 'Courier' \| 'Helvetica' ... ] |
| picker | [ None \| float \| boolean \| callable ] |
| position | (x, y) |
| rotation | [ angle in degrees \| 'vertical' \| 'horizontal' ] |
| size or fontsize | [ size in points \| relative size, e.g., 'smaller', 'x-large' ] |
| style or fontstyle | [ 'normal' \| 'italic' \| 'oblique' ] |
| text | string or anything printable with '%s' conversion |
| transform | a Transform instance |
| variant | [ 'normal' \| 'small-caps' ] |
| verticalalignment or va | [ 'center' \| 'top' \| 'bottom' \| 'baseline' ] |
| visible | bool |
| weight or fontweight | [ 'normal' \| 'bold' \| 'heavy' \| 'light' \| 'ultrabold' \| 'ultralight' ] |
| x | float |
| y | float |
| zorder | any number |

You can lay out text with the alignment arguments
``horizontalalignment``, ``verticalalignment``, and
``multialignment``. ``horizontalalignment`` controls whether the x
positional argument for the text indicates the left, center or right
side of the text bounding box. ``verticalalignment`` controls whether
the y positional argument for the text indicates the bottom, center or
top side of the text bounding box. ``multialignment``, for newline
separated strings only, controls whether the different lines are left,
center or right justified. Here is an example which uses the
[``text()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.text.html#matplotlib.pyplot.text) command to show the various alignment
possibilities. The use of ``transform=ax.transAxes`` throughout the
code indicates that the coordinates are given relative to the axes
bounding box, with 0,0 being the lower left of the axes and 1,1 the
upper right.
+ +``` python +import matplotlib.pyplot as plt +import matplotlib.patches as patches + +# build a rectangle in axes coords +left, width = .25, .5 +bottom, height = .25, .5 +right = left + width +top = bottom + height + +fig = plt.figure() +ax = fig.add_axes([0, 0, 1, 1]) + +# axes coordinates are 0,0 is bottom left and 1,1 is upper right +p = patches.Rectangle( + (left, bottom), width, height, + fill=False, transform=ax.transAxes, clip_on=False + ) + +ax.add_patch(p) + +ax.text(left, bottom, 'left top', + horizontalalignment='left', + verticalalignment='top', + transform=ax.transAxes) + +ax.text(left, bottom, 'left bottom', + horizontalalignment='left', + verticalalignment='bottom', + transform=ax.transAxes) + +ax.text(right, top, 'right bottom', + horizontalalignment='right', + verticalalignment='bottom', + transform=ax.transAxes) + +ax.text(right, top, 'right top', + horizontalalignment='right', + verticalalignment='top', + transform=ax.transAxes) + +ax.text(right, bottom, 'center top', + horizontalalignment='center', + verticalalignment='top', + transform=ax.transAxes) + +ax.text(left, 0.5*(bottom+top), 'right center', + horizontalalignment='right', + verticalalignment='center', + rotation='vertical', + transform=ax.transAxes) + +ax.text(left, 0.5*(bottom+top), 'left center', + horizontalalignment='left', + verticalalignment='center', + rotation='vertical', + transform=ax.transAxes) + +ax.text(0.5*(left+right), 0.5*(bottom+top), 'middle', + horizontalalignment='center', + verticalalignment='center', + fontsize=20, color='red', + transform=ax.transAxes) + +ax.text(right, 0.5*(bottom+top), 'centered', + horizontalalignment='center', + verticalalignment='center', + rotation='vertical', + transform=ax.transAxes) + +ax.text(left, top, 'rotated\nwith newlines', + horizontalalignment='center', + verticalalignment='center', + rotation=45, + transform=ax.transAxes) + +ax.set_axis_off() +plt.show() +``` + 
+
![sphx_glr_text_props_001](https://matplotlib.org/_images/sphx_glr_text_props_001.png)

## Default Font

The base default font is controlled by a set of rcParams. To set the font
for mathematical expressions, use the rcParams beginning with ``mathtext``
(see [mathtext](mathtext.html#mathtext-fonts)).

| rcParam | usage |
| --- | --- |
| 'font.family' | List of either names of font or {'cursive', 'fantasy', 'monospace', 'sans', 'sans serif', 'sans-serif', 'serif'}. |
| 'font.style' | The default style, e.g., 'normal', 'italic'. |
| 'font.variant' | Default variant, e.g., 'normal', 'small-caps' (untested) |
| 'font.stretch' | Default stretch, e.g., 'normal', 'condensed' (incomplete) |
| 'font.weight' | Default weight. Either string or integer |
| 'font.size' | Default font size in points. Relative font sizes ('large', 'x-small') are computed against this size. |

The mapping between the family aliases (``{'cursive', 'fantasy',
'monospace', 'sans', 'sans serif', 'sans-serif', 'serif'}``) and actual font names
is controlled by the following rcParams:

| family alias | rcParam with mappings |
| --- | --- |
| 'serif' | 'font.serif' |
| 'monospace' | 'font.monospace' |
| 'fantasy' | 'font.fantasy' |
| 'cursive' | 'font.cursive' |
| {'sans', 'sans serif', 'sans-serif'} | 'font.sans-serif' |

which are lists of font names.

## Text with non-latin glyphs

As of v2.0 the [default font](https://matplotlib.org/users/dflt_style_changes.html#default-changes-font) contains
glyphs for many western alphabets, but still does not cover all of the
glyphs that may be required by mpl users. For example, DejaVu has no
coverage of Chinese, Korean, or Japanese.
+
To set the default font to be one that supports the code points you
need, prepend the font name to ``'font.family'`` or the desired alias
lists

``` python
matplotlib.rcParams['font.sans-serif'] = ['Source Han Sans TW', 'sans-serif']
```

or set it in your ``.matplotlibrc`` file:

``` python
font.sans-serif: Source Han Sans TW, Arial, sans-serif
```

To control the font used on a per-artist basis use the ``'name'``,
``'fontname'`` or ``'fontproperties'`` kwargs documented [above](#).

On Linux, [fc-list](https://linux.die.net/man/1/fc-list) can be a
useful tool to discover the font name; for example

``` python
$ fc-list :lang=zh family
Noto Sans Mono CJK TC,Noto Sans Mono CJK TC Bold
Noto Sans CJK TC,Noto Sans CJK TC Medium
Noto Sans CJK TC,Noto Sans CJK TC DemiLight
Noto Sans CJK KR,Noto Sans CJK KR Black
Noto Sans CJK TC,Noto Sans CJK TC Black
Noto Sans Mono CJK TC,Noto Sans Mono CJK TC Regular
Noto Sans CJK SC,Noto Sans CJK SC Light
```

lists all of the fonts that support Chinese.

## Download

- [Download Python source code: text_props.py](https://matplotlib.org/_downloads/ae6077d9637819ed9799d96f3fccde64/text_props.py)
- [Download Jupyter notebook: text_props.ipynb](https://matplotlib.org/_downloads/a34bdc8d86b130a33d92221e0c320b5b/text_props.ipynb)
 \ No newline at end of file diff --git a/Python/matplotlab/text/usetex.md b/Python/matplotlab/text/usetex.md new file mode 100644 index 00000000..bda08290 --- /dev/null +++ b/Python/matplotlab/text/usetex.md @@ -0,0 +1,144 @@ +--- +sidebarDepth: 3 +sidebar: auto +---

# Text rendering With LaTeX

Rendering text with LaTeX in Matplotlib.

Matplotlib has the option to use LaTeX to manage all text layout. This
option is available with the following backends:

- Agg
- PS
- PDF

The LaTeX option is activated by setting ``text.usetex : True`` in your rc
settings.
Text handling with matplotlib's LaTeX support is slower than +matplotlib's very capable [mathtext](mathtext.html), but is +more flexible, since different LaTeX packages (font packages, math packages, +etc.) can be used. The results can be striking, especially when you take care +to use the same fonts in your figures as in the main document. + +Matplotlib's LaTeX support requires a working [LaTeX](http://www.tug.org) installation, [dvipng](http://www.nongnu.org/dvipng/) +(which may be included with your LaTeX installation), and [Ghostscript](https://ghostscript.com/) +(GPL Ghostscript 9.0 or later is required). The executables for these +external dependencies must all be located on your [``PATH``](https://matplotlib.orgfaq/environment_variables_faq.html#envvar-PATH). + +There are a couple of options to mention, which can be changed using +[rc settings](https://matplotlib.org/introductory/customizing.html). Here is an example +matplotlibrc file: + +``` python +font.family : serif +font.serif : Times, Palatino, New Century Schoolbook, Bookman, Computer Modern Roman +font.sans-serif : Helvetica, Avant Garde, Computer Modern Sans serif +font.cursive : Zapf Chancery +font.monospace : Courier, Computer Modern Typewriter + +text.usetex : true +``` + +The first valid font in each family is the one that will be loaded. If the +fonts are not specified, the Computer Modern fonts are used by default. All of +the other fonts are Adobe fonts. Times and Palatino each have their own +accompanying math fonts, while the other Adobe serif fonts make use of the +Computer Modern math fonts. See the [PSNFSS](http://www.ctan.org/tex-archive/macros/latex/required/psnfss/psnfss2e.pdf) documentation for more details. 
+
To use LaTeX and select Helvetica as the default font, without editing
matplotlibrc use:

``` python
from matplotlib import rc
rc('font',**{'family':'sans-serif','sans-serif':['Helvetica']})
## for Palatino and other serif fonts use:
#rc('font',**{'family':'serif','serif':['Palatino']})
rc('text', usetex=True)
```

Here is the standard example, ``tex_demo.py``:

TeX Demo

Note that display math mode (``$$ e=mc^2 $$``) is not supported, but adding the
command ``\displaystyle``, as in ``tex_demo.py``, will produce the same
results.

::: tip Note

Certain characters require special escaping in TeX, such as:

``` python
# $ % & ~ _ ^ \ { } \( \) \[ \]
```

Therefore, these characters will behave differently depending on
the rcParam ``text.usetex`` flag.

:::

## usetex with unicode

It is also possible to use unicode strings with the LaTeX text manager, here is
an example taken from ``tex_demo.py``. The axis labels include Unicode text:

TeX Unicode Demo

## Postscript options

In order to produce encapsulated postscript files that can be embedded in a new
LaTeX document, the default behavior of matplotlib is to distill the output,
which removes some postscript operators used by LaTeX that are illegal in an
eps file. This step produces results which may be unacceptable to some users,
because the text is coarsely rasterized and converted to bitmaps, which are not
scalable like standard postscript, and the text is not searchable. One
workaround is to set ``ps.distiller.res`` to a higher value (perhaps 6000)
in your rc settings, which will produce larger files but may look better and
scale reasonably. A better workaround, which requires [Poppler](https://poppler.freedesktop.org/) or [Xpdf](http://www.xpdfreader.com/), can be
activated by changing the ``ps.usedistiller`` rc setting to ``xpdf``.
This +alternative produces postscript without rasterizing text, so it scales +properly, can be edited in Adobe Illustrator, and searched text in pdf +documents. + +## Possible hangups + +- On Windows, the [``PATH``](https://matplotlib.orgfaq/environment_variables_faq.html#envvar-PATH) environment variable may need to be modified +to include the directories containing the latex, dvipng and ghostscript +executables. See [Environment Variables](https://matplotlib.orgfaq/environment_variables_faq.html#environment-variables) and +[Setting environment variables in windows](https://matplotlib.orgfaq/environment_variables_faq.html#setting-windows-environment-variables) for details. +- Using MiKTeX with Computer Modern fonts, if you get odd *Agg and PNG +results, go to MiKTeX/Options and update your format files +- On Ubuntu and Gentoo, the base texlive install does not ship with +the type1cm package. You may need to install some of the extra +packages to get all the goodies that come bundled with other latex +distributions. +- Some progress has been made so matplotlib uses the dvi files +directly for text layout. This allows latex to be used for text +layout with the pdf and svg backends, as well as the *Agg and PS +backends. In the future, a latex installation may be the only +external dependency. + +## Troubleshooting + +- Try deleting your ``.matplotlib/tex.cache`` directory. If you don't know +where to find ``.matplotlib``, see [matplotlib configuration and cache directory locations](https://matplotlib.orgfaq/troubleshooting_faq.html#locating-matplotlib-config-dir). +- Make sure LaTeX, dvipng and ghostscript are each working and on your +[``PATH``](https://matplotlib.orgfaq/environment_variables_faq.html#envvar-PATH). +- Make sure what you are trying to do is possible in a LaTeX document, +that your LaTeX syntax is valid and that you are using raw strings +if necessary to avoid unintended escape sequences. 
+- Most problems reported on the mailing list have been cleared up by +upgrading [Ghostscript](https://ghostscript.com/). If possible, please try upgrading to the +latest release before reporting problems to the list. +- The ``text.latex.preamble`` rc setting is not officially supported. This +option provides lots of flexibility, and lots of ways to cause +problems. Please disable this option before reporting problems to +the mailing list. +- If you still need help, please see [Getting help](https://matplotlib.orgfaq/troubleshooting_faq.html#reporting-problems) + +## Download + +- [Download Python source code: usetex.py](https://matplotlib.org/_downloads/57ba3a46dd639627a7c67fd5e227bb43/usetex.py) +- [Download Jupyter notebook: usetex.ipynb](https://matplotlib.org/_downloads/534707238f9dbb23f6e17e815b9a3f46/usetex.ipynb) + \ No newline at end of file diff --git a/Python/matplotlab/toolkits/axes_grid.md b/Python/matplotlab/toolkits/axes_grid.md new file mode 100644 index 00000000..be65334c --- /dev/null +++ b/Python/matplotlab/toolkits/axes_grid.md @@ -0,0 +1,497 @@ +--- +sidebarDepth: 3 +sidebar: auto +--- + +# Overview of axes_grid1 toolkit + +Controlling the layout of plots with the axes_grid toolkit. + +## What is axes_grid1 toolkit? + +*axes_grid1* is a collection of helper classes to ease displaying +(multiple) images with matplotlib. In matplotlib, the axes location +(and size) is specified in the normalized figure coordinates, which +may not be ideal for displaying images that needs to have a given +aspect ratio. For example, it helps if you have a colorbar whose +height always matches that of the image. [ImageGrid](#imagegrid), [RGB Axes](#rgb-axes) and +[AxesDivider](#axesdivider) are helper classes that deals with adjusting the +location of (multiple) Axes. They provides a framework to adjust the +position of multiple axes at the drawing time. 
[ParasiteAxes](#parasiteaxes) +provides twinx(or twiny)-like features so that you can plot different +data (e.g., different y-scale) in a same Axes. [AnchoredArtists](#anchoredartists) +includes custom artists which are placed at some anchored position, +like the legend. + +Demo Axes Grid + +## axes_grid1 + +### ImageGrid + +A class that creates a grid of Axes. In matplotlib, the axes location +(and size) is specified in the normalized figure coordinates. This may +not be ideal for images that needs to be displayed with a given aspect +ratio. For example, displaying images of a same size with some fixed +padding between them cannot be easily done in matplotlib. ImageGrid is +used in such case. + +Simple Axesgrid + +- The position of each axes is determined at the drawing time (see +[AxesDivider](#axesdivider)), so that the size of the entire grid fits in the +given rectangle (like the aspect of axes). Note that in this example, +the paddings between axes are fixed even if you changes the figure +size. +- axes in the same column has a same axes width (in figure +coordinate), and similarly, axes in the same row has a same +height. The widths (height) of the axes in the same row (column) are +scaled according to their view limits (xlim or ylim). + +Simple Axes Grid +- xaxis are shared among axes in a same column. Similarly, yaxis are +shared among axes in a same row. Therefore, changing axis properties +(view limits, tick location, etc. either by plot commands or using +your mouse in interactive backends) of one axes will affect all +other shared axes. + +When initialized, ImageGrid creates given number (*ngrids* or *ncols* * +*nrows* if *ngrids* is None) of Axes instances. A sequence-like +interface is provided to access the individual Axes instances (e.g., +grid[0] is the first Axes in the grid. See below for the order of +axes). 
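A minimal runnable sketch of that sequence-like interface (the 2x2 layout and random image data are illustrative only):

``` python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import ImageGrid

fig = plt.figure(figsize=(4, 4))
# 111 places the whole grid the same way add_subplot(111) would place one axes.
grid = ImageGrid(fig, 111, nrows_ncols=(2, 2), axes_pad=0.1)

# grid[i] accesses the i-th Axes; with the default direction="row",
# the order is left-to-right, top-to-bottom.
for i in range(4):
    grid[i].imshow(np.random.default_rng(i).random((8, 8)))
```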
ImageGrid takes the following arguments:

| Name | Default | Description |
| --- | --- | --- |
| fig |   |   |
| rect |   |   |
| nrows_ncols |   | number of rows and cols, e.g., (2, 2) |
| ngrids | None | number of grids; nrows x ncols if None |
| direction | "row" | increasing direction of axes number. [row\|column] |
| axes_pad | 0.02 | pad between axes in inches |
| add_all | True | add axes to figures if True |
| share_all | False | xaxis & yaxis of all axes are shared if True |
| aspect | True | aspect of axes |
| label_mode | "L" | location of tick labels that will be displayed. "1" (only the lower left axes), "L" (left most and bottom most axes), or "all" |
| cbar_mode | None | [None\|single\|each] |
| cbar_location | "right" | [right\|top] |
| cbar_pad | None | pad between image axes and colorbar axes |
| cbar_size | "5%" | size of the colorbar |
| axes_class | None |   |

*direction* is the direction of increasing axes number. For "row":

| | |
| --- | --- |
| grid[0] | grid[1] |
| grid[2] | grid[3] |

For "column":

| | |
| --- | --- |
| grid[0] | grid[2] |
| grid[1] | grid[3] |

You can also create a colorbar (or colorbars). You can have a colorbar for each axes (cbar_mode="each"), or a single colorbar for the whole grid (cbar_mode="single"). The colorbar can be placed on the right or on top, and the axes for each colorbar are stored as a *cbar_axes* attribute.

The examples below show what you can do with ImageGrid.

Demo Axes Grid

### AxesDivider Class

Behind the scenes, the ImageGrid class and the RGBAxes class utilize the AxesDivider class, whose role is to calculate the location of the axes at drawing time. While more about the AxesDivider is (or will be) explained in the (yet to be written) AxesDividerGuide, direct use of the AxesDivider class will not be necessary for most users. The axes_divider module provides a helper function, make_axes_locatable, which can be useful.
It takes an existing axes instance and creates a divider for it.

``` python
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable

ax = plt.subplot(1, 1, 1)
divider = make_axes_locatable(ax)
```

*make_axes_locatable* returns an instance of the AxesDivider class. It provides an *append_axes* method that creates a new axes on the given side ("top", "right", "bottom", or "left") of the original axes.

### colorbar whose height (or width) is in sync with the master axes

Simple Colorbar

#### scatter_hist.py with AxesDivider

The "scatter_hist.py" example in mpl can be rewritten using *make_axes_locatable*.

``` python
# x, y, lim and binwidth are defined in the full example linked below.
axScatter = plt.subplot(111)
axScatter.scatter(x, y)
axScatter.set_aspect(1.)

# create new axes on the right and on the top of the current axes.
divider = make_axes_locatable(axScatter)
axHistx = divider.append_axes("top", size=1.2, pad=0.1, sharex=axScatter)
axHisty = divider.append_axes("right", size=1.2, pad=0.1, sharey=axScatter)

# the histograms
bins = np.arange(-lim, lim + binwidth, binwidth)
axHistx.hist(x, bins=bins)
axHisty.hist(y, bins=bins, orientation='horizontal')
```

See the full source code below.

Scatter Hist

The scatter_hist using the AxesDivider has some advantages over the original scatter_hist.py in mpl. For example, you can set the aspect ratio of the scatter plot even while the x-axis and y-axis are shared with the marginal histograms.

### ParasiteAxes

The ParasiteAxes is an axes whose location is identical to that of its host axes. The location is adjusted at drawing time, so it works even if the host changes its location (e.g., images).

In most cases, you first create a host axes, which provides a few methods that can be used to create parasite axes. They are *twinx*, *twiny* (which are similar to twinx and twiny in matplotlib) and *twin*. *twin* takes an arbitrary transformation that maps between the data coordinates of the host axes and the parasite axes. The *draw* method of the parasite axes is never called.
Instead, host axes +collects artists in parasite axes and draw them as if they belong to +the host axes, i.e., artists in parasite axes are merged to those of +the host axes and then drawn according to their zorder. The host and +parasite axes modifies some of the axes behavior. For example, color +cycle for plot lines are shared between host and parasites. Also, the +legend command in host, creates a legend that includes lines in the +parasite axes. To create a host axes, you may use *host_subplot* or +*host_axes* command. + +#### Example 1. twinx + +Parasite Simple + +#### Example 2. twin + +*twin* without a transform argument assumes that the parasite axes has the +same data transform as the host. This can be useful when you want the +top(or right)-axis to have different tick-locations, tick-labels, or +tick-formatter for bottom(or left)-axis. + +``` python +ax2 = ax.twin() # now, ax2 is responsible for "top" axis and "right" axis +ax2.set_xticks([0., .5*np.pi, np.pi, 1.5*np.pi, 2*np.pi]) +ax2.set_xticklabels(["0", r"$\frac{1}{2}\pi$", + r"$\pi$", r"$\frac{3}{2}\pi$", r"$2\pi$"]) +``` + +Simple Axisline4 + +A more sophisticated example using twin. Note that if you change the +x-limit in the host axes, the x-limit of the parasite axes will change +accordingly. + +Parasite Simple2 + +### AnchoredArtists + +It's a collection of artists whose location is anchored to the (axes) +bbox, like the legend. It is derived from *OffsetBox* in mpl, and +artist need to be drawn in the canvas coordinate. But, there is a +limited support for an arbitrary transform. For example, the ellipse +in the example below will have width and height in the data +coordinate. 
+ +Simple Anchored Artists + +### InsetLocator + +[``mpl_toolkits.axes_grid1.inset_locator``](https://matplotlib.orgapi/_as_gen/mpl_toolkits.axes_grid1.inset_locator.html#module-mpl_toolkits.axes_grid1.inset_locator) provides helper classes +and functions to place your (inset) axes at the anchored position of +the parent axes, similarly to AnchoredArtist. + +Using [``mpl_toolkits.axes_grid1.inset_locator.inset_axes()``](https://matplotlib.orgapi/_as_gen/mpl_toolkits.axes_grid1.inset_locator.inset_axes.html#mpl_toolkits.axes_grid1.inset_locator.inset_axes), you +can have inset axes whose size is either fixed, or a fixed proportion +of the parent axes. For example,: + +``` python +inset_axes = inset_axes(parent_axes, + width="30%", # width = 30% of parent_bbox + height=1., # height : 1 inch + loc='lower left') +``` + +creates an inset axes whose width is 30% of the parent axes and whose +height is fixed at 1 inch. + +You may creates your inset whose size is determined so that the data +scale of the inset axes to be that of the parent axes multiplied by +some factor. For example, + +``` python +inset_axes = zoomed_inset_axes(ax, + 0.5, # zoom = 0.5 + loc='upper right') +``` + +creates an inset axes whose data scale is half of the parent axes. +Here is complete examples. + +Inset Locator Demo + +For example, ``zoomed_inset_axes()`` can be used when you want the +inset represents the zoom-up of the small portion in the parent axes. +And ``mpl_toolkits/axes_grid/inset_locator`` provides a helper +function ``mark_inset()`` to mark the location of the area +represented by the inset axes. + +Inset Locator Demo2 + +#### RGB Axes + +RGBAxes is a helper class to conveniently show RGB composite +images. Like ImageGrid, the location of axes are adjusted so that the +area occupied by them fits in a given rectangle. Also, the xaxis and +yaxis of each axes are shared. 
+ +``` python +from mpl_toolkits.axes_grid1.axes_rgb import RGBAxes + +fig = plt.figure() +ax = RGBAxes(fig, [0.1, 0.1, 0.8, 0.8]) + +r, g, b = get_rgb() # r,g,b are 2-d images +ax.imshow_rgb(r, g, b, + origin="lower", interpolation="nearest") +``` + +Simple Rgb + +## AxesDivider + +The axes_divider module provides helper classes to adjust the axes +positions of a set of images at drawing time. + +- [``axes_size``](https://matplotlib.orgapi/_as_gen/mpl_toolkits.axes_grid1.axes_size.html#module-mpl_toolkits.axes_grid1.axes_size) provides a class of +units that are used to determine the size of each axes. For example, +you can specify a fixed size. +- ``Divider`` is the class +that calculates the axes position. It divides the given +rectangular area into several areas. The divider is initialized by +setting the lists of horizontal and vertical sizes on which the division +will be based. Then use +``new_locator()``, +which returns a callable object that can be used to set the +axes_locator of the axes. + +First, initialize the divider by specifying its grids, i.e., +horizontal and vertical. + +for example,: + +``` python +rect = [0.2, 0.2, 0.6, 0.6] +horiz=[h0, h1, h2, h3] +vert=[v0, v1, v2] +divider = Divider(fig, rect, horiz, vert) +``` + +where, rect is a bounds of the box that will be divided and h0,..h3, +v0,..v2 need to be an instance of classes in the +[``axes_size``](https://matplotlib.orgapi/_as_gen/mpl_toolkits.axes_grid1.axes_size.html#module-mpl_toolkits.axes_grid1.axes_size). They have *get_size* method +that returns a tuple of two floats. The first float is the relative +size, and the second float is the absolute size. Consider a following +grid. + + +--- + + + + + + + +v0 +  +  +  + +v1 +  +  +  + +h0,v2 +h1 +h2 +h3 + + + + +- v0 => 0, 2 +- v1 => 2, 0 +- v2 => 3, 0 + +The height of the bottom row is always 2 (axes_divider internally +assumes that the unit is inches). The first and the second rows have a +height ratio of 2:3. 
For example, if the total height of the grid is 6, the fixed bottom row takes 2 inches, and the two scaled rows occupy 2/(2+3) and 3/(2+3) of the remaining (6-2) = 4 inches, respectively. The widths of the horizontal columns are determined similarly. When the aspect ratio is set, the total height (or width) will be adjusted accordingly.

The [``mpl_toolkits.axes_grid1.axes_size``](https://matplotlib.org/api/_as_gen/mpl_toolkits.axes_grid1.axes_size.html#module-mpl_toolkits.axes_grid1.axes_size) contains several classes that can be used to set the horizontal and vertical configurations. For example, for the vertical configuration above one could use:

``` python
from mpl_toolkits.axes_grid1.axes_size import Fixed, Scaled
vert = [Fixed(2), Scaled(2), Scaled(3)]
```

After you set up the divider object, you then create a locator instance that will be given to the axes object:

``` python
locator = divider.new_locator(nx=0, ny=1)
ax.set_axes_locator(locator)
```

The return value of the new_locator method is an instance of the AxesLocator class. It is a callable object that returns the location and size of the cell at the first column and the second row. You may create a locator that spans over multiple cells (note that the end column is passed as *nx1*, not as a second *nx*):

``` python
locator = divider.new_locator(nx=0, nx1=2, ny=1)
```

The above locator, when called, will return the position and size of the cells spanning the first and second columns in the second row, i.e., the area of cells [0:2, 1].

See the example,

Simple Axes Divider2

You can adjust the size of each axes according to its x or y data limits (AxesX and AxesY).
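Putting the pieces above together, a hedged end-to-end sketch: one fixed plus two scaled sizes in each direction, with the axes placed in the cell at column 1, row 1 of the divided area (the figure size and sizes are arbitrary):

``` python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1.axes_divider import Divider
from mpl_toolkits.axes_grid1.axes_size import Fixed, Scaled

fig = plt.figure(figsize=(6, 6))
rect = (0.2, 0.2, 0.6, 0.6)               # area to be divided, figure coords
horiz = [Fixed(1), Scaled(2), Scaled(3)]  # 1 inch fixed + 2:3 scaled columns
vert = [Fixed(2), Scaled(2), Scaled(3)]   # 2 inch fixed bottom row + 2:3 rows

divider = Divider(fig, rect, horiz, vert, aspect=False)

ax = fig.add_axes(rect)  # initial position is overridden by the locator
ax.set_axes_locator(divider.new_locator(nx=1, ny=1))
fig.canvas.draw()        # positions are resolved at draw time
```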
+ +Simple Axes Divider3 + +## Download + +- [Download Python source code: axes_grid.py](https://matplotlib.org/_downloads/cc46224ebb7a7afda7d8b6f3a1e58c06/axes_grid.py) +- [Download Jupyter notebook: axes_grid.ipynb](https://matplotlib.org/_downloads/c0b6a1c863337a7a54913ff3820e598b/axes_grid.ipynb) + \ No newline at end of file diff --git a/Python/matplotlab/toolkits/axisartist.md b/Python/matplotlab/toolkits/axisartist.md new file mode 100644 index 00000000..3ec95dd5 --- /dev/null +++ b/Python/matplotlab/toolkits/axisartist.md @@ -0,0 +1,617 @@ +--- +sidebarDepth: 3 +sidebar: auto +--- + +# Overview of axisartist toolkit + +The axisartist toolkit tutorial. + +::: danger Warning + +*axisartist* uses a custom Axes class +(derived from the mpl's original Axes class). +As a side effect, some commands (mostly tick-related) do not work. + +::: + +The *axisartist* contains a custom Axes class that is meant to support +curvilinear grids (e.g., the world coordinate system in astronomy). +Unlike mpl's original Axes class which uses Axes.xaxis and Axes.yaxis +to draw ticks, ticklines, etc., axisartist uses a special +artist (AxisArtist) that can handle ticks, ticklines, etc. for +curved coordinate systems. + +Demo Floating Axis + +Since it uses special artists, some Matplotlib commands that work on +Axes.xaxis and Axes.yaxis may not work. + +## axisartist + +The *axisartist* module provides a custom (and very experimental) Axes +class, where each axis (left, right, top, and bottom) have a separate +associated artist which is responsible for drawing the axis-line, ticks, +ticklabels, and labels. You can also create your own axis, which can pass +through a fixed position in the axes coordinate, or a fixed position +in the data coordinate (i.e., the axis floats around when viewlimit +changes). 
+ +The axes class, by default, has its xaxis and yaxis invisible, and +has 4 additional artists which are responsible for drawing the 4 axis spines in +"left", "right", "bottom", and "top". They are accessed as +ax.axis["left"], ax.axis["right"], and so on, i.e., ax.axis is a +dictionary that contains artists (note that ax.axis is still a +callable method and it behaves as an original Axes.axis method in +Matplotlib). + +To create an axes, + +``` python +import mpl_toolkits.axisartist as AA +fig = plt.figure() +ax = AA.Axes(fig, [0.1, 0.1, 0.8, 0.8]) +fig.add_axes(ax) +``` + +or to create a subplot + +``` python +ax = AA.Subplot(fig, 111) +fig.add_subplot(ax) +``` + +For example, you can hide the right and top spines using: + +``` python +ax.axis["right"].set_visible(False) +ax.axis["top"].set_visible(False) +``` + +Simple Axisline3 + +It is also possible to add a horizontal axis. For example, you may have an +horizontal axis at y=0 (in data coordinate). + +``` python +ax.axis["y=0"] = ax.new_floating_axis(nth_coord=0, value=0) +``` + +Simple Axisartist1 + +Or a fixed axis with some offset + +``` python +# make new (right-side) yaxis, but with some offset +ax.axis["right2"] = ax.new_fixed_axis(loc="right", + offset=(20, 0)) +``` + +### axisartist with ParasiteAxes + +Most commands in the axes_grid1 toolkit can take an axes_class keyword +argument, and the commands create an axes of the given class. For example, +to create a host subplot with axisartist.Axes, + +``` python +import mpl_toolkits.axisartist as AA +from mpl_toolkits.axes_grid1 import host_subplot + +host = host_subplot(111, axes_class=AA.Axes) +``` + +Here is an example that uses ParasiteAxes. + +Demo Parasite Axes2 + +### Curvilinear Grid + +The motivation behind the AxisArtist module is to support a curvilinear grid +and ticks. + +Demo Curvelinear Grid + +### Floating Axes + +AxisArtist also supports a Floating Axes whose outer axes are defined as +floating axis. 
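The fragments above can be combined into one self-contained sketch (using the Agg backend so nothing is displayed; on older matplotlib releases, ``AA.Subplot(fig, 111)`` plus ``fig.add_subplot(ax)`` replaces the ``axes_class`` call):

``` python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import mpl_toolkits.axisartist as AA

fig = plt.figure()
# Request an axisartist axes via the axes_class keyword.
ax = fig.add_subplot(axes_class=AA.Axes)

# Hide the right and top spines.
ax.axis["right"].set_visible(False)
ax.axis["top"].set_visible(False)

# Add a floating horizontal axis at y=0 (in data coordinates).
ax.axis["y=0"] = ax.new_floating_axis(nth_coord=0, value=0)
```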
+ +Demo Floating Axes + +## axisartist namespace + +The *axisartist* namespace includes a derived Axes implementation. The +biggest difference is that the artists responsible to draw axis line, +ticks, ticklabel and axis labels are separated out from the mpl's Axis +class, which are much more than artists in the original mpl. This +change was strongly motivated to support curvilinear grid. Here are a +few things that mpl_toolkits.axisartist.Axes is different from original +Axes from mpl. + +- Axis elements (axis line(spine), ticks, ticklabel and axis labels) +are drawn by a AxisArtist instance. Unlike Axis, left, right, top +and bottom axis are drawn by separate artists. And each of them may +have different tick location and different tick labels. +- gridlines are drawn by a Gridlines instance. The change was +motivated that in curvilinear coordinate, a gridline may not cross +axis-lines (i.e., no associated ticks). In the original Axes class, +gridlines are tied to ticks. +- ticklines can be rotated if necessary (i.e, along the gridlines) + +In summary, all these changes was to support + +- a curvilinear grid. +- a floating axis + +Demo Floating Axis + +*mpl_toolkits.axisartist.Axes* class defines a *axis* attribute, which +is a dictionary of AxisArtist instances. By default, the dictionary +has 4 AxisArtist instances, responsible for drawing of left, right, +bottom and top axis. + +xaxis and yaxis attributes are still available, however they are set +to not visible. As separate artists are used for rendering axis, some +axis-related method in mpl may have no effect. +In addition to AxisArtist instances, the mpl_toolkits.axisartist.Axes will +have *gridlines* attribute (Gridlines), which obviously draws grid +lines. + +In both AxisArtist and Gridlines, the calculation of tick and grid +location is delegated to an instance of GridHelper class. +mpl_toolkits.axisartist.Axes class uses GridHelperRectlinear as a grid +helper. 
The GridHelperRectlinear class is a wrapper around the *xaxis* +and *yaxis* of mpl's original Axes, and it was meant to work as the +way how mpl's original axes works. For example, tick location changes +using set_ticks method and etc. should work as expected. But change in +artist properties (e.g., color) will not work in general, although +some effort has been made so that some often-change attributes (color, +etc.) are respected. + +## AxisArtist + +AxisArtist can be considered as a container artist with following +attributes which will draw ticks, labels, etc. + +- line +- major_ticks, major_ticklabels +- minor_ticks, minor_ticklabels +- offsetText +- label + +### line + +Derived from Line2d class. Responsible for drawing a spinal(?) line. + +### major_ticks, minor_ticks + +Derived from Line2d class. Note that ticks are markers. + +### major_ticklabels, minor_ticklabels + +Derived from Text. Note that it is not a list of Text artist, but a +single artist (similar to a collection). + +### axislabel + +Derived from Text. + +## Default AxisArtists + +By default, following for axis artists are defined.: + +``` python +ax.axis["left"], ax.axis["bottom"], ax.axis["right"], ax.axis["top"] +``` + +The ticklabels and axislabel of the top and the right axis are set to +not visible. + +For example, if you want to change the color attributes of +major_ticklabels of the bottom x-axis + +``` python +ax.axis["bottom"].major_ticklabels.set_color("b") +``` + +Similarly, to make ticklabels invisible + +``` python +ax.axis["bottom"].major_ticklabels.set_visible(False) +``` + +AxisArtist provides a helper method to control the visibility of ticks, +ticklabels, and label. 
To make ticklabels invisible,

``` python
ax.axis["bottom"].toggle(ticklabels=False)
```

To make all of ticks, ticklabels, and (axis) label invisible

``` python
ax.axis["bottom"].toggle(all=False)
```

To turn all off but ticks on

``` python
ax.axis["bottom"].toggle(all=False, ticks=True)
```

To turn all on but (axis) label off

``` python
ax.axis["bottom"].toggle(all=True, label=False)
```

ax.axis's __getitem__ method can take multiple axis names. For example, to turn the ticklabels of the "top" and "right" axes on:

``` python
ax.axis["top", "right"].toggle(ticklabels=True)
```

Note that ``ax.axis["top", "right"]`` returns a simple proxy object that translates the above code to something like this:

``` python
for n in ["top", "right"]:
    ax.axis[n].toggle(ticklabels=True)
```

Any return value inside the for loop is ignored, so the proxy should not be used for anything more than simple method calls.

As with list indexing, ":" means all items, i.e.,

``` python
ax.axis[:].major_ticks.set_color("r")
```

changes the tick color of all axes.

## HowTo

1. Changing tick locations and labels.

Same as the original mpl's axes:

``` python
ax.set_xticks([1,2,3])
```

2. Changing axis properties like color, etc.

Change the properties of the appropriate artists. For example, to change the color of the ticklabels:

``` python
ax.axis["left"].major_ticklabels.set_color("r")
```

## Rotation and Alignment of TickLabels

This is also quite different from the original mpl and can be confusing. When you want to rotate the ticklabels, first consider using the "set_axis_direction" method:

``` python
ax1.axis["left"].major_ticklabels.set_axis_direction("top")
ax1.axis["right"].label.set_axis_direction("left")
```

Simple Axis Direction01

The parameter for set_axis_direction is one of ["left", "right", "bottom", "top"].

You must understand some underlying concepts of directions.
On the other hand, there is a concept of "axis_direction". This is a default setting of the above properties for each of the "bottom", "left", "top", and "right" axes.

| artist | property | left | bottom | right | top |
| --- | --- | --- | --- | --- | --- |
| axislabel | direction | '-' | '+' | '+' | '-' |
| axislabel | rotation | 180 | 0 | 0 | 180 |
| axislabel | va | center | top | center | bottom |
| axislabel | ha | right | center | right | center |
| ticklabel | direction | '-' | '+' | '+' | '-' |
| ticklabels | rotation | 90 | 0 | -90 | 180 |
| ticklabel | ha | right | center | right | center |
| ticklabel | va | center | baseline | center | baseline |

And, 'set_axis_direction("top")' means to adjust the text rotation etc., for settings suitable for the "top" axis. The concept of axis direction can be more clear with a curved axis.

Demo Axis Direction

The axis_direction can be adjusted at the AxisArtist level, or at the level of its child artists, i.e., ticks, ticklabels, and axis-label.

``` python
ax1.axis["left"].set_axis_direction("top")
```

changes the axis_direction of all the artists associated with the "left" axis, while

``` python
ax1.axis["left"].major_ticklabels.set_axis_direction("top")
```

changes the axis_direction of only the major_ticklabels. Note that set_axis_direction at the AxisArtist level changes the ticklabel_direction and label_direction, while changing the axis_direction of ticks, ticklabels, and axis-label does not affect them.

If you want to make ticks outward and ticklabels inside the axes, use the invert_ticklabel_direction method.

``` python
ax.axis[:].invert_ticklabel_direction()
```

A related method is "set_tick_out". It makes ticks outward (as a matter of fact, it makes ticks point toward the opposite of the default direction).
``` python
ax.axis[:].major_ticks.set_tick_out(True)
```

Simple Axis Direction03

So, in summary,

- AxisArtist's methods
  - set_axis_direction : "left", "right", "bottom", or "top"
  - set_ticklabel_direction : "+" or "-"
  - set_axislabel_direction : "+" or "-"
  - invert_ticklabel_direction
- Ticks' methods (major_ticks and minor_ticks)
  - set_tick_out : True or False
  - set_ticksize : size in points
- TickLabels' methods (major_ticklabels and minor_ticklabels)
  - set_axis_direction : "left", "right", "bottom", or "top"
  - set_rotation : angle with respect to the reference direction
  - set_ha and set_va : see below
- AxisLabels' methods (label)
  - set_axis_direction : "left", "right", "bottom", or "top"
  - set_rotation : angle with respect to the reference direction
  - set_ha and set_va

### Adjusting ticklabels alignment

Alignment of TickLabels is treated specially. See below.

Demo Ticklabel Alignment

### Adjusting pad

To change the pad between ticks and ticklabels

``` python
ax.axis["left"].major_ticklabels.set_pad(10)
```

Or between ticklabels and axis-label

``` python
ax.axis["left"].label.set_pad(10)
```

Simple Axis Pad

## GridHelper

To actually define a curvilinear coordinate, you have to use your own grid helper. A generalised version of the grid helper class is supplied, and this class should suffice in most cases.
A user may provide +two functions which defines a transformation (and its inverse pair) +from the curved coordinate to (rectilinear) image coordinate. Note that +while ticks and grids are drawn for curved coordinate, the data +transform of the axes itself (ax.transData) is still rectilinear +(image) coordinate. + +``` python +from mpl_toolkits.axisartist.grid_helper_curvelinear \ + import GridHelperCurveLinear +from mpl_toolkits.axisartist import Subplot + +# from curved coordinate to rectlinear coordinate. +def tr(x, y): + x, y = np.asarray(x), np.asarray(y) + return x, y-x + +# from rectlinear coordinate to curved coordinate. +def inv_tr(x,y): + x, y = np.asarray(x), np.asarray(y) + return x, y+x + +grid_helper = GridHelperCurveLinear((tr, inv_tr)) + +ax1 = Subplot(fig, 1, 1, 1, grid_helper=grid_helper) + +fig.add_subplot(ax1) +``` + +You may use matplotlib's Transform instance instead (but a +inverse transformation must be defined). Often, coordinate range in a +curved coordinate system may have a limited range, or may have +cycles. In those cases, a more customized version of grid helper is +required. + +``` python +import mpl_toolkits.axisartist.angle_helper as angle_helper + +# PolarAxes.PolarTransform takes radian. However, we want our coordinate +# system in degree +tr = Affine2D().scale(np.pi/180., 1.) + PolarAxes.PolarTransform() + +# extreme finder : find a range of coordinate. +# 20, 20 : number of sampling points along x, y direction +# The first coordinate (longitude, but theta in polar) +# has a cycle of 360 degree. +# The second coordinate (latitude, but radius in polar) has a minimum of 0 +extreme_finder = angle_helper.ExtremeFinderCycle(20, 20, + lon_cycle = 360, + lat_cycle = None, + lon_minmax = None, + lat_minmax = (0, np.inf), + ) + +# Find a grid values appropriate for the coordinate (degree, +# minute, second). The argument is a approximate number of grids. 
+grid_locator1 = angle_helper.LocatorDMS(12) + +# And also uses an appropriate formatter. Note that,the +# acceptable Locator and Formatter class is a bit different than +# that of mpl's, and you cannot directly use mpl's Locator and +# Formatter here (but may be possible in the future). +tick_formatter1 = angle_helper.FormatterDMS() + +grid_helper = GridHelperCurveLinear(tr, + extreme_finder=extreme_finder, + grid_locator1=grid_locator1, + tick_formatter1=tick_formatter1 + ) +``` + +Again, the *transData* of the axes is still a rectilinear coordinate +(image coordinate). You may manually do conversion between two +coordinates, or you may use Parasite Axes for convenience.: + +``` python +ax1 = SubplotHost(fig, 1, 2, 2, grid_helper=grid_helper) + +# A parasite axes with given transform +ax2 = ParasiteAxesAuxTrans(ax1, tr, "equal") +# note that ax2.transData == tr + ax1.transData +# Anything you draw in ax2 will match the ticks and grids of ax1. +ax1.parasites.append(ax2) +``` + +Demo Curvelinear Grid + +## FloatingAxis + +A floating axis is an axis one of whose data coordinate is fixed, i.e, +its location is not fixed in Axes coordinate but changes as axes data +limits changes. A floating axis can be created using +*new_floating_axis* method. However, it is your responsibility that +the resulting AxisArtist is properly added to the axes. A recommended +way is to add it as an item of Axes's axis attribute.: + +``` python +# floating axis whose first (index starts from 0) coordinate +# (theta) is fixed at 60 + +ax1.axis["lat"] = axis = ax1.new_floating_axis(0, 60) +axis.label.set_text(r"$\theta = 60^{\circ}$") +axis.label.set_visible(True) +``` + +See the first example of this page. + +## Current Limitations and TODO's + +The code need more refinement. Here is a incomplete list of issues and TODO's + +- No easy way to support a user customized tick location (for +curvilinear grid). A new Locator class needs to be created. 
+- FloatingAxis may have coordinate limits, e.g., a floating axis of
+x = 0 whose y spans only from 0 to 1.
+- The location of the axis label of a FloatingAxis needs to be optionally
+given as a coordinate value, e.g., a floating axis of x = 0 with its label at y = 1.
+
+## Download
+
+- [Download Python source code: axisartist.py](https://matplotlib.org/_downloads/009aa8b612fe75c3b3046dbffcd0d1c7/axisartist.py)
+- [Download Jupyter notebook: axisartist.ipynb](https://matplotlib.org/_downloads/47bc25cb4e9c18eccf1385f78c4ea405/axisartist.ipynb)
+
\ No newline at end of file
diff --git a/Python/matplotlab/toolkits/mplot3d.md b/Python/matplotlab/toolkits/mplot3d.md
new file mode 100644
index 00000000..9b8ea507
--- /dev/null
+++ b/Python/matplotlab/toolkits/mplot3d.md
@@ -0,0 +1,652 @@
+---
+sidebarDepth: 3
+sidebar: auto
+---
+
+# The mplot3d Toolkit
+
+Generating 3D plots using the mplot3d toolkit.
+
+Contents
+
+- [The mplot3d Toolkit](#the-mplot3d-toolkit)
+[Getting started](#getting-started)
+[Line plots](#line-plots)
+[Scatter plots](#scatter-plots)
+[Wireframe plots](#wireframe-plots)
+[Surface plots](#surface-plots)
+[Tri-Surface plots](#tri-surface-plots)
+[Contour plots](#contour-plots)
+[Filled contour plots](#filled-contour-plots)
+[Polygon plots](#polygon-plots)
+[Bar plots](#bar-plots)
+[Quiver](#quiver)
+[2D plots in 3D](#d-plots-in-3d)
+[Text](#text)
+[Subplotting](#subplotting)
+
+## Getting started
+
+An Axes3D object is created just like any other axes, using
+the projection='3d' keyword.
+Create a new [``matplotlib.figure.Figure``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure) and
+add a new axes to it of type ``Axes3D``:
+
+``` python
+import matplotlib.pyplot as plt
+from mpl_toolkits.mplot3d import Axes3D
+fig = plt.figure()
+ax = fig.add_subplot(111, projection='3d')
+```
+
+*New in version 1.0.0:* This approach is the preferred method of creating a 3D axes.
+
+::: tip Note
+
+Prior to version 1.0.0, the method of creating a 3D axes was
+different. For those using older versions of matplotlib, change
+``ax = fig.add_subplot(111, projection='3d')``
+to ``ax = Axes3D(fig)``.
+
+:::
+
+See the [mplot3d FAQ](https://matplotlib.org/api/toolkits/mplot3d/faq.html#toolkit-mplot3d-faq) for more information about the mplot3d
+toolkit.
+
+### Line plots
+
+``Axes3D.plot``(*self*, *xs*, *ys*, **args*, *zdir='z'*, ***kwargs*)[[source]](https://matplotlib.org/_modules/mpl_toolkits/mplot3d/axes3d.html#Axes3D.plot)
+
+Plot 2D or 3D data.
+
+---
+
+Parameters:
+
+xs : 1D array-like
+x coordinates of vertices.
+
+ys : 1D array-like
+y coordinates of vertices.
+
+zs : scalar or 1D array-like
+z coordinates of vertices; either one for all points or one for
+each point.
+
+zdir : {'x', 'y', 'z'}
+When plotting 2D data, the direction to use as z ('x', 'y' or 'z');
+defaults to 'z'.
+
+**kwargs
+Other arguments are forwarded to [matplotlib.axes.Axes.plot](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.plot.html#matplotlib.axes.Axes.plot).
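To see these parameters in action, here is a minimal, self-contained sketch; the parametric helix data below is invented for the example:

``` python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the '3d' projection)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# A parametric helix: x and y trace a circle while z climbs linearly.
theta = np.linspace(0, 4 * np.pi, 100)
ax.plot(np.cos(theta), np.sin(theta), theta, label='parametric helix')
ax.legend()
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
plt.show()
```

Passing 2D data (only *xs* and *ys*) with ``zdir='y'`` instead would lay the same curve flat in the x-z plane.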
+
+Lines3d
+
+### Scatter plots
+
+``Axes3D.scatter``(*self*, *xs*, *ys*, *zs=0*, *zdir='z'*, *s=20*, *c=None*, *depthshade=True*, **args*, ***kwargs*)[[source]](https://matplotlib.org/_modules/mpl_toolkits/mplot3d/axes3d.html#Axes3D.scatter)
+
+Create a scatter plot.
+
+---
+
+Parameters:
+
+xs, ys : array-like
+The data positions.
+
+zs : float or array-like, optional, default: 0
+The z-positions. Either an array of the same length as *xs* and
+*ys* or a single value to place all points in the same plane.
+
+zdir : {'x', 'y', 'z', '-x', '-y', '-z'}, optional, default: 'z'
+The axis direction for the *zs*. This is useful when plotting 2D
+data on a 3D Axes. The data must be passed as *xs*, *ys*. Setting
+*zdir* to 'y' then plots the data to the x-z plane.
+See also [Plot 2D data on 3D plot](https://matplotlib.org/gallery/mplot3d/2dcollections3d.html).
+
+s : scalar or array-like, optional, default: 20
+The marker size in points**2. Either an array of the same length
+as *xs* and *ys* or a single value to make all markers the same
+size.
+
+c : color, sequence, or sequence of color, optional
+The marker color. Possible values:
+
+- A single color format string.
+- A sequence of color specifications of length n.
+- A sequence of n numbers to be mapped to colors using *cmap* and *norm*.
+- A 2-D array in which the rows are RGB or RGBA.
+
+For more details see the *c* argument of [scatter](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.scatter.html#matplotlib.axes.Axes.scatter).
+
+depthshade : bool, optional, default: True
+Whether to shade the scatter markers to give the appearance of
+depth.
+
+**kwargs
+All other arguments are passed on to scatter.
+
+Returns:
+
+paths : [PathCollection](https://matplotlib.org/api/collections_api.html#matplotlib.collections.PathCollection)
+
+Scatter3d
+
+### Wireframe plots
+
+``Axes3D.plot_wireframe``(*self*, *X*, *Y*, *Z*, **args*, ***kwargs*)[[source]](https://matplotlib.org/_modules/mpl_toolkits/mplot3d/axes3d.html#Axes3D.plot_wireframe)
+
+Plot a 3D wireframe.
+
+::: tip Note
+
+The *rcount* and *ccount* kwargs, which both default to 50,
+determine the maximum number of samples used in each direction. If
+the input data is larger, it will be downsampled (by slicing) to
+these numbers of points.
+
+:::
+
+---
+
+Parameters:
+
+X, Y, Z : 2d arrays
+Data values.
+
+rcount, ccount : int
+Maximum number of samples used in each direction. If the input
+data is larger, it will be downsampled (by slicing) to these
+numbers of points. Setting a count to zero causes the data to be
+not sampled in the corresponding direction, producing a 3D line
+plot rather than a wireframe plot. Defaults to 50.
+
+*New in version 2.0.*
+
+rstride, cstride : int
+Downsampling stride in each direction. These arguments are
+mutually exclusive with *rcount* and *ccount*. If only one of
+*rstride* or *cstride* is set, the other defaults to 1. Setting a
+stride to zero causes the data to be not sampled in the
+corresponding direction, producing a 3D line plot rather than a
+wireframe plot.
+'classic' mode uses a default of rstride = cstride = 1 instead
+of the new default of rcount = ccount = 50.
+
+**kwargs
+Other arguments are forwarded to [Line3DCollection](https://matplotlib.org/api/_as_gen/mpl_toolkits.mplot3d.art3d.Line3DCollection.html#mpl_toolkits.mplot3d.art3d.Line3DCollection).
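As a minimal sketch of the sampling keywords above — this uses ``axes3d.get_test_data``, the small demo dataset that ships with mpl_toolkits:

``` python
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# Bundled demo surface; rcount/ccount cap how many samples are drawn in
# each direction, so large inputs are thinned out rather than drawn fully.
X, Y, Z = axes3d.get_test_data(0.05)
ax.plot_wireframe(X, Y, Z, rcount=10, ccount=10)
plt.show()
```

Swapping the count keywords for ``rstride=5, cstride=5`` would instead fix the spacing between drawn lines regardless of the input size.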
+
+Wire3d
+
+### Surface plots
+
+``Axes3D.plot_surface``(*self*, *X*, *Y*, *Z*, **args*, *norm=None*, *vmin=None*, *vmax=None*, *lightsource=None*, ***kwargs*)[[source]](https://matplotlib.org/_modules/mpl_toolkits/mplot3d/axes3d.html#Axes3D.plot_surface)
+
+Create a surface plot.
+
+By default it will be colored in shades of a solid color, but it also
+supports color mapping by supplying the *cmap* argument.
+
+::: tip Note
+
+The *rcount* and *ccount* kwargs, which both default to 50,
+determine the maximum number of samples used in each direction. If
+the input data is larger, it will be downsampled (by slicing) to
+these numbers of points.
+
+:::
+
+---
+
+Parameters:
+
+X, Y, Z : 2d arrays
+Data values.
+
+rcount, ccount : int
+Maximum number of samples used in each direction. If the input
+data is larger, it will be downsampled (by slicing) to these
+numbers of points. Defaults to 50.
+
+*New in version 2.0.*
+
+rstride, cstride : int
+Downsampling stride in each direction. These arguments are
+mutually exclusive with *rcount* and *ccount*. If only one of
+*rstride* or *cstride* is set, the other defaults to 10.
+'classic' mode uses a default of rstride = cstride = 10 instead
+of the new default of rcount = ccount = 50.
+
+color : color-like
+Color of the surface patches.
+
+cmap : Colormap
+Colormap of the surface patches.
+
+facecolors : array-like of colors
+Colors of each individual patch.
+
+norm : Normalize
+Normalization for the colormap.
+
+vmin, vmax : float
+Bounds for the normalization.
+
+shade : bool
+Whether to shade the facecolors. Defaults to True. Shading is
+always disabled when *cmap* is specified.
+
+lightsource : [LightSource](https://matplotlib.org/api/_as_gen/matplotlib.colors.LightSource.html#matplotlib.colors.LightSource)
+The lightsource to use when *shade* is True.
+
+**kwargs
+Other arguments are forwarded to [Poly3DCollection](https://matplotlib.org/api/_as_gen/mpl_toolkits.mplot3d.art3d.Poly3DCollection.html#mpl_toolkits.mplot3d.art3d.Poly3DCollection).
+
+Surface3d
+
+Surface3d 2
+
+Surface3d 3
+
+### Tri-Surface plots
+
+``Axes3D.plot_trisurf``(*self*, **args*, *color=None*, *norm=None*, *vmin=None*, *vmax=None*, *lightsource=None*, ***kwargs*)[[source]](https://matplotlib.org/_modules/mpl_toolkits/mplot3d/axes3d.html#Axes3D.plot_trisurf)
+
+Plot a triangulated surface.
+
+The (optional) triangulation can be specified in one of two ways;
+either:
+
+``` python
+plot_trisurf(triangulation, ...)
+```
+
+where triangulation is a [``Triangulation``](https://matplotlib.org/api/tri_api.html#matplotlib.tri.Triangulation)
+object, or:
+
+``` python
+plot_trisurf(X, Y, ...)
+plot_trisurf(X, Y, triangles, ...)
+plot_trisurf(X, Y, triangles=triangles, ...)
+```
+
+in which case a Triangulation object will be created. See
+[``Triangulation``](https://matplotlib.org/api/tri_api.html#matplotlib.tri.Triangulation) for an explanation of
+these possibilities.
+
+The remaining arguments are:
+
+``` python
+plot_trisurf(..., Z)
+```
+
+where *Z* is the array of values to contour, one per point
+in the triangulation.
+
+---
+
+Parameters:
+
+X, Y, Z : array-like
+Data values as 1D arrays.
+
+color
+Color of the surface patches.
+
+cmap
+A colormap for the surface patches.
+
+norm : Normalize
+An instance of Normalize to map values to colors.
+
+vmin, vmax : scalar, optional, default: None
+Minimum and maximum value to map.
+
+shade : bool
+Whether to shade the facecolors. Defaults to True. Shading is
+always disabled when *cmap* is specified.
+
+lightsource : [LightSource](https://matplotlib.org/api/_as_gen/matplotlib.colors.LightSource.html#matplotlib.colors.LightSource)
+The lightsource to use when *shade* is True.
+
+**kwargs
+All other arguments are passed on to
+[Poly3DCollection](https://matplotlib.org/api/_as_gen/mpl_toolkits.mplot3d.art3d.Poly3DCollection.html#mpl_toolkits.mplot3d.art3d.Poly3DCollection).
+
+Examples
+
+([Source code](https://matplotlib.org/gallery/mplot3d/trisurf3d.py), [png](https://matplotlib.org/gallery/mplot3d/trisurf3d.png), [pdf](https://matplotlib.org/gallery/mplot3d/trisurf3d.pdf))
+
+![trisurf3d1](https://matplotlib.org/_images/trisurf3d1.png)
+
+([Source code](https://matplotlib.org/gallery/mplot3d/trisurf3d_2.py), [png](https://matplotlib.org/gallery/mplot3d/trisurf3d_2.png), [pdf](https://matplotlib.org/gallery/mplot3d/trisurf3d_2.pdf))
+
+![trisurf3d_21](https://matplotlib.org/_images/trisurf3d_21.png)
+
+*New in version 1.2.0:* This plotting function was added for the v1.2.0 release.
+
+Trisurf3d
+
+### Contour plots
+
+``Axes3D.contour``(*self*, *X*, *Y*, *Z*, **args*, *extend3d=False*, *stride=5*, *zdir='z'*, *offset=None*, ***kwargs*)[[source]](https://matplotlib.org/_modules/mpl_toolkits/mplot3d/axes3d.html#Axes3D.contour)
+
+Create a 3D contour plot.
+
+---
+
+Parameters:
+
+X, Y, Z : array-like
+Input data.
+
+extend3d : bool
+Whether to extend the contour in 3D; defaults to False.
+
+stride : int
+Step size for extending the contour.
+
+zdir : {'x', 'y', 'z'}
+The direction to use; defaults to 'z'.
+
+offset : scalar
+If specified, plot a projection of the contour lines at this
+position in a plane normal to *zdir*.
+
+*args, **kwargs
+Other arguments are forwarded to [matplotlib.axes.Axes.contour](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.contour.html#matplotlib.axes.Axes.contour).
+
+Returns:
+
+matplotlib.contour.QuadContourSet
+
+Contour3d
+
+Contour3d 2
+
+Contour3d 3
+
+### Filled contour plots
+
+``Axes3D.contourf``(*self*, *X*, *Y*, *Z*, **args*, *zdir='z'*, *offset=None*, ***kwargs*)[[source]](https://matplotlib.org/_modules/mpl_toolkits/mplot3d/axes3d.html#Axes3D.contourf)
+
+Create a 3D filled contour plot.
+
+---
+
+Parameters:
+
+X, Y, Z : array-like
+Input data.
+
+zdir : {'x', 'y', 'z'}
+The direction to use; defaults to 'z'.
+
+offset : scalar
+If specified, plot a projection of the contour lines at this
+position in a plane normal to *zdir*.
+
+*args, **kwargs
+Other arguments are forwarded to [matplotlib.axes.Axes.contourf](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.contourf.html#matplotlib.axes.Axes.contourf).
+
+Returns:
+
+matplotlib.contour.QuadContourSet
+
+Notes
+
+*New in version 1.1.0:* The *zdir* and *offset* parameters.
+
+Contourf3d
+
+*New in version 1.1.0:* The feature demoed in the second contourf3d example was enabled as a
+result of a bugfix for version 1.1.0.
+
+### Polygon plots
+
+``Axes3D.add_collection3d``(*self*, *col*, *zs=0*, *zdir='z'*)[[source]](https://matplotlib.org/_modules/mpl_toolkits/mplot3d/axes3d.html#Axes3D.add_collection3d)
+
+Add a 3D collection object to the plot.
+
+2D collection types are converted to a 3D version by
+modifying the object and adding z coordinate information.
+
+Supported are:
+
+- PolyCollection
+- LineCollection
+- PatchCollection
+
+Polys3d
+
+### Bar plots
+
+``Axes3D.bar``(*self*, *left*, *height*, *zs=0*, *zdir='z'*, **args*, ***kwargs*)[[source]](https://matplotlib.org/_modules/mpl_toolkits/mplot3d/axes3d.html#Axes3D.bar)
+
+Add 2D bar(s).
+
+---
+
+Parameters:
+
+left : 1D array-like
+The x coordinates of the left sides of the bars.
+
+height : 1D array-like
+The height of the bars.
+
+zs : scalar or 1D array-like
+Z coordinate of bars; if a single value is specified, it will be
+used for all bars.
+
+zdir : {'x', 'y', 'z'}
+When plotting 2D data, the direction to use as z ('x', 'y' or 'z');
+defaults to 'z'.
+
+**kwargs
+Other arguments are forwarded to [matplotlib.axes.Axes.bar](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.bar.html#matplotlib.axes.Axes.bar).
+
+Returns:
+
+mpl_toolkits.mplot3d.art3d.Patch3DCollection
+
+Bars3d
+
+### Quiver
+
+``Axes3D.quiver``(*X*, *Y*, *Z*, *U*, *V*, *W*, */*, *length=1*, *arrow_length_ratio=.3*, *pivot='tail'*, *normalize=False*, ***kwargs*)[[source]](https://matplotlib.org/_modules/mpl_toolkits/mplot3d/axes3d.html#Axes3D.quiver)
+
+Plot a 3D field of arrows.
+
+The arguments may be array-like or scalars, so long as they
+can be broadcast together. The arguments can also be
+masked arrays. If an element in any of the arguments is masked, the
+corresponding quiver element will not be plotted.
+
+---
+
+Parameters:
+
+X, Y, Z : array-like
+The x, y and z coordinates of the arrow locations (default is the
+tail of the arrow; see the *pivot* kwarg).
+
+U, V, W : array-like
+The x, y and z components of the arrow vectors.
+
+length : float
+The length of each quiver; defaults to 1.0. The unit is
+the same as that of the axes.
+
+arrow_length_ratio : float
+The ratio of the arrow head to the quiver; defaults to 0.3.
+
+pivot : {'tail', 'middle', 'tip'}
+The part of the arrow that is at the grid point; the arrow
+rotates about this point, hence the name *pivot*.
+Defaults to 'tail'.
+
+normalize : bool
+When True, all of the arrows will be the same length. Defaults
+to False, where the arrows will be different lengths
+depending on the values of *u*, *v*, *w*.
+
+**kwargs
+Any additional keyword arguments are delegated to
+[LineCollection](https://matplotlib.org/api/collections_api.html#matplotlib.collections.LineCollection).
+
+Quiver3d
+
+### 2D plots in 3D
+
+2dcollections3d
+
+### Text
+
+``Axes3D.text``(*self*, *x*, *y*, *z*, *s*, *zdir=None*, ***kwargs*)[[source]](https://matplotlib.org/_modules/mpl_toolkits/mplot3d/axes3d.html#Axes3D.text)
+
+Add text to the plot. kwargs will be passed on to Axes.text,
+except for the ``zdir`` keyword, which sets the direction to be
+used as the z direction.
+
+Text3d
+
+### Subplotting
+
+Having multiple 3D plots in a single figure works the same
+as it does for 2D plots. You can also mix 2D and 3D plots
+in the same figure.
+
+*New in version 1.0.0:* Subplotting 3D plots was added in v1.0.0. Earlier versions cannot
+do this.
+
+Subplot3d
+
+## Download
+
+- [Download Python source code: mplot3d.py](https://matplotlib.org/_downloads/3227f29a1f1a9db7dd710eac0e54c41e/mplot3d.py)
+- [Download Jupyter notebook: mplot3d.ipynb](https://matplotlib.org/_downloads/7105a36ab795ee5c5746ba7f95602c0d/mplot3d.ipynb)
+
\ No newline at end of file
diff --git a/Python/pandas/getting_started/10min.md b/Python/pandas/getting_started/10min.md
new file mode 100644
index 00000000..29121716
--- /dev/null
+++ b/Python/pandas/getting_started/10min.md
@@ -0,0 +1,1407 @@
+---
+meta:
+  - name: keywords
+    content: getting started with pandas quickly
+  - name: description
+    content: This section is a short introduction to pandas for new users. The cookbook covers more practical recipes. This section imports pandas and NumPy as shown below.
+---
+
+# 10 Minutes to pandas
+
+This is a short introduction to help new pandas users get up to speed. More practical recipes are in the [Cookbook](/docs/user_guide/cookbook.html).
+
+This section imports pandas and NumPy as follows:
+
+``` python
+In [1]: import numpy as np
+
+In [2]: import pandas as pd
+```
+
+## Object creation
+
+See the [Intro to Data Structures](/docs/getting_started/dsintro.html#dsintro) documentation.
+
+When creating a [Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series) by passing a list of values, pandas creates a default integer index:
+
+``` python
+In [3]: s = pd.Series([1, 3, 5, np.nan, 6, 8])
+
+In [4]: s
+Out[4]:
+0    1.0
+1    3.0
+2    5.0
+3    NaN
+4    6.0
+5    8.0
+dtype: float64
+```
+
+Creating a [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) by passing a NumPy array, with a datetime index and labeled columns:
+
+``` python
+In [5]: dates = pd.date_range('20130101', periods=6)
+
+In [6]: dates
+Out[6]:
+DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
+               '2013-01-05', '2013-01-06'],
+              dtype='datetime64[ns]', freq='D')
+
+In [7]: df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
+
+In [8]: df
+Out[8]:
+                   A         B         C         D
+2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
+2013-01-02  1.212112 -0.173215  0.119209 -1.044236
+2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
+2013-01-04  0.721555 -0.706771 -1.039575  0.271860
+2013-01-05 -0.424972  0.567020  0.276232 -1.087401
+2013-01-06 -0.673690  0.113648 -1.478427  0.524988
+```
+
+Creating a DataFrame by passing a dict of Series-like objects:
+
+``` python
+In [9]: df2 = pd.DataFrame({'A': 1.,
+   ...:                     'B': pd.Timestamp('20130102'),
+   ...:                     'C': pd.Series(1, index=list(range(4)), dtype='float32'),
+   ...:                     'D': np.array([3] * 4, dtype='int32'),
+   ...:                     'E': pd.Categorical(["test", "train", "test", "train"]),
+   ...:                     'F': 'foo'})
+   ...:
+
+In [10]: df2
+Out[10]:
+     A          B    C  D      E    F
+0  1.0 2013-01-02  1.0  3   test  foo
+1  1.0 2013-01-02  1.0  3  train  foo
+2  1.0 2013-01-02  1.0  3   test  foo
+3  1.0 2013-01-02  1.0  3  train  foo
+```
+
+The columns of the resulting DataFrame have different [dtypes](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-dtypes).
+
+``` python
+In [11]: df2.dtypes
+Out[11]:
+A           float64
+B    datetime64[ns]
+C           float32
+D             int32
+E          category
+F            object
+dtype: object
+```
+
+IPython supports tab completion for column names (as well as public attributes). Here is a subset of the attributes that can be completed:
+
+``` python
+In [12]: df2.<TAB>  # noqa: E225, E999
+df2.A                  df2.bool
+df2.abs                df2.boxplot
+df2.add                df2.C
+df2.add_prefix         df2.clip
+df2.add_suffix         df2.clip_lower
+df2.align              df2.clip_upper
+df2.all                df2.columns
+df2.any                df2.combine
+df2.append             df2.combine_first
+df2.apply              df2.compound
+df2.applymap           df2.consolidate
+df2.D
+```
+
+Columns A, B, C, D, and E are all tab completed; for brevity, only a subset of the attributes is shown here.
+
+## Viewing data
+
+See the [Basics](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics) documentation.
+
+Here is how to view the top and bottom rows of a DataFrame:
+
+``` python
+In [13]: df.head()
+Out[13]:
+                   A         B         C         D
+2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
+2013-01-02  1.212112 -0.173215  0.119209 -1.044236
+2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
+2013-01-04  0.721555 -0.706771 -1.039575  0.271860
+2013-01-05 -0.424972  0.567020  0.276232 -1.087401
+
+In [14]: df.tail(3)
+Out[14]:
+                   A         B         C         D
+2013-01-04  0.721555 -0.706771 -1.039575  0.271860
+2013-01-05 -0.424972  0.567020  0.276232 -1.087401
+2013-01-06 -0.673690  0.113648 -1.478427  0.524988
+```
+
+Display the index and column names:
+
+``` python
+In [15]: df.index
+Out[15]:
+DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
+               '2013-01-05', '2013-01-06'],
+              dtype='datetime64[ns]', freq='D')
+
+In [16]: df.columns
+Out[16]: Index(['A', 'B', 'C', 'D'], dtype='object')
+```
+
+[DataFrame.to_numpy()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy) gives a NumPy representation of the underlying data. Note that this can be an expensive operation when the [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) has columns with different data types, which comes down to a fundamental difference between pandas and NumPy: **NumPy arrays have one dtype for the entire array, while pandas DataFrames have one dtype per column**. When you call [DataFrame.to_numpy()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy), pandas finds the NumPy dtype that can hold all of the dtypes in the DataFrame. This may end up being `object`, which forces every value in the DataFrame to be cast to a Python object.
+
+For `df`, our [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) of all floating-point values, [DataFrame.to_numpy()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy) is fast and doesn't require copying data.
+
+``` python
+In [17]: df.to_numpy()
+Out[17]:
+array([[ 0.4691, -0.2829, -1.5091, -1.1356],
+       [ 1.2121, -0.1732,  0.1192, -1.0442],
+       [-0.8618, -2.1046, -0.4949,  1.0718],
+       [ 0.7216, -0.7068, -1.0396,  0.2719],
+       [-0.425 ,  0.567 ,  0.2762, -1.0874],
+       [-0.6737,  0.1136, -1.4784,  0.525 ]])
+```
+
+For `df2`, the [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) with multiple dtypes, [DataFrame.to_numpy()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy) is relatively expensive.
+
+``` python
+In [18]: df2.to_numpy()
+Out[18]:
+array([[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
+       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo'],
+       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
+       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo']], dtype=object)
+```
+
+::: tip Note
+
+The output of [DataFrame.to_numpy()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy) does not include the row index or the column labels.
+
+:::
+
+[describe()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html#pandas.DataFrame.describe) shows a quick statistical summary of the data:
+
+``` python
+In [19]: df.describe()
+Out[19]:
+              A         B         C         D
+count  6.000000  6.000000  6.000000  6.000000
+mean   0.073711 -0.431125 -0.687758 -0.233103
+std    0.843157  0.922818  0.779887  0.973118
+min   -0.861849 -2.104569 -1.509059 -1.135632
+25%   -0.611510 -0.600794 -1.368714 -1.076610
+50%    0.022070 -0.228039 -0.767252 -0.386188
+75%    0.658444  0.041933 -0.034326  0.461706
+max    1.212112  0.567020  0.276232  1.071804
+```
+
+Transposing the data:
+
+``` python
+In [20]: df.T
+Out[20]:
+   2013-01-01  2013-01-02  2013-01-03  2013-01-04  2013-01-05  2013-01-06
+A    0.469112    1.212112   -0.861849    0.721555   -0.424972   -0.673690
+B   -0.282863   -0.173215   -2.104569   -0.706771    0.567020    0.113648
+C   -1.509059    0.119209   -0.494929   -1.039575    0.276232   -1.478427
+D   -1.135632   -1.044236    1.071804    0.271860   -1.087401    0.524988
+```
+
+Sorting by an axis:
+
+``` python
+In [21]: df.sort_index(axis=1, ascending=False)
+Out[21]:
+                   D         C         B         A
+2013-01-01 -1.135632 -1.509059 -0.282863  0.469112
+2013-01-02 -1.044236  0.119209 -0.173215  1.212112
+2013-01-03  1.071804 -0.494929 -2.104569 -0.861849
+2013-01-04  0.271860 -1.039575 -0.706771  0.721555
+2013-01-05 -1.087401  0.276232  0.567020 -0.424972
+2013-01-06  0.524988 -1.478427  0.113648 -0.673690
+```
+
+Sorting by values:
+
+``` python
+In [22]: df.sort_values(by='B')
+Out[22]:
+                   A         B         C         D
+2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
+2013-01-04  0.721555 -0.706771 -1.039575  0.271860
+2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
+2013-01-02  1.212112 -0.173215  0.119209 -1.044236
+2013-01-06 -0.673690  0.113648 -1.478427  0.524988
+2013-01-05 -0.424972  0.567020  0.276232 -1.087401
+```
+
+## Selection
+
+::: tip Note
+
+While standard Python / NumPy expressions for selecting and setting are intuitive and convenient for interactive work, for production code we recommend the optimized pandas data access methods: `.at`, `.iat`, `.loc` and `.iloc`.
+
+:::
+
+See the [Indexing and Selecting Data](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing) and [MultiIndex / Advanced Indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#advanced) documentation.
+
+### Getting
+
+Selecting a single column yields a `Series`, equivalent to `df.A`:
+
+``` python
+In [23]: df['A']
+Out[23]:
+2013-01-01    0.469112
+2013-01-02    1.212112
+2013-01-03   -0.861849
+2013-01-04    0.721555
+2013-01-05   -0.424972
+2013-01-06   -0.673690
+Freq: D, Name: A, dtype: float64
+```
+
+Selecting via `[]` slices the rows:
+
+``` python
+In [24]: df[0:3]
+Out[24]:
+                   A         B         C         D
+2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
+2013-01-02  1.212112 -0.173215  0.119209 -1.044236
+2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
+
+In [25]: df['20130102':'20130104']
+Out[25]:
+                   A         B         C         D
+2013-01-02  1.212112 -0.173215  0.119209 -1.044236
+2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
+2013-01-04  0.721555 -0.706771 -1.039575  0.271860
+```
+
+### Selection by label
+
+See more in [Selection by Label](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-label).
+
+Getting a cross section using a label:
+
+``` python
+In [26]: df.loc[dates[0]]
+Out[26]:
+A    0.469112
+B   -0.282863
+C   -1.509059
+D   -1.135632
+Name: 2013-01-01 00:00:00, dtype: float64
+```
+
+Selecting on a multi-axis by label:
+
+``` python
+In [27]: df.loc[:, ['A', 'B']]
+Out[27]:
+                   A         B
+2013-01-01  0.469112 -0.282863
+2013-01-02  1.212112 -0.173215
+2013-01-03 -0.861849 -2.104569
+2013-01-04  0.721555 -0.706771
+2013-01-05 -0.424972  0.567020
+2013-01-06 -0.673690  0.113648
+```
+
+Label slicing, with both endpoints included:
+
+``` python
+In [28]: df.loc['20130102':'20130104', ['A', 'B']]
+Out[28]:
+                   A         B
+2013-01-02  1.212112 -0.173215
+2013-01-03 -0.861849 -2.104569
+2013-01-04  0.721555 -0.706771
+```
+
+Reduction in the dimensions of the returned object:
+
+``` python
+In [29]: df.loc['20130102', ['A', 'B']]
+Out[29]:
+A    1.212112
+B   -0.173215
+Name: 2013-01-02 00:00:00, dtype: float64
+```
+
+Getting a scalar value:
+
+``` python
+In [30]: df.loc[dates[0], 'A']
+Out[30]: 0.46911229990718628
+```
+
+Fast access to a scalar (equivalent to the prior method):
+
+``` python
+In [31]: df.at[dates[0], 'A']
+Out[31]: 0.46911229990718628
+```
+
+### Selection by position
+
+See more in [Selection by Position](http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-integer).
+
+Selecting via the position of the passed integers:
+
+``` python
+In [32]: df.iloc[3]
+Out[32]:
+A    0.721555
+B   -0.706771
+C   -1.039575
+D    0.271860
+Name: 2013-01-04 00:00:00, dtype: float64
+```
+
+By integer slices, acting similarly to NumPy / Python:
+
+``` python
+In [33]: df.iloc[3:5, 0:2]
+Out[33]:
+                   A         B
+2013-01-04  0.721555 -0.706771
+2013-01-05 -0.424972  0.567020
+```
+
+By lists of integer positions, similar to the NumPy / Python style:
+
+``` python
+In [34]: df.iloc[[1, 2, 4], [0, 2]]
+Out[34]:
+                   A         C
+2013-01-02  1.212112  0.119209
+2013-01-03 -0.861849 -0.494929
+2013-01-05 -0.424972  0.276232
+```
+
+Slicing rows explicitly:
+
+``` python
+In [35]: df.iloc[1:3, :]
+Out[35]:
+                   A         B         C         D
+2013-01-02  1.212112 -0.173215  0.119209 -1.044236
+2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
+```
+
+Slicing columns explicitly:
+
+``` python
+In [36]: df.iloc[:, 1:3]
+Out[36]:
+                   B         C
+2013-01-01 -0.282863 -1.509059
+2013-01-02 -0.173215  0.119209
+2013-01-03 -2.104569 -0.494929
+2013-01-04 -0.706771 -1.039575
+2013-01-05  0.567020  0.276232
+2013-01-06  0.113648 -1.478427
+```
+
+Getting a value explicitly:
+
+``` python
+In [37]: df.iloc[1, 1]
+Out[37]: -0.17321464905330858
+```
+
+Fast access to a scalar (equivalent to the prior method):
+
+``` python
+In [38]: df.iat[1, 1]
+Out[38]: -0.17321464905330858
+```
+
+### Boolean indexing
+
+Using a single column's values to select data:
+
+``` python
+In [39]: df[df.A > 0]
+Out[39]:
+                   A         B         C         D
+2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
+2013-01-02  1.212112 -0.173215  0.119209 -1.044236
+2013-01-04  0.721555 -0.706771 -1.039575  0.271860
+```
+
+Selecting values from a DataFrame where a boolean condition is met:
+
+``` python
+In [40]: df[df > 0]
+Out[40]:
+                   A         B         C         D
+2013-01-01  0.469112       NaN       NaN       NaN
+2013-01-02  1.212112       NaN  0.119209       NaN
+2013-01-03       NaN       NaN       NaN  1.071804
+2013-01-04  0.721555       NaN       NaN  0.271860
+2013-01-05       NaN  0.567020  0.276232       NaN
+2013-01-06       NaN  0.113648       NaN  0.524988
+```
+
+Using [isin()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.isin.html#pandas.Series.isin) for filtering:
+
+``` python
+In [41]: df2 = df.copy()
+
+In [42]: df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three']
+
+In [43]: df2
+Out[43]:
+                   A         B         C         D      E
+2013-01-01  0.469112 -0.282863 -1.509059 -1.135632    one
+2013-01-02  1.212112 -0.173215  0.119209 -1.044236    one
+2013-01-03 -0.861849 -2.104569 -0.494929  1.071804    two
+2013-01-04  0.721555 -0.706771 -1.039575  0.271860  three
+2013-01-05 -0.424972  0.567020  0.276232 -1.087401   four
+2013-01-06 -0.673690  0.113648 -1.478427  0.524988  three
+
+In [44]: df2[df2['E'].isin(['two', 'four'])]
+Out[44]:
+                   A         B         C         D     E
+2013-01-03 -0.861849 -2.104569 -0.494929  1.071804   two
+2013-01-05 -0.424972  0.567020  0.276232 -1.087401  four
+```
+
+### Setting
+
+Setting a new column automatically aligns the data by the indexes:
+
+``` python
+In [45]: s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20130102', periods=6))
+
+In [46]: s1
+Out[46]:
+2013-01-02    1
+2013-01-03    2
+2013-01-04    3
+2013-01-05    4
+2013-01-06    5
+2013-01-07    6
+Freq: D, dtype: int64
+
+In [47]: df['F'] = s1
+```
+
+Setting values by label:
+
+``` python
+In [48]: df.at[dates[0], 'A'] = 0
+```
+
+Setting values by position:
+
+``` python
+In [49]: df.iat[0, 1] = 0
+```
+
+Setting by assigning with a NumPy array:
+
+``` python
+In [50]: df.loc[:, 'D'] = np.array([5] * len(df))
+```
+
+The result of the prior setting operations:
+
+``` python
+In [51]: df
+Out[51]:
+                   A         B         C  D    F
+2013-01-01  0.000000  0.000000 -1.509059  5  NaN
+2013-01-02  1.212112 -0.173215  0.119209  5  1.0
+2013-01-03 -0.861849 -2.104569 -0.494929  5  2.0
+2013-01-04  0.721555 -0.706771 -1.039575  5  3.0
+2013-01-05 -0.424972  0.567020  0.276232  5  4.0
+2013-01-06 -0.673690  0.113648 -1.478427  5  5.0
+```
+
+A `where` operation with setting:
+
+``` python
+In [52]: df2 = df.copy()
+
+In [53]: df2[df2 > 0] = -df2
+
+In [54]: df2
+Out[54]:
+                   A         B         C  D    F
+2013-01-01  0.000000  0.000000 -1.509059 -5  NaN
+2013-01-02 -1.212112 -0.173215 -0.119209 -5 -1.0
+2013-01-03 -0.861849 -2.104569 -0.494929 -5 -2.0
+2013-01-04 -0.721555 -0.706771 -1.039575 -5 -3.0
+2013-01-05 -0.424972 -0.567020 -0.276232 -5 -4.0
+2013-01-06 -0.673690 -0.113648 -1.478427 -5 -5.0
+```
+
+## Missing data
+
+pandas primarily uses `np.nan` to represent missing data. It is by default not included in computations. See the [Missing Data](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#missing-data) section.
+
+Reindexing lets you change/add/delete the index on a specified axis. It returns a copy of the data, leaving the original unchanged.
+
+``` python
+In [55]: df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])
+
+In [56]: df1.loc[dates[0]:dates[1], 'E'] = 1
+
+In [57]: df1
+Out[57]:
+                   A         B         C  D    F    E
+2013-01-01  0.000000  0.000000 -1.509059  5  NaN  1.0
+2013-01-02  1.212112 -0.173215  0.119209  5  1.0  1.0
+2013-01-03 -0.861849 -2.104569 -0.494929  5  2.0  NaN
+2013-01-04  0.721555 -0.706771 -1.039575  5  3.0  NaN
+```
+
+Dropping any rows that have missing data:
+
+``` python
+In [58]: df1.dropna(how='any')
+Out[58]:
+                   A         B         C  D    F    E
+2013-01-02  1.212112 -0.173215  0.119209  5  1.0  1.0
+```
+
+Filling missing data:
+
+``` python
+In [59]: df1.fillna(value=5)
+Out[59]:
+                   A         B         C  D    F    E
+2013-01-01  0.000000  0.000000 -1.509059  5  5.0  1.0
+2013-01-02  1.212112 -0.173215  0.119209  5  1.0  1.0
+2013-01-03 -0.861849 -2.104569 -0.494929  5  2.0  5.0
+2013-01-04  0.721555 -0.706771 -1.039575  5  3.0  5.0
+```
+
+Getting the boolean mask where values are `nan`:
+
+``` python
+In [60]: pd.isna(df1)
+Out[60]:
+                A      B      C      D      F      E
+2013-01-01 False False False False True False +2013-01-02 False False False False False False +2013-01-03 False False False False False True +2013-01-04 False False False False False True +``` + +## 运算 + +详见[二进制操作](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-binop)。 + +### 统计 + +一般情况下,运算时**排除**缺失值。 + +描述性统计: + +``` python +In [61]: df.mean() +Out[61]: +A -0.004474 +B -0.383981 +C -0.687758 +D 5.000000 +F 3.000000 +dtype: float64 +``` + +在另一个轴(即,行)上执行同样的操作: + +``` python +In [62]: df.mean(1) +Out[62]: +2013-01-01 0.872735 +2013-01-02 1.431621 +2013-01-03 0.707731 +2013-01-04 1.395042 +2013-01-05 1.883656 +2013-01-06 1.592306 +Freq: D, dtype: float64 +``` + +不同维度对象运算时,要先对齐。 此外,Pandas 自动沿指定维度广播。 + +``` python +In [63]: s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2) + +In [64]: s +Out[64]: +2013-01-01 NaN +2013-01-02 NaN +2013-01-03 1.0 +2013-01-04 3.0 +2013-01-05 5.0 +2013-01-06 NaN +Freq: D, dtype: float64 + +In [65]: df.sub(s, axis='index') +Out[65]: + A B C D F +2013-01-01 NaN NaN NaN NaN NaN +2013-01-02 NaN NaN NaN NaN NaN +2013-01-03 -1.861849 -3.104569 -1.494929 4.0 1.0 +2013-01-04 -2.278445 -3.706771 -4.039575 2.0 0.0 +2013-01-05 -5.424972 -4.432980 -4.723768 0.0 -1.0 +2013-01-06 NaN NaN NaN NaN NaN +``` + +### Apply 函数 + +Apply 函数处理数据: + +``` python +In [66]: df.apply(np.cumsum) +Out[66]: + A B C D F +2013-01-01 0.000000 0.000000 -1.509059 5 NaN +2013-01-02 1.212112 -0.173215 -1.389850 10 1.0 +2013-01-03 0.350263 -2.277784 -1.884779 15 3.0 +2013-01-04 1.071818 -2.984555 -2.924354 20 6.0 +2013-01-05 0.646846 -2.417535 -2.648122 25 10.0 +2013-01-06 -0.026844 -2.303886 -4.126549 30 15.0 + +In [67]: df.apply(lambda x: x.max() - x.min()) +Out[67]: +A 2.073961 +B 2.671590 +C 1.785291 +D 0.000000 +F 4.000000 +dtype: float64 +``` + +### 直方图 + +详见[直方图与离散化](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-discretization)。 + +``` python +In [68]: s = pd.Series(np.random.randint(0, 7, 
size=10))
+
+In [69]: s
+Out[69]:
+0 4
+1 2
+2 1
+3 2
+4 6
+5 4
+6 4
+7 6
+8 4
+9 4
+dtype: int64
+
+In [70]: s.value_counts()
+Out[70]:
+4 5
+6 2
+2 2
+1 1
+dtype: int64
+```
+
+### 字符串方法
+
+Series 的 `str` 属性包含一组字符串处理功能,如下列代码所示。注意,`str` 的模式匹配默认使用[正则表达式](https://docs.python.org/3/library/re.html)。详见[矢量化字符串方法](https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html#text-string-methods)。
+
+``` python
+In [71]: s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
+
+In [72]: s.str.lower()
+Out[72]:
+0 a
+1 b
+2 c
+3 aaba
+4 baca
+5 NaN
+6 caba
+7 dog
+8 cat
+dtype: object
+```
+
+## 合并(Merge)
+
+### 结合(Concat)
+
+Pandas 提供了多种将 Series、DataFrame 对象组合在一起的功能:在连接(join)与合并(merge)类操作中,可以对索引运用多种集合逻辑,并执行关系代数风格的操作。
+
+详见[合并](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#merging)。
+
+[`concat()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html#pandas.concat) 用于连接 Pandas 对象:
+
+``` python
+In [73]: df = pd.DataFrame(np.random.randn(10, 4))
+
+In [74]: df
+Out[74]:
+ 0 1 2 3
+0 -0.548702 1.467327 -1.015962 -0.483075
+1 1.637550 -1.217659 -0.291519 -1.745505
+2 -0.263952 0.991460 -0.919069 0.266046
+3 -0.709661 1.669052 1.037882 -1.705775
+4 -0.919854 -0.042379 1.247642 -0.009920
+5 0.290213 0.495767 0.362949 1.548106
+6 -1.131345 -0.089329 0.337863 -0.945867
+7 -0.932132 1.956030 0.017587 -0.016692
+8 -0.575247 0.254161 -1.143704 0.215897
+9 1.193555 -0.077118 -0.408530 -0.862495
+
+# 分解为多组
+In [75]: pieces = [df[:3], df[3:7], df[7:]]
+
+In [76]: pd.concat(pieces)
+Out[76]:
+ 0 1 2 3
+0 -0.548702 1.467327 -1.015962 -0.483075
+1 1.637550 -1.217659 -0.291519 -1.745505
+2 -0.263952 0.991460 -0.919069 0.266046
+3 -0.709661 1.669052 1.037882 -1.705775
+4 -0.919854 -0.042379 1.247642 -0.009920
+5 0.290213 0.495767 0.362949 1.548106
+6 -1.131345 -0.089329 0.337863 -0.945867
+7 -0.932132 1.956030 0.017587 -0.016692
+8 -0.575247 0.254161 -1.143704 0.215897
+9 1.193555 -0.077118 -0.408530 -0.862495
+```
+
+### 
连接(join) + +SQL 风格的合并。 详见[数据库风格连接](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#merging-join)。 + +``` python +In [77]: left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]}) + +In [78]: right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]}) + +In [79]: left +Out[79]: + key lval +0 foo 1 +1 foo 2 + +In [80]: right +Out[80]: + key rval +0 foo 4 +1 foo 5 + +In [81]: pd.merge(left, right, on='key') +Out[81]: + key lval rval +0 foo 1 4 +1 foo 1 5 +2 foo 2 4 +3 foo 2 5 +``` + +这里还有一个例子: + +``` python +In [82]: left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]}) + +In [83]: right = pd.DataFrame({'key': ['foo', 'bar'], 'rval': [4, 5]}) + +In [84]: left +Out[84]: + key lval +0 foo 1 +1 bar 2 + +In [85]: right +Out[85]: + key rval +0 foo 4 +1 bar 5 + +In [86]: pd.merge(left, right, on='key') +Out[86]: + key lval rval +0 foo 1 4 +1 bar 2 5 +``` + +### 追加(Append) + +为 DataFrame 追加行。详见[追加](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#merging-concatenation)文档。 + +``` python +In [87]: df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D']) + +In [88]: df +Out[88]: + A B C D +0 1.346061 1.511763 1.627081 -0.990582 +1 -0.441652 1.211526 0.268520 0.024580 +2 -1.577585 0.396823 -0.105381 -0.532532 +3 1.453749 1.208843 -0.080952 -0.264610 +4 -0.727965 -0.589346 0.339969 -0.693205 +5 -0.339355 0.593616 0.884345 1.591431 +6 0.141809 0.220390 0.435589 0.192451 +7 -0.096701 0.803351 1.715071 -0.708758 + +In [89]: s = df.iloc[3] + +In [90]: df.append(s, ignore_index=True) +Out[90]: + A B C D +0 1.346061 1.511763 1.627081 -0.990582 +1 -0.441652 1.211526 0.268520 0.024580 +2 -1.577585 0.396823 -0.105381 -0.532532 +3 1.453749 1.208843 -0.080952 -0.264610 +4 -0.727965 -0.589346 0.339969 -0.693205 +5 -0.339355 0.593616 0.884345 1.591431 +6 0.141809 0.220390 0.435589 0.192451 +7 -0.096701 0.803351 1.715071 -0.708758 +8 1.453749 1.208843 -0.080952 -0.264610 +``` + +## 分组(Grouping) + +“group by” 
指的是涵盖下列一项或多项步骤的处理流程: + +* **分割**:按条件把数据分割成多组; +* **应用**:为每组单独应用函数; +* **组合**:将处理结果组合成一个数据结构。 + +详见[分组](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#groupby)。 + +``` python +In [91]: df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', + ....: 'foo', 'bar', 'foo', 'foo'], + ....: 'B': ['one', 'one', 'two', 'three', + ....: 'two', 'two', 'one', 'three'], + ....: 'C': np.random.randn(8), + ....: 'D': np.random.randn(8)}) + ....: + +In [92]: df +Out[92]: + A B C D +0 foo one -1.202872 -0.055224 +1 bar one -1.814470 2.395985 +2 foo two 1.018601 1.552825 +3 bar three -0.595447 0.166599 +4 foo two 1.395433 0.047609 +5 bar two -0.392670 -0.136473 +6 foo one 0.007207 -0.561757 +7 foo three 1.928123 -1.623033 +``` + +先分组,再用 [`sum()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sum.html#pandas.DataFrame.sum)函数计算每组的汇总数据: + +``` python +In [93]: df.groupby('A').sum() +Out[93]: + C D +A +bar -2.802588 2.42611 +foo 3.146492 -0.63958 +``` + +多列分组后,生成多层索引,也可以应用 `sum` 函数: + +``` python +In [94]: df.groupby(['A', 'B']).sum() +Out[94]: + C D +A B +bar one -1.814470 2.395985 + three -0.595447 0.166599 + two -0.392670 -0.136473 +foo one -1.195665 -0.616981 + three 1.928123 -1.623033 + two 2.414034 1.600434 +``` + +## 重塑(Reshaping) + +详见[多层索引](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#advanced-hierarchical)与[重塑](https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html#reshaping-stacking)。 + +### 堆叠(Stack) + +``` python +In [95]: tuples = list(zip(*[['bar', 'bar', 'baz', 'baz', + ....: 'foo', 'foo', 'qux', 'qux'], + ....: ['one', 'two', 'one', 'two', + ....: 'one', 'two', 'one', 'two']])) + ....: + +In [96]: index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second']) + +In [97]: df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B']) + +In [98]: df2 = df[:4] + +In [99]: df2 +Out[99]: + A B +first second +bar one 0.029399 -0.542108 + two 0.282696 -0.087302 +baz one 
-1.575170 1.771208
+ two 0.816482 1.100230
+```
+
+[`stack()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.stack.html#pandas.DataFrame.stack)方法把 DataFrame 列压缩至一层:
+
+``` python
+In [100]: stacked = df2.stack()
+
+In [101]: stacked
+Out[101]:
+first second
+bar one A 0.029399
+ B -0.542108
+ two A 0.282696
+ B -0.087302
+baz one A -1.575170
+ B 1.771208
+ two A 0.816482
+ B 1.100230
+dtype: float64
+```
+
+**压缩**后的 DataFrame 或 Series 具有多层索引, [`stack()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.stack.html#pandas.DataFrame.stack) 的逆操作是 [`unstack()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.unstack.html#pandas.DataFrame.unstack),默认为拆叠最后一层:
+
+``` python
+In [102]: stacked.unstack()
+Out[102]:
+ A B
+first second
+bar one 0.029399 -0.542108
+ two 0.282696 -0.087302
+baz one -1.575170 1.771208
+ two 0.816482 1.100230
+
+In [103]: stacked.unstack(1)
+Out[103]:
+second one two
+first
+bar A 0.029399 0.282696
+ B -0.542108 -0.087302
+baz A -1.575170 0.816482
+ B 1.771208 1.100230
+
+In [104]: stacked.unstack(0)
+Out[104]:
+first bar baz
+second
+one A 0.029399 -1.575170
+ B -0.542108 1.771208
+two A 0.282696 0.816482
+ B -0.087302 1.100230
+```
+
+## 数据透视表(Pivot Tables)
+
+详见[数据透视表](https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html#reshaping-pivot)。
+
+``` python
+In [105]: df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 3,
+ .....: 'B': ['A', 'B', 'C'] * 4,
+ .....: 'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 2,
+ .....: 'D': np.random.randn(12),
+ .....: 'E': np.random.randn(12)})
+ .....:
+
+In [106]: df
+Out[106]:
+ A B C D E
+0 one A foo 1.418757 -0.179666
+1 one B foo -1.879024 1.291836
+2 two C foo 0.536826 -0.009614
+3 three A bar 1.006160 0.392149
+4 one B bar -0.029716 0.264599
+5 one C bar -1.146178 -0.057409
+6 two A foo 0.100900 -1.425638
+7 three B foo -1.035018 1.024098
+8 one C foo 0.314665 -0.106062
+9 one A bar -0.773723 1.824375
+10 two B 
bar -1.170653 0.595974 +11 three C bar 0.648740 1.167115 +``` + +用上述数据生成数据透视表非常简单: + +``` python +In [107]: pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C']) +Out[107]: +C bar foo +A B +one A -0.773723 1.418757 + B -0.029716 -1.879024 + C -1.146178 0.314665 +three A 1.006160 NaN + B NaN -1.035018 + C 0.648740 NaN +two A NaN 0.100900 + B -1.170653 NaN + C NaN 0.536826 +``` + +## 时间序列(TimeSeries) + +Pandas 为频率转换时重采样提供了虽然简单易用,但强大高效的功能,如,将秒级的数据转换为 5 分钟为频率的数据。这种操作常见于财务应用程序,但又不仅限于此。详见[时间序列](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries)。 + +``` python +In [108]: rng = pd.date_range('1/1/2012', periods=100, freq='S') + +In [109]: ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng) + +In [110]: ts.resample('5Min').sum() +Out[110]: +2012-01-01 25083 +Freq: 5T, dtype: int64 +``` + +时区表示: + +``` python +In [111]: rng = pd.date_range('3/6/2012 00:00', periods=5, freq='D') + +In [112]: ts = pd.Series(np.random.randn(len(rng)), rng) + +In [113]: ts +Out[113]: +2012-03-06 0.464000 +2012-03-07 0.227371 +2012-03-08 -0.496922 +2012-03-09 0.306389 +2012-03-10 -2.290613 +Freq: D, dtype: float64 + +In [114]: ts_utc = ts.tz_localize('UTC') + +In [115]: ts_utc +Out[115]: +2012-03-06 00:00:00+00:00 0.464000 +2012-03-07 00:00:00+00:00 0.227371 +2012-03-08 00:00:00+00:00 -0.496922 +2012-03-09 00:00:00+00:00 0.306389 +2012-03-10 00:00:00+00:00 -2.290613 +Freq: D, dtype: float64 +``` + +转换成其它时区: + +``` python +In [116]: ts_utc.tz_convert('US/Eastern') +Out[116]: +2012-03-05 19:00:00-05:00 0.464000 +2012-03-06 19:00:00-05:00 0.227371 +2012-03-07 19:00:00-05:00 -0.496922 +2012-03-08 19:00:00-05:00 0.306389 +2012-03-09 19:00:00-05:00 -2.290613 +Freq: D, dtype: float64 +``` + +转换时间段: + +``` python +In [117]: rng = pd.date_range('1/1/2012', periods=5, freq='M') + +In [118]: ts = pd.Series(np.random.randn(len(rng)), index=rng) + +In [119]: ts +Out[119]: +2012-01-31 -1.134623 +2012-02-29 -1.561819 +2012-03-31 -0.260838 +2012-04-30 
0.281957
+2012-05-31 1.523962
+Freq: M, dtype: float64
+
+In [120]: ps = ts.to_period()
+
+In [121]: ps
+Out[121]:
+2012-01 -1.134623
+2012-02 -1.561819
+2012-03 -0.260838
+2012-04 0.281957
+2012-05 1.523962
+Freq: M, dtype: float64
+
+In [122]: ps.to_timestamp()
+Out[122]:
+2012-01-01 -1.134623
+2012-02-01 -1.561819
+2012-03-01 -0.260838
+2012-04-01 0.281957
+2012-05-01 1.523962
+Freq: MS, dtype: float64
+```
+
+Pandas 函数可以很方便地转换时间段与时间戳。下例把年终为 11 月的季度频率,转换为季度结束后次月月初上午 9 点:
+
+``` python
+In [123]: prng = pd.period_range('1990Q1', '2000Q4', freq='Q-NOV')
+
+In [124]: ts = pd.Series(np.random.randn(len(prng)), prng)
+
+In [125]: ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
+
+In [126]: ts.head()
+Out[126]:
+1990-03-01 09:00 -0.902937
+1990-06-01 09:00 0.068159
+1990-09-01 09:00 -0.057873
+1990-12-01 09:00 -0.368204
+1991-03-01 09:00 -1.144073
+Freq: H, dtype: float64
+```
+
+## 类别型(Categoricals)
+
+Pandas 的 DataFrame 里可以包含类别数据。完整文档详见[类别简介](https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html#categorical) 和 [API 文档](https://pandas.pydata.org/pandas-docs/stable/reference/arrays.html#api-arrays-categorical)。
+
+``` python
+In [127]: df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
+ .....: "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})
+ .....:
+```
+
+将 `grade` 的原生数据转换为类别型数据:
+
+``` python
+In [128]: df["grade"] = df["raw_grade"].astype("category")
+
+In [129]: df["grade"]
+Out[129]:
+0 a
+1 b
+2 b
+3 a
+4 a
+5 e
+Name: grade, dtype: category
+Categories (3, object): [a, b, e]
+```
+
+把各类别重命名为更有意义的名称(直接为 `Series.cat.categories` 赋值,就地生效):
+
+``` python
+In [130]: df["grade"].cat.categories = ["very good", "good", "very bad"]
+```
+
+重新排序各类别,并添加缺失的类别(`Series.cat` 下的方法默认返回新 `Series`):
+
+``` python
+In [131]: df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium",
+ .....: "good", "very good"])
+ .....:
+
+In [132]: df["grade"]
+Out[132]:
+0 very good
+1 good
+2 good
+3 very good
+4 very good
+5 very bad
+Name: grade, dtype: category
+Categories (5, object): [very bad, bad, medium, good, very good] +``` + +注意,这里是按生成类别时的顺序排序,不是按词汇排序: + +``` python +In [133]: df.sort_values(by="grade") +Out[133]: + id raw_grade grade +5 6 e very bad +1 2 b good +2 3 b good +0 1 a very good +3 4 a very good +4 5 a very good +``` + +按类列分组(groupby)时,即便某类别为空,也会显示: + +``` python +In [134]: df.groupby("grade").size() +Out[134]: +grade +very bad 1 +bad 0 +medium 0 +good 2 +very good 3 +dtype: int64 +``` + +## 可视化 + +详见[可视化](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html#visualization)文档。 + +``` python +In [135]: ts = pd.Series(np.random.randn(1000), + .....: index=pd.date_range('1/1/2000', periods=1000)) + .....: + +In [136]: ts = ts.cumsum() + +In [137]: ts.plot() +Out[137]: +``` + +![可视化](https://static.pypandas.cn/public/static/images/series_plot_basic.png) + +DataFrame 的 [plot()](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html#visualization) 方法可以快速绘制所有带标签的列: + +``` python +In [138]: df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, + .....: columns=['A', 'B', 'C', 'D']) + .....: + +In [139]: df = df.cumsum() + +In [140]: plt.figure() +Out[140]:
+ +In [141]: df.plot() +Out[141]: + +In [142]: plt.legend(loc='best') +Out[142]: +``` + +![可视化2](https://static.pypandas.cn/public/static/images/frame_plot_basic.png) + +## 数据输入 / 输出 + +### CSV + +[写入 CSV 文件](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-store-in-csv)。 + +``` python +In [143]: df.to_csv('foo.csv') +``` + +读取 CSV 文件数据: + +``` python +In [144]: pd.read_csv('foo.csv') +Out[144]: + Unnamed: 0 A B C D +0 2000-01-01 0.266457 -0.399641 -0.219582 1.186860 +1 2000-01-02 -1.170732 -0.345873 1.653061 -0.282953 +2 2000-01-03 -1.734933 0.530468 2.060811 -0.515536 +3 2000-01-04 -1.555121 1.452620 0.239859 -1.156896 +4 2000-01-05 0.578117 0.511371 0.103552 -2.428202 +5 2000-01-06 0.478344 0.449933 -0.741620 -1.962409 +6 2000-01-07 1.235339 -0.091757 -1.543861 -1.084753 +.. ... ... ... ... ... +993 2002-09-20 -10.628548 -9.153563 -7.883146 28.313940 +994 2002-09-21 -10.390377 -8.727491 -6.399645 30.914107 +995 2002-09-22 -8.985362 -8.485624 -4.669462 31.367740 +996 2002-09-23 -9.558560 -8.781216 -4.499815 30.518439 +997 2002-09-24 -9.902058 -9.340490 -4.386639 30.105593 +998 2002-09-25 -10.216020 -9.480682 -3.933802 29.758560 +999 2002-09-26 -11.856774 -10.671012 -3.216025 29.369368 + +[1000 rows x 5 columns] +``` + +### HDF5 + +详见 [HDFStores](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-hdf5) 文档。 + +写入 HDF5 Store: + +``` python +In [145]: df.to_hdf('foo.h5', 'df') +``` + +读取 HDF5 Store: + +``` python +In [146]: pd.read_hdf('foo.h5', 'df') +Out[146]: + A B C D +2000-01-01 0.266457 -0.399641 -0.219582 1.186860 +2000-01-02 -1.170732 -0.345873 1.653061 -0.282953 +2000-01-03 -1.734933 0.530468 2.060811 -0.515536 +2000-01-04 -1.555121 1.452620 0.239859 -1.156896 +2000-01-05 0.578117 0.511371 0.103552 -2.428202 +2000-01-06 0.478344 0.449933 -0.741620 -1.962409 +2000-01-07 1.235339 -0.091757 -1.543861 -1.084753 +... ... ... ... ... 
+2002-09-20 -10.628548 -9.153563 -7.883146 28.313940 +2002-09-21 -10.390377 -8.727491 -6.399645 30.914107 +2002-09-22 -8.985362 -8.485624 -4.669462 31.367740 +2002-09-23 -9.558560 -8.781216 -4.499815 30.518439 +2002-09-24 -9.902058 -9.340490 -4.386639 30.105593 +2002-09-25 -10.216020 -9.480682 -3.933802 29.758560 +2002-09-26 -11.856774 -10.671012 -3.216025 29.369368 + +[1000 rows x 4 columns] +``` + +### Excel + +详见 [Excel](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-excel) 文档。 + +写入 Excel 文件: + +``` python +In [147]: df.to_excel('foo.xlsx', sheet_name='Sheet1') +``` + +读取 Excel 文件: + +``` python +In [148]: pd.read_excel('foo.xlsx', 'Sheet1', index_col=None, na_values=['NA']) +Out[148]: + Unnamed: 0 A B C D +0 2000-01-01 0.266457 -0.399641 -0.219582 1.186860 +1 2000-01-02 -1.170732 -0.345873 1.653061 -0.282953 +2 2000-01-03 -1.734933 0.530468 2.060811 -0.515536 +3 2000-01-04 -1.555121 1.452620 0.239859 -1.156896 +4 2000-01-05 0.578117 0.511371 0.103552 -2.428202 +5 2000-01-06 0.478344 0.449933 -0.741620 -1.962409 +6 2000-01-07 1.235339 -0.091757 -1.543861 -1.084753 +.. ... ... ... ... ... +993 2002-09-20 -10.628548 -9.153563 -7.883146 28.313940 +994 2002-09-21 -10.390377 -8.727491 -6.399645 30.914107 +995 2002-09-22 -8.985362 -8.485624 -4.669462 31.367740 +996 2002-09-23 -9.558560 -8.781216 -4.499815 30.518439 +997 2002-09-24 -9.902058 -9.340490 -4.386639 30.105593 +998 2002-09-25 -10.216020 -9.480682 -3.933802 29.758560 +999 2002-09-26 -11.856774 -10.671012 -3.216025 29.369368 + +[1000 rows x 5 columns] +``` + +## 各种坑(Gotchas) + +执行某些操作,将触发异常,如: + +``` python +>>> if pd.Series([False, True, False]): +... print("I was true") +Traceback + ... +ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all(). 
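+
+>>> # 解决办法示例:按错误提示,用 .any()/.all()/.empty 把 Series 显式归约为单个布尔值
+>>> if pd.Series([False, True, False]).any():
+...     print("I was true")
+I was true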
+```
+
+参阅[比较操作](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-compare)文档,查看错误提示与解决方案。
+
+详见[各种坑](https://pandas.pydata.org/pandas-docs/stable/gotchas.html#gotchas)文档。
\ No newline at end of file
diff --git a/Python/pandas/getting_started/README.md b/Python/pandas/getting_started/README.md
new file mode 100644
index 00000000..bb05e875
--- /dev/null
+++ b/Python/pandas/getting_started/README.md
@@ -0,0 +1,59 @@
+---
+meta:
+  - name: keywords
+    content: Pandas快速入门
+  - name: description
+    content: 这里是Pandas快速入门的中文文档目录结构。
+---
+
+# 快速入门
+
+- [Pandas概览](overview.html)
+  - [数据结构](overview.html#data-structures)
+  - [大小可变与数据复制](overview.html#mutability-and-copying-of-data)
+  - [获得支持](overview.html#getting-support)
+  - [社区](overview.html#community)
+  - [项目监管](overview.html#project-governance)
+  - [开发团队](overview.html#development-team)
+  - [机构合作伙伴](overview.html#institutional-partners)
+  - [许可协议](overview.html#license)
+- [十分钟入门Pandas](10min.html)
+  - [对象创建](10min.html#object-creation)
+  - [查看数据](10min.html#viewing-data)
+  - [选择](10min.html#selection)
+  - [缺失值](10min.html#missing-data)
+  - [操作](10min.html#operations)
+  - [合并(Merge)](10min.html#merge)
+  - [分组(Grouping)](10min.html#grouping)
+  - [重塑(Reshaping)](10min.html#reshaping)
+  - [时间序列(Time Series)](10min.html#time-series)
+  - [分类](10min.html#categoricals)
+  - [绘图](10min.html#plotting)
+  - [数据输入 / 输出](10min.html#getting-data-in-out)
+  - [坑(Gotchas)](10min.html#gotchas)
+- [基础用法](basics.html)
+  - [Head 与 Tail](basics.html#head-and-tail)
+  - [属性与底层数据](basics.html#attributes-and-underlying-data)
+  - [加速操作](basics.html#accelerated-operations)
+  - [二进制操作](basics.html#flexible-binary-operations)
+  - [描述性统计](basics.html#descriptive-statistics)
+  - [函数](basics.html#function-application)
+  - [重置索引与更换标签](basics.html#reindexing-and-altering-labels)
+  - [迭代](basics.html#iteration)
+  - [.dt 访问器](basics.html#dt-accessor)
+  - [矢量化字符串方法](basics.html#vectorized-string-methods)
+  - 
[排序](basics.html#sorting) + - [复制](basics.html#copying) + - [数据类型](basics.html#dtypes) + - [基于 `dtype` 选择列](basics.html#selecting-columns-based-on-dtype) +- [数据结构简介](dsintro.html) + - [Series](dsintro.html#series) + - [DataFrame](dsintro.html#dataframe) +- [与其它工具比较](comparison.html) + - [与 R 语言比较](comparison.html#comparison-with-r-r-libraries) + - [与 SQL 比较](comparison.html#comparison-with-sql) + - [与 SAS 比较](comparison.html#comparison-with-sas) + - [与 Stata 比较](comparison.html#comparison-with-stata) +- [教程资料](tutorials.html) + - [官方指南](tutorials.html#internal-guides) + - [社区指南](tutorials.html#community-guides) \ No newline at end of file diff --git a/Python/pandas/getting_started/basics.md b/Python/pandas/getting_started/basics.md new file mode 100644 index 00000000..1e17d912 --- /dev/null +++ b/Python/pandas/getting_started/basics.md @@ -0,0 +1,3828 @@ +--- +meta: + - name: keywords + content: Pandas基础用法 + - name: description + content: 本节介绍 Pandas 数据结构的基础用法。下列代码创建上一节用过的示例数据对象: +--- + +# 基础用法 + +本节介绍 Pandas 数据结构的基础用法。下列代码创建上一节用过的示例数据对象: + +``` python +In [1]: index = pd.date_range('1/1/2000', periods=8) + +In [2]: s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e']) + +In [3]: df = pd.DataFrame(np.random.randn(8, 3), index=index, + ...: columns=['A', 'B', 'C']) + ...: +``` +## Head 与 Tail + +[`head()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html#pandas.DataFrame.head) 与 [`tail()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html#pandas.DataFrame.tail) 用于快速预览 Series 与 DataFrame,默认显示 5 条数据,也可以指定显示数据的数量。 + +```python +In [4]: long_series = pd.Series(np.random.randn(1000)) + +In [5]: long_series.head() +Out[5]: +0 -1.157892 +1 -1.344312 +2 0.844885 +3 1.075770 +4 -0.109050 +dtype: float64 + +In [6]: long_series.tail(3) +Out[6]: +997 -0.289388 +998 -1.020544 +999 0.589993 +dtype: float64 +``` +## 属性与底层数据 + +Pandas 可以通过多个属性访问元数据: + +- **shape**: + - 输出对象的轴维度,与 ndarray 一致 + +- 
**轴标签** + + - **Series**: *Index* (仅有此轴) + - **DataFrame**: *Index* (行) 与*列* + +注意: **为属性赋值是安全的**! + +```python +In [7]: df[:2] +Out[7]: + A B C +2000-01-01 -0.173215 0.119209 -1.044236 +2000-01-02 -0.861849 -2.104569 -0.494929 + +In [8]: df.columns = [x.lower() for x in df.columns] + +In [9]: df +Out[9]: + a b c +2000-01-01 -0.173215 0.119209 -1.044236 +2000-01-02 -0.861849 -2.104569 -0.494929 +2000-01-03 1.071804 0.721555 -0.706771 +2000-01-04 -1.039575 0.271860 -0.424972 +2000-01-05 0.567020 0.276232 -1.087401 +2000-01-06 -0.673690 0.113648 -1.478427 +2000-01-07 0.524988 0.404705 0.577046 +2000-01-08 -1.715002 -1.039268 -0.370647 +``` + +Pandas 对象([`Index`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.html#pandas.Index "pandas.Index"), [`Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series "pandas.Series"), [`DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame "pandas.DataFrame"))相当于数组的容器,用于存储数据、执行计算。大部分类型的底层数组都是 [`numpy.ndarray`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray "(in NumPy v1.16)")。不过,Pandas 与第三方支持库一般都会扩展 NumPy 类型系统,添加自定义数组(见[数据类型](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-dtypes))。 + +`.array` 属性用于提取 [`Index`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.html#pandas.Index "pandas.Index") 或 [`Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series "pandas.Series") 里的数据。 + +```python +In [10]: s.array +Out[10]: + +[ 0.4691122999071863, -0.2828633443286633, -1.5090585031735124, + -1.1356323710171934, 1.2121120250208506] +Length: 5, dtype: float64 + +In [11]: s.index.array +Out[11]: + +['a', 'b', 'c', 'd', 'e'] +Length: 5, dtype: object +``` +[`array`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.array.html#pandas.Series.array 
"pandas.Series.array") 一般指 [`ExtensionArray`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.extensions.ExtensionArray.html#pandas.api.extensions.ExtensionArray "pandas.api.extensions.ExtensionArray")。至于什么是 [`ExtensionArray`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.extensions.ExtensionArray.html#pandas.api.extensions.ExtensionArray "pandas.api.extensions.ExtensionArray") 及 Pandas 为什么要用 [`ExtensionArray`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.extensions.ExtensionArray.html#pandas.api.extensions.ExtensionArray "pandas.api.extensions.ExtensionArray") 不是本节要说明的内容。更多信息请参阅[数据类型](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-dtypes)。 + +提取 NumPy 数组,用 [`to_numpy()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.to_numpy.html#pandas.Series.to_numpy "pandas.Series.to_numpy") 或 `numpy.asarray()`。 + +```python +In [12]: s.to_numpy() +Out[12]: array([ 0.4691, -0.2829, -1.5091, -1.1356, 1.2121]) + +In [13]: np.asarray(s) +Out[13]: array([ 0.4691, -0.2829, -1.5091, -1.1356, 1.2121]) +``` + +`Series` 与 `Index` 的类型是 [`ExtensionArray`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.extensions.ExtensionArray.html#pandas.api.extensions.ExtensionArray "pandas.api.extensions.ExtensionArray") 时, [`to_numpy()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.to_numpy.html#pandas.Series.to_numpy "pandas.Series.to_numpy") 会复制数据,并强制转换值。详情见[数据类型](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-dtypes)。 + +[`to_numpy()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.to_numpy.html#pandas.Series.to_numpy "pandas.Series.to_numpy") 可以控制 [`numpy.ndarray`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray "(in NumPy v1.16)") 生成的数据类型。以带时区的 datetime 为例,NumPy 未提供时区信息的 datetime 数据类型,Pandas 则提供了两种表现形式: + +1. 
一种是带 [`Timestamp`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html#pandas.Timestamp "pandas.Timestamp") 的 [`numpy.ndarray`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray "(in NumPy v1.16)"),提供了正确的 `tz` 信息。 + +2. 另一种是 `datetime64[ns]`,这也是一种 [`numpy.ndarray`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray "(in NumPy v1.16)"),值被转换为 UTC,但去掉了时区信息。 + +时区信息可以用 `dtype=object` 保存。 + +```python +In [14]: ser = pd.Series(pd.date_range('2000', periods=2, tz="CET")) + +In [15]: ser.to_numpy(dtype=object) +Out[15]: +array([Timestamp('2000-01-01 00:00:00+0100', tz='CET', freq='D'), + Timestamp('2000-01-02 00:00:00+0100', tz='CET', freq='D')], + dtype=object) +``` +或用 `dtype='datetime64[ns]'` 去除。 + +```python +In [16]: ser.to_numpy(dtype="datetime64[ns]") +Out[16]: +array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'], + dtype='datetime64[ns]') +``` + +提取 `DataFrame` 里的**原数据**稍微有点复杂。DataFrame 里所有列的数据类型都一样时,[`DataFrame.to_numpy()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy "pandas.DataFrame.to_numpy") 返回底层数据: + +```python +In [17]: df.to_numpy() +Out[17]: +array([[-0.1732, 0.1192, -1.0442], + [-0.8618, -2.1046, -0.4949], + [ 1.0718, 0.7216, -0.7068], + [-1.0396, 0.2719, -0.425 ], + [ 0.567 , 0.2762, -1.0874], + [-0.6737, 0.1136, -1.4784], + [ 0.525 , 0.4047, 0.577 ], + [-1.715 , -1.0393, -0.3706]]) +``` +DataFrame 为同构型数据时,Pandas 直接修改原始 `ndarray`,所做修改会直接反应在数据结构里。对于异质型数据,即 DataFrame 列的数据类型不一样时,就不是这种操作模式了。与轴标签不同,不能为值的属性赋值。 + +::: tip 注意 + +处理异质型数据时,输出结果 `ndarray` 的数据类型适用于涉及的各类数据。若 DataFrame 里包含字符串,输出结果的数据类型就是 `object`。要是只有浮点数或整数,则输出结果的数据类型是浮点数。 + +::: + +以前,Pandas 推荐用 [`Series.values`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.values.html#pandas.Series.values "pandas.Series.values") 或 
[`DataFrame.values`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.values.html#pandas.DataFrame.values "pandas.DataFrame.values") 从 Series 或 DataFrame 里提取数据。旧有代码库或在线教程里仍在用这种操作,但 Pandas 已改进了此功能,现在,推荐用 `.array` 或 `to_numpy` 提取数据,别再用 `.values` 了。`.values` 有以下几个缺点: + +1. Series 含[扩展类型](https://pandas.pydata.org/pandas-docs/stable/development/extending.html#extending-extension-types)时,[Series.values](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.values.html#pandas.Series.values) 无法判断到底是该返回 NumPy `array`,还是返回 `ExtensionArray`。而 [`Series.array`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.array.html#pandas.Series.array "pandas.Series.array") 则只返回 [`ExtensionArray`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.extensions.ExtensionArray.html#pandas.api.extensions.ExtensionArray "pandas.api.extensions.ExtensionArray"),且不会复制数据。[`Series.to_numpy()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.to_numpy.html#pandas.Series.to_numpy "pandas.Series.to_numpy") 则返回 NumPy 数组,其代价是需要复制、并强制转换数据的值。 + +2. 
DataFrame 含多种数据类型时,[`DataFrame.values`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.values.html#pandas.DataFrame.values "pandas.DataFrame.values") 会复制数据,并将数据的值强制转换同一种数据类型,这是一种代价较高的操作。[`DataFrame.to_numpy()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy "pandas.DataFrame.to_numpy") 则返回 NumPy 数组,这种方式更清晰,也不会把 DataFrame 里的数据都当作一种类型。 + +## 加速操作 + +借助 `numexpr` 与 `bottleneck` 支持库,Pandas 可以加速特定类型的二进制数值与布尔操作。 + +处理大型数据集时,这两个支持库特别有用,加速效果也非常明显。 `numexpr` 使用智能分块、缓存与多核技术。`bottleneck` 是一组专属 cython 例程,处理含 `nans` 值的数组时,特别快。 + +请看下面这个例子(`DataFrame` 包含 100 列 X 10 万行数据): + +| 操作 | 0.11.0版 (ms) | 旧版 (ms) | 提升比率 | +| :---------: | :-----------: | :-------: | :------: | +| `df1 > df2` | 13.32 | 125.35 | 0.1063 | +| `df1 * df2` | 21.71 | 36.63 | 0.5928 | +| `df1 + df2` | 22.04 | 36.50 | 0.6039 | + +强烈建议安装这两个支持库,更多信息,请参阅[推荐支持库](https://pandas.pydata.org/pandas-docs/stable/install.html#install-recommended-dependencies)。 + +这两个支持库默认为启用状态,可用以下选项设置: + +*0.20.0 版新增。* + +```python +pd.set_option('compute.use_bottleneck', False) +pd.set_option('compute.use_numexpr', False) +``` + +## 二进制操作 + +Pandas 数据结构之间执行二进制操作,要注意下列两个关键点: + +* 多维(DataFrame)与低维(Series)对象之间的广播机制; +* 计算中的缺失值处理。 + +这两个问题可以同时处理,但下面先介绍怎么分开处理。 + +### 匹配/广播机制 + +DataFrame 支持 [`add()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.add.html#pandas.DataFrame.add "pandas.DataFrame.add")、[`sub()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sub.html#pandas.DataFrame.sub "pandas.DataFrame.sub")、[`mul()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mul.html#pandas.DataFrame.mul "pandas.DataFrame.mul")、[`div()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.div.html#pandas.DataFrame.div "pandas.DataFrame.div") 及 
[`radd()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.radd.html#pandas.DataFrame.radd "pandas.DataFrame.radd")、[`rsub()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rsub.html#pandas.DataFrame.rsub "pandas.DataFrame.rsub") 等方法执行二进制操作。广播机制重点关注输入的 Series。通过 `axis` 关键字,匹配 *index* 或 *columns* 即可调用这些函数。 + +```python +In [18]: df = pd.DataFrame({ + ....: 'one': pd.Series(np.random.randn(3), index=['a', 'b', 'c']), + ....: 'two': pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']), + ....: 'three': pd.Series(np.random.randn(3), index=['b', 'c', 'd'])}) + ....: + +In [19]: df +Out[19]: + one two three +a 1.394981 1.772517 NaN +b 0.343054 1.912123 -0.050390 +c 0.695246 1.478369 1.227435 +d NaN 0.279344 -0.613172 + +In [20]: row = df.iloc[1] + +In [21]: column = df['two'] + +In [22]: df.sub(row, axis='columns') +Out[22]: + one two three +a 1.051928 -0.139606 NaN +b 0.000000 0.000000 0.000000 +c 0.352192 -0.433754 1.277825 +d NaN -1.632779 -0.562782 + +In [23]: df.sub(row, axis=1) +Out[23]: + one two three +a 1.051928 -0.139606 NaN +b 0.000000 0.000000 0.000000 +c 0.352192 -0.433754 1.277825 +d NaN -1.632779 -0.562782 + +In [24]: df.sub(column, axis='index') +Out[24]: + one two three +a -0.377535 0.0 NaN +b -1.569069 0.0 -1.962513 +c -0.783123 0.0 -0.250933 +d NaN 0.0 -0.892516 + +In [25]: df.sub(column, axis=0) +Out[25]: + one two three +a -0.377535 0.0 NaN +b -1.569069 0.0 -1.962513 +c -0.783123 0.0 -0.250933 +d NaN 0.0 -0.892516 +``` +还可以用 Series 对齐多层索引 DataFrame 的某一层级。 + +```python +In [26]: dfmi = df.copy() + +In [27]: dfmi.index = pd.MultiIndex.from_tuples([(1, 'a'), (1, 'b'), + ....: (1, 'c'), (2, 'a')], + ....: names=['first', 'second']) + ....: + +In [28]: dfmi.sub(column, axis=0, level='second') +Out[28]: + one two three +first second +1 a -0.377535 0.000000 NaN + b -1.569069 0.000000 -1.962513 + c -0.783123 0.000000 -0.250933 +2 a NaN -1.493173 -2.385688 +``` + +Series 与 Index 还支持 
[`divmod()`](https://docs.python.org/3/library/functions.html#divmod "(in Python v3.7)") 内置函数,该函数同时执行向下取整除与模运算,返回两个与左侧类型相同的元组。示例如下: + +```python +In [29]: s = pd.Series(np.arange(10)) + +In [30]: s +Out[30]: +0 0 +1 1 +2 2 +3 3 +4 4 +5 5 +6 6 +7 7 +8 8 +9 9 +dtype: int64 + +In [31]: div, rem = divmod(s, 3) + +In [32]: div +Out[32]: +0 0 +1 0 +2 0 +3 1 +4 1 +5 1 +6 2 +7 2 +8 2 +9 3 +dtype: int64 + +In [33]: rem +Out[33]: +0 0 +1 1 +2 2 +3 0 +4 1 +5 2 +6 0 +7 1 +8 2 +9 0 +dtype: int64 + +In [34]: idx = pd.Index(np.arange(10)) + +In [35]: idx +Out[35]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64') + +In [36]: div, rem = divmod(idx, 3) + +In [37]: div +Out[37]: Int64Index([0, 0, 0, 1, 1, 1, 2, 2, 2, 3], dtype='int64') + +In [38]: rem +Out[38]: Int64Index([0, 1, 2, 0, 1, 2, 0, 1, 2, 0], dtype='int64') +``` +[`divmod()`](https://docs.python.org/3/library/functions.html#divmod "(in Python v3.7)") 还支持元素级运算: + +```python +In [39]: div, rem = divmod(s, [2, 2, 3, 3, 4, 4, 5, 5, 6, 6]) + +In [40]: div +Out[40]: +0 0 +1 0 +2 0 +3 1 +4 1 +5 1 +6 1 +7 1 +8 1 +9 1 +dtype: int64 + +In [41]: rem +Out[41]: +0 0 +1 1 +2 2 +3 0 +4 0 +5 1 +6 1 +7 2 +8 2 +9 3 +dtype: int64 +``` + +### 缺失值与填充缺失值操作 + +Series 与 DataFrame 的算数函数支持 `fill_value` 选项,即用指定值替换某个位置的缺失值。比如,两个 DataFrame 相加,除非两个 DataFrame 里同一个位置都有缺失值,其相加的和仍为 `NaN`,如果只有一个 DataFrame 里存在缺失值,则可以用 `fill_value` 指定一个值来替代 `NaN`,当然,也可以用 `fillna` 把 `NaN` 替换为想要的值。 + +::: tip 注意 + +下面第 43 条代码里,Pandas 官档没有写 df2 是哪里来的,这里补上,与 df 类似。 ```python +df2 = pd.DataFrame({ + ....: 'one': pd.Series(np.random.randn(3), index=['a', 'b', 'c']), + ....: 'two': pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']), + ....: 'three': pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd'])}) + ....: +``` +::: + +```python +In [42]: df +Out[42]: + one two three +a 1.394981 1.772517 NaN +b 0.343054 1.912123 -0.050390 +c 0.695246 1.478369 1.227435 +d NaN 0.279344 -0.613172 + +In [43]: df2 +Out[43]: + one two three +a 1.394981 1.772517 1.000000 +b 
0.343054 1.912123 -0.050390 +c 0.695246 1.478369 1.227435 +d NaN 0.279344 -0.613172 + +In [44]: df + df2 +Out[44]: + one two three +a 2.789963 3.545034 NaN +b 0.686107 3.824246 -0.100780 +c 1.390491 2.956737 2.454870 +d NaN 0.558688 -1.226343 + +In [45]: df.add(df2, fill_value=0) +Out[45]: + one two three +a 2.789963 3.545034 1.000000 +b 0.686107 3.824246 -0.100780 +c 1.390491 2.956737 2.454870 +d NaN 0.558688 -1.226343 +``` + +### 比较操作 + +与上一小节的算数运算类似,Series 与 DataFrame 还支持 `eq`、`ne`、`lt`、`gt`、`le`、`ge` 等二进制比较操作的方法: + +| 序号 | 缩写 | 英文 | 中文 | +| :--: | :--: | :----------------------: | :------: | +| 1 | eq | equal to | 等于 | +| 2 | ne | not equal to | 不等于 | +| 3 | lt | less than | 小于 | +| 4 | gt | greater than | 大于 | +| 5 | le | less than or equal to | 小于等于 | +| 6 | ge | greater than or equal to | 大于等于 | + +```python +In [46]: df.gt(df2) +Out[46]: + one two three +a False False False +b False False False +c False False False +d False False False + +In [47]: df2.ne(df) +Out[47]: + one two three +a False False True +b False False False +c False False False +d True False False +``` + +这些操作生成一个与左侧输入对象类型相同的 Pandas 对象,即,dtype 为 `bool`。`boolean` 对象可用于索引操作,参阅[布尔索引](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-boolean)。 + +### 布尔简化 + +[`empty`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.empty.html#pandas.DataFrame.empty "pandas.DataFrame.empty")、[`any()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.any.html#pandas.DataFrame.any "pandas.DataFrame.any")、[`all()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.all.html#pandas.DataFrame.all "pandas.DataFrame.all")、[`bool()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.bool.html#pandas.DataFrame.bool "pandas.DataFrame.bool") 可以把数据汇总简化至单个布尔值。 + +```python +In [48]: (df > 0).all() +Out[48]: +one False +two True +three False +dtype: bool + +In [49]: (df > 0).any() +Out[49]: 
+one True +two True +three True +dtype: bool +``` + +还可以进一步把上面的结果简化为单个布尔值。 + +```python +In [50]: (df > 0).any().any() +Out[50]: True +``` + +通过 [`empty`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.empty.html#pandas.DataFrame.empty "pandas.DataFrame.empty") 属性,可以验证 Pandas 对象是否为**空**。 + +```python +In [51]: df.empty +Out[51]: False + +In [52]: pd.DataFrame(columns=list('ABC')).empty +Out[52]: True +``` + +用 [`bool()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.bool.html#pandas.DataFrame.bool "pandas.DataFrame.bool") 方法验证单元素 pandas 对象的布尔值。 + +```python +In [53]: pd.Series([True]).bool() +Out[53]: True + +In [54]: pd.Series([False]).bool() +Out[54]: False + +In [55]: pd.DataFrame([[True]]).bool() +Out[55]: True + +In [56]: pd.DataFrame([[False]]).bool() +Out[56]: False +``` +::: danger 警告 + +以下代码: +```python +>>> if df: +... pass +``` + +或 + +```python +>>> df and df2 +``` + +上述代码试图比对多个值,因此,这两种操作都会触发错误: + +```python +ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all(). 
+``` + +::: + +了解详情,请参阅[各种坑](https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#gotchas-truth)小节的内容。 + +### 比较对象是否等效 + +一般情况下,多种方式都能得出相同的结果。以 `df + df` 与 `df * 2` 为例。应用上一小节学到的知识,测试这两种计算方式的结果是否一致,一般人都会用 `(df + df == df * 2).all()`,不过,这个表达式的结果是 `False`: + +```python +In [57]: df + df == df * 2 +Out[57]: + one two three +a True True False +b True True True +c True True True +d False True True + +In [58]: (df + df == df * 2).all() +Out[58]: +one False +two True +three False +dtype: bool +``` + +注意:布尔型 DataFrame `df + df == df * 2` 中有 `False` 值!这是因为两个 `NaN` 值的比较结果为**不等**: + +```python +In [59]: np.nan == np.nan +Out[59]: False +``` + +为了验证数据是否等效,Series 与 DataFrame 等 N 维框架提供了 [`equals()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.equals.html#pandas.DataFrame.equals "pandas.DataFrame.equals") 方法,用这个方法验证 `NaN` 值的结果为**相等**。 + +```python +In [60]: (df + df).equals(df * 2) +Out[60]: True +``` + +注意:Series 与 DataFrame 索引的顺序必须一致,验证结果才能为 `True`: + +```python +In [61]: df1 = pd.DataFrame({'col': ['foo', 0, np.nan]}) + +In [62]: df2 = pd.DataFrame({'col': [np.nan, 0, 'foo']}, index=[2, 1, 0]) + +In [63]: df1.equals(df2) +Out[63]: False + +In [64]: df1.equals(df2.sort_index()) +Out[64]: True +``` + +### 比较 array 型对象 + +用标量值与 Pandas 数据结构对比数据元素非常简单: + +```python +In [65]: pd.Series(['foo', 'bar', 'baz']) == 'foo' +Out[65]: +0 True +1 False +2 False +dtype: bool + +In [66]: pd.Index(['foo', 'bar', 'baz']) == 'foo' +Out[66]: array([ True, False, False]) +``` + +Pandas 还能对比两个等长 `array` 对象里的数据元素: + +```python +In [67]: pd.Series(['foo', 'bar', 'baz']) == pd.Index(['foo', 'bar', 'qux']) +Out[67]: +0 True +1 True +2 False +dtype: bool + +In [68]: pd.Series(['foo', 'bar', 'baz']) == np.array(['foo', 'bar', 'qux']) +Out[68]: +0 True +1 True +2 False +dtype: bool +``` + +对比不等长的 `Index` 或 `Series` 对象会触发 `ValueError`: + +```python +In [55]: pd.Series(['foo', 'bar', 'baz']) == pd.Series(['foo', 'bar']) +ValueError: Series lengths must match to 
compare

In [56]: pd.Series(['foo', 'bar', 'baz']) == pd.Series(['foo'])
ValueError: Series lengths must match to compare
```

注意:这里的操作与 NumPy 的广播机制不同:

```python
In [69]: np.array([1, 2, 3]) == np.array([2])
Out[69]: array([False,  True, False])
```

NumPy 无法执行广播操作时,返回 `False`:

```python
In [70]: np.array([1, 2, 3]) == np.array([1, 2])
Out[70]: False
```

### 合并重叠数据集

有时,要合并两个相似的数据集,其中一个数据集的数据比另一个多。比如,展示特定经济指标的两个数据序列,其中一个是“高质量”指标,另一个是“低质量”指标。一般来说,低质量序列可能包含更多的历史数据,或覆盖更广的数据范围。因此,合并这两个 DataFrame 对象时,其中一个 DataFrame 里的缺失值将按指定条件,用另一个 DataFrame 里相同标签的数据填充。要实现这一操作,请用下列代码中的 [`combine_first()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.combine_first.html#pandas.DataFrame.combine_first "pandas.DataFrame.combine_first") 函数。

```python
In [71]: df1 = pd.DataFrame({'A': [1., np.nan, 3., 5., np.nan],
   ....:                     'B': [np.nan, 2., 3., np.nan, 6.]})
   ....: 

In [72]: df2 = pd.DataFrame({'A': [5., 2., 4., np.nan, 3., 7.],
   ....:                     'B': [np.nan, np.nan, 3., 4., 6., 8.]})
   ....: 

In [73]: df1
Out[73]: 
     A    B
0  1.0  NaN
1  NaN  2.0
2  3.0  3.0
3  5.0  NaN
4  NaN  6.0

In [74]: df2
Out[74]: 
     A    B
0  5.0  NaN
1  2.0  NaN
2  4.0  3.0
3  NaN  4.0
4  3.0  6.0
5  7.0  8.0

In [75]: df1.combine_first(df2)
Out[75]: 
     A    B
0  1.0  NaN
1  2.0  2.0
2  3.0  3.0
3  5.0  4.0
4  3.0  6.0
5  7.0  8.0
```

### DataFrame 通用合并方法

上述 [`combine_first()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.combine_first.html#pandas.DataFrame.combine_first "pandas.DataFrame.combine_first") 方法调用了更普适的 [`DataFrame.combine()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.combine.html#pandas.DataFrame.combine "pandas.DataFrame.combine") 方法。该方法接收另一个 DataFrame 与一个合并器函数,先对齐两个输入的 DataFrame,再把配对的 Series(即名称相同的列)传递给合并器函数。

下面的代码复现了上述的 [`combine_first()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.combine_first.html#pandas.DataFrame.combine_first 
"pandas.DataFrame.combine_first") 函数:

```python
In [76]: def combiner(x, y):
   ....:     return np.where(pd.isna(x), y, x)
   ....: 
```

## 描述性统计

[Series](https://pandas.pydata.org/pandas-docs/stable/reference/series.html#api-series-stats) 与 [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html#api-dataframe-stats) 支持大量计算描述性统计的方法与操作。这些方法大部分是 [`sum()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sum.html#pandas.DataFrame.sum "pandas.DataFrame.sum")、[`mean()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mean.html#pandas.DataFrame.mean "pandas.DataFrame.mean")、[`quantile()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.quantile.html#pandas.DataFrame.quantile "pandas.DataFrame.quantile") 等聚合函数,其输出结果比原始数据集小;此外,还有 [`cumsum()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.cumsum.html#pandas.DataFrame.cumsum "pandas.DataFrame.cumsum")、[`cumprod()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.cumprod.html#pandas.DataFrame.cumprod "pandas.DataFrame.cumprod") 等输出结果与原始数据集同样大小的函数。这些方法基本上都接受 `axis` 参数,如 `ndarray.{sum, std, …}`,但这里的 `axis` 可以用名称或整数指定:

* **Series**:无需 `axis` 参数
* **DataFrame**:
  * `index`,即 `axis=0`,默认值
  * `columns`,即 `axis=1`

示例如下:

```python
In [77]: df
Out[77]: 
        one       two     three
a  1.394981  1.772517       NaN
b  0.343054  1.912123 -0.050390
c  0.695246  1.478369  1.227435
d       NaN  0.279344 -0.613172

In [78]: df.mean(0)
Out[78]: 
one      0.811094
two      1.360588
three    0.187958
dtype: float64

In [79]: df.mean(1)
Out[79]: 
a    1.583749
b    0.734929
c    1.133683
d   -0.166914
dtype: float64
```

上述方法都支持 `skipna` 关键字,指定是否排除缺失数据,默认值为 `True`。

```python
In [80]: df.sum(0, skipna=False)
Out[80]: 
one           NaN
two      5.442353
three         NaN
dtype: float64

In [81]: df.sum(axis=1, skipna=True)
Out[81]: 
a    3.167498
b    2.204786
c    3.401050
d   -0.333828
dtype: float64
```
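下面给出一个可独立运行的小例子(数据为自拟,并非上文随机生成的 `df`),演示 `axis` 与 `skipna` 组合使用时的效果:

```python
import numpy as np
import pandas as pd

# 自拟的小型 DataFrame,'one' 列含一个缺失值
df = pd.DataFrame({'one': [1.0, 2.0, np.nan],
                   'two': [4.0, 5.0, 6.0]})

# 默认 skipna=True:先排除 NaN 再求均值
print(df.mean(0))                # one -> 1.5,two -> 5.0

# skipna=False:列中只要有 NaN,该列结果即为 NaN
print(df.mean(0, skipna=False))  # one -> NaN,two -> 5.0

# 沿行方向(axis=1)求和,NaN 同样被跳过
print(df.sum(axis=1))            # 依次为 5.0、7.0、6.0
```

可以看出,`skipna=False` 会把缺失值传播到聚合结果里,而默认行为则是先剔除缺失值再计算。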
结合广播机制或算术运算,可以简洁地描述各种统计过程,比如标准化(即把数据处理为均值为 0、标准差为 1),这种操作非常简单:

```python
In [82]: ts_stand = (df - df.mean()) / df.std()

In [83]: ts_stand.std()
Out[83]: 
one      1.0
two      1.0
three    1.0
dtype: float64

In [84]: xs_stand = df.sub(df.mean(1), axis=0).div(df.std(1), axis=0)

In [85]: xs_stand.std(1)
Out[85]: 
a    1.0
b    1.0
c    1.0
d    1.0
dtype: float64
```

注:[`cumsum()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.cumsum.html#pandas.DataFrame.cumsum) 与 [`cumprod()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.cumprod.html#pandas.DataFrame.cumprod) 等方法保留 `NaN` 值的位置。这与 [`expanding()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.expanding.html#pandas.DataFrame.expanding) 和 [`rolling()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html#pandas.DataFrame.rolling) 略有不同,详情请参阅[本文](https://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html#stats-moments-expanding-note)。

```python
In [86]: df.cumsum()
Out[86]: 
        one       two     three
a  1.394981  1.772517       NaN
b  1.738035  3.684640 -0.050390
c  2.433281  5.163008  1.177045
d       NaN  5.442353  0.563873
```

下表为常用函数汇总表。每个函数都支持 `level` 参数,该参数仅在数据对象有[多层索引](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#advanced-hierarchical)时适用。

| 函数 | 描述 |
| :--------: | :----------------------: |
| `count` | 统计非空值数量 |
| `sum` | 汇总值 |
| `mean` | 平均值 |
| `mad` | 平均绝对偏差 |
| `median` | 算术中位数 |
| `min` | 最小值 |
| `max` | 最大值 |
| `mode` | 众数 |
| `abs` | 绝对值 |
| `prod` | 乘积 |
| `std` | 贝塞尔校正的样本标准偏差 |
| `var` | 无偏方差 |
| `sem` | 平均值的标准误差 |
| `skew` | 样本偏度(三阶矩) |
| `kurt` | 样本峰度(四阶矩) |
| `quantile` | 样本分位数(不同 % 的值) |
| `cumsum` | 累加 |
| `cumprod` | 累乘 |
| `cummax` | 累积最大值 |
| `cummin` | 累积最小值 |

注意:NumPy 的 `mean`、`std`、`sum` 等方法默认不统计 Series 里的空值。

```python
In [87]: np.mean(df['one'])
Out[87]: 0.8110935116651192

In [88]: np.mean(df['one'].to_numpy())
Out[88]: nan
```

[`Series.nunique()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.nunique.html#pandas.Series.nunique) 返回 Series 里非空值的唯一值数量。

```python
In [89]: series = pd.Series(np.random.randn(500))

In [90]: series[20:500] = np.nan

In [91]: series[10:20] = 5

In [92]: series.nunique()
Out[92]: 11
```

### 数据总结:`describe`

[`describe()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html#pandas.DataFrame.describe) 函数计算 Series 与 DataFrame 数据列的各种数据统计量,注意,这里排除了**空值**。

```python
In [93]: series = pd.Series(np.random.randn(1000))

In [94]: series[::2] = np.nan

In [95]: series.describe()
Out[95]: 
count    500.000000
mean      -0.021292
std        1.015906
min       -2.683763
25%       -0.699070
50%       -0.069718
75%        0.714483
max        3.160915
dtype: float64

In [96]: frame = pd.DataFrame(np.random.randn(1000, 5),
   ....:                      columns=['a', 'b', 'c', 'd', 'e'])
   ....: 

In [97]: frame.iloc[::2] = np.nan

In [98]: frame.describe()
Out[98]: 
                a           b           c           d           e
count  500.000000  500.000000  500.000000  500.000000  500.000000
mean     0.033387    0.030045   -0.043719   -0.051686    0.005979
std      1.017152    0.978743    1.025270    1.015988    1.006695
min     -3.000951   -2.637901   -3.303099   -3.159200   -3.188821
25%     -0.647623   -0.576449   -0.712369   -0.691338   -0.691115
50%      0.047578   -0.021499   -0.023888   -0.032652   -0.025363
75%      0.729907    0.775880    0.618896    0.670047    0.649748
max      2.740139    2.752332    3.004229    2.728702    3.240991
```

此外,还可以指定输出结果包含的分位数:

```python
In [99]: series.describe(percentiles=[.05, .25, .75, .95])
Out[99]: 
count    500.000000
mean      -0.021292
std        1.015906
min       -2.683763
5%        -1.645423
25%       -0.699070
50%       -0.069718
75%        0.714483
95%        1.711409
max        3.160915
dtype: float64
```

一般情况下,默认输出结果总是包含**中位数**。

对于非数值型 Series 对象,[`describe()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.describe.html#pandas.Series.describe) 返回值的总数、唯一值数量、出现次数最多的值及出现的次数。

```python
In [100]: s = pd.Series(['a', 'a', 'b',
'b', 'a', 'a', np.nan, 'c', 'd', 'a']) + +In [101]: s.describe() +Out[101]: +count 9 +unique 4 +top a +freq 5 +dtype: object +``` + +注意:对于混合型的 DataFrame 对象, [`describe()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.describe.html#pandas.Series.describe) 只返回数值列的汇总统计量,如果没有数值列,则只显示类别型的列。 + +```python +In [102]: frame = pd.DataFrame({'a': ['Yes', 'Yes', 'No', 'No'], 'b': range(4)}) + +In [103]: frame.describe() +Out[103]: + b +count 4.000000 +mean 1.500000 +std 1.290994 +min 0.000000 +25% 0.750000 +50% 1.500000 +75% 2.250000 +max 3.000000 +``` +`include/exclude` 参数的值为列表,用该参数可以控制包含或排除的数据类型。这里还有一个特殊值,`all`: + +```python +In [104]: frame.describe(include=['object']) +Out[104]: + a +count 4 +unique 2 +top Yes +freq 2 + +In [105]: frame.describe(include=['number']) +Out[105]: + b +count 4.000000 +mean 1.500000 +std 1.290994 +min 0.000000 +25% 0.750000 +50% 1.500000 +75% 2.250000 +max 3.000000 + +In [106]: frame.describe(include='all') +Out[106]: + a b +count 4 4.000000 +unique 2 NaN +top Yes NaN +freq 2 NaN +mean NaN 1.500000 +std NaN 1.290994 +min NaN 0.000000 +25% NaN 0.750000 +50% NaN 1.500000 +75% NaN 2.250000 +max NaN 3.000000 +``` +本功能依托于 [`select_dtypes`](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-selectdtypes),要了解该参数接受哪些输入内容请参阅[本文](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-selectdtypes)。 + +### 最大值与最小值对应的索引 + +Series 与 DataFrame 的 [`idxmax()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.idxmax.html#pandas.DataFrame.idxmax) 与 [`idxmin()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.idxmin.html#pandas.DataFrame.idxmin) 函数计算最大值与最小值对应的索引。 + +```python +In [107]: s1 = pd.Series(np.random.randn(5)) + +In [108]: s1 +Out[108]: +0 1.118076 +1 -0.352051 +2 -1.242883 +3 -1.277155 +4 -0.641184 +dtype: float64 + +In [109]: s1.idxmin(), s1.idxmax() +Out[109]: (3, 0) + +In [110]: df1 = 
pd.DataFrame(np.random.randn(5, 3), columns=['A', 'B', 'C']) + +In [111]: df1 +Out[111]: + A B C +0 -0.327863 -0.946180 -0.137570 +1 -0.186235 -0.257213 -0.486567 +2 -0.507027 -0.871259 -0.111110 +3 2.000339 -2.430505 0.089759 +4 -0.321434 -0.033695 0.096271 + +In [112]: df1.idxmin(axis=0) +Out[112]: +A 2 +B 3 +C 1 +dtype: int64 + +In [113]: df1.idxmax(axis=1) +Out[113]: +0 C +1 A +2 C +3 A +4 C +dtype: object +``` + +多行或多列中存在多个最大值或最小值时,[`idxmax()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.idxmax.html#pandas.DataFrame.idxmax) 与 [`idxmin()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.idxmin.html#pandas.DataFrame.idxmin) 只返回匹配到的第一个值的 `Index`: + +```python +In [114]: df3 = pd.DataFrame([2, 1, 1, 3, np.nan], columns=['A'], index=list('edcba')) + +In [115]: df3 +Out[115]: + A +e 2.0 +d 1.0 +c 1.0 +b 3.0 +a NaN + +In [116]: df3['A'].idxmin() +Out[116]: 'd' +``` +::: tip 注意 + +`idxmin` 与 `idxmax` 对应 NumPy 里的 `argmin` 与 `argmax`。 + +::: + +### 值计数(直方图)与众数 + +Series 的 [`value_counts()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html#pandas.Series.value_counts) 方法及顶级函数计算一维数组中数据值的直方图,还可以用作常规数组的函数: + +```python +In [117]: data = np.random.randint(0, 7, size=50) + +In [118]: data +Out[118]: +array([6, 6, 2, 3, 5, 3, 2, 5, 4, 5, 4, 3, 4, 5, 0, 2, 0, 4, 2, 0, 3, 2, + 2, 5, 6, 5, 3, 4, 6, 4, 3, 5, 6, 4, 3, 6, 2, 6, 6, 2, 3, 4, 2, 1, + 6, 2, 6, 1, 5, 4]) + +In [119]: s = pd.Series(data) + +In [120]: s.value_counts() +Out[120]: +6 10 +2 10 +4 9 +5 8 +3 8 +0 3 +1 2 +dtype: int64 + +In [121]: pd.value_counts(data) +Out[121]: +6 10 +2 10 +4 9 +5 8 +3 8 +0 3 +1 2 +dtype: int64 +``` + +与上述操作类似,还可以统计 Series 或 DataFrame 的众数,即出现频率最高的值: + +```python +In [122]: s5 = pd.Series([1, 1, 3, 3, 3, 5, 5, 7, 7, 7]) + +In [123]: s5.mode() +Out[123]: +0 3 +1 7 +dtype: int64 + +In [124]: df5 = pd.DataFrame({"A": np.random.randint(0, 7, size=50), + .....: "B": np.random.randint(-10, 15, 
size=50)}) + .....: + +In [125]: df5.mode() +Out[125]: + A B +0 1.0 -9 +1 NaN 10 +2 NaN 13 +``` + +### 离散化与分位数 + +[`cut() 函数`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html#pandas.cut)(以值为依据实现分箱)及 [`qcut() 函数`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.qcut.html#pandas.qcut)(以样本分位数为依据实现分箱)用于连续值的离散化: + +```python +In [126]: arr = np.random.randn(20) + +In [127]: factor = pd.cut(arr, 4) + +In [128]: factor +Out[128]: +[(-0.251, 0.464], (-0.968, -0.251], (0.464, 1.179], (-0.251, 0.464], (-0.968, -0.251], ..., (-0.251, 0.464], (-0.968, -0.251], (-0.968, -0.251], (-0.968, -0.251], (-0.968, -0.251]] +Length: 20 +Categories (4, interval[float64]): [(-0.968, -0.251] < (-0.251, 0.464] < (0.464, 1.179] < + (1.179, 1.893]] + +In [129]: factor = pd.cut(arr, [-5, -1, 0, 1, 5]) + +In [130]: factor +Out[130]: +[(0, 1], (-1, 0], (0, 1], (0, 1], (-1, 0], ..., (-1, 0], (-1, 0], (-1, 0], (-1, 0], (-1, 0]] +Length: 20 +Categories (4, interval[int64]): [(-5, -1] < (-1, 0] < (0, 1] < (1, 5]] +``` + +[`qcut()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.qcut.html#pandas.qcut) 计算样本分位数。比如,下列代码按等距分位数分割正态分布的数据: + +```python +In [131]: arr = np.random.randn(30) + +In [132]: factor = pd.qcut(arr, [0, .25, .5, .75, 1]) + +In [133]: factor +Out[133]: +[(0.569, 1.184], (-2.278, -0.301], (-2.278, -0.301], (0.569, 1.184], (0.569, 1.184], ..., (-0.301, 0.569], (1.184, 2.346], (1.184, 2.346], (-0.301, 0.569], (-2.278, -0.301]] +Length: 30 +Categories (4, interval[float64]): [(-2.278, -0.301] < (-0.301, 0.569] < (0.569, 1.184] < + (1.184, 2.346]] + +In [134]: pd.value_counts(factor) +Out[134]: +(1.184, 2.346] 8 +(-2.278, -0.301] 8 +(0.569, 1.184] 7 +(-0.301, 0.569] 7 +dtype: int64 +``` + +定义分箱时,还可以传递无穷值: + +```python +In [135]: arr = np.random.randn(20) + +In [136]: factor = pd.cut(arr, [-np.inf, 0, np.inf]) + +In [137]: factor +Out[137]: +[(-inf, 0.0], (0.0, inf], (0.0, inf], (-inf, 0.0], (-inf, 0.0], ..., (-inf, 0.0], (-inf, 
0.0], (-inf, 0.0], (0.0, inf], (0.0, inf]] +Length: 20 +Categories (2, interval[float64]): [(-inf, 0.0] < (0.0, inf]] +``` + +## 函数应用 + +不管是为 Pandas 对象应用自定义函数,还是应用第三方函数,都离不开以下三种方法。用哪种方法取决于操作的对象是 `DataFrame`,还是 `Series` ;是行、列,还是元素。 + +1. 表级函数应用:[`pipe()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pipe.html#pandas.DataFrame.pipe) +2. 行列级函数应用: [`apply()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply) + +3. 聚合 API: [`agg()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html#pandas.DataFrame.agg) 与 [`transform()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transform.html#pandas.DataFrame.transform) +4. 元素级函数应用:[`applymap()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.applymap.html#pandas.DataFrame.applymap) + +### 表级函数应用 + +虽然可以把 DataFrame 与 Series 传递给函数,不过链式调用函数时,最好使用 [`pipe()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pipe.html#pandas.DataFrame.pipe) 方法。对比以下两种方式: + +```python +# f、g、h 是提取、返回 `DataFrames` 的函数 +>>> f(g(h(df), arg1=1), arg2=2, arg3=3) +``` + +下列代码与上述代码等效: + +```python +>>> (df.pipe(h) +... .pipe(g, arg1=1) +... 
.pipe(f, arg2=2, arg3=3)) +``` + +Pandas 鼓励使用第二种方式,即链式方法。在链式方法中调用自定义函数或第三方支持库函数时,用 `pipe` 更容易,与用 Pandas 自身方法一样。 + +上例中,`f`、`g` 与 `h` 这几个函数都把 `DataFrame` 当作首位参数。要是想把数据作为第二个参数,该怎么办?本例中,`pipe` 为元组 (`callable,data_keyword`)形式。`.pipe` 把 `DataFrame` 作为元组里指定的参数。 + +下例用 statsmodels 拟合回归。该 API 先接收一个公式,`DataFrame` 是第二个参数,`data`。要传递函数,则要用`pipe` 接收关键词对 (`sm.ols,'data'`)。 + +```python +In [138]: import statsmodels.formula.api as sm + +In [139]: bb = pd.read_csv('data/baseball.csv', index_col='id') + +In [140]: (bb.query('h > 0') + .....: .assign(ln_h=lambda df: np.log(df.h)) + .....: .pipe((sm.ols, 'data'), 'hr ~ ln_h + year + g + C(lg)') + .....: .fit() + .....: .summary() + .....: ) + .....: +Out[140]: + +""" + OLS Regression Results +============================================================================== +Dep. Variable: hr R-squared: 0.685 +Model: OLS Adj. R-squared: 0.665 +Method: Least Squares F-statistic: 34.28 +Date: Thu, 22 Aug 2019 Prob (F-statistic): 3.48e-15 +Time: 15:48:59 Log-Likelihood: -205.92 +No. Observations: 68 AIC: 421.8 +Df Residuals: 63 BIC: 432.9 +Df Model: 4 +Covariance Type: nonrobust +=============================================================================== + coef std err t P>|t| [0.025 0.975] +------------------------------------------------------------------------------- +Intercept -8484.7720 4664.146 -1.819 0.074 -1.78e+04 835.780 +C(lg)[T.NL] -2.2736 1.325 -1.716 0.091 -4.922 0.375 +ln_h -1.3542 0.875 -1.547 0.127 -3.103 0.395 +year 4.2277 2.324 1.819 0.074 -0.417 8.872 +g 0.1841 0.029 6.258 0.000 0.125 0.243 +============================================================================== +Omnibus: 10.875 Durbin-Watson: 1.999 +Prob(Omnibus): 0.004 Jarque-Bera (JB): 17.298 +Skew: 0.537 Prob(JB): 0.000175 +Kurtosis: 5.225 Cond. No. 1.49e+07 +============================================================================== + +Warnings: +[1] Standard Errors assume that the covariance matrix of the errors is correctly specified. 
[2] The condition number is large, 1.49e+07. This might indicate that there are
strong multicollinearity or other numerical problems.
"""
```

unix 管道,以及后来出现的 [dplyr](https://github.com/hadley/dplyr) 与 [magrittr](https://github.com/smbache/magrittr),启发了 `pipe` 方法,后两者为 R 语言引入了流行的管道操作符 `%>%`(读作 pipe)。`pipe` 的实现思路非常清晰,仿佛 Python 原生的一样。强烈建议大家阅读 [`pipe()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pipe.html#pandas.DataFrame.pipe) 的源代码。

### 行列级函数应用

[`apply()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply) 方法沿着 DataFrame 的轴应用函数,比如,描述性统计方法,该方法支持 `axis` 参数。

```python
In [141]: df.apply(np.mean)
Out[141]: 
one      0.811094
two      1.360588
three    0.187958
dtype: float64

In [142]: df.apply(np.mean, axis=1)
Out[142]: 
a    1.583749
b    0.734929
c    1.133683
d   -0.166914
dtype: float64

In [143]: df.apply(lambda x: x.max() - x.min())
Out[143]: 
one      1.051928
two      1.632779
three    1.840607
dtype: float64

In [144]: df.apply(np.cumsum)
Out[144]: 
        one       two     three
a  1.394981  1.772517       NaN
b  1.738035  3.684640 -0.050390
c  2.433281  5.163008  1.177045
d       NaN  5.442353  0.563873

In [145]: df.apply(np.exp)
Out[145]: 
        one       two     three
a  4.034899  5.885648       NaN
b  1.409244  6.767440  0.950858
c  2.004201  4.385785  3.412466
d       NaN  1.322262  0.541630
```

[`apply()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply) 方法还支持通过函数名字符串调用函数。

```python
In [146]: df.apply('mean')
Out[146]: 
one      0.811094
two      1.360588
three    0.187958
dtype: float64

In [147]: df.apply('mean', axis=1)
Out[147]: 
a    1.583749
b    0.734929
c    1.133683
d   -0.166914
dtype: float64
```

默认情况下,[`apply()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply) 调用的函数返回的类型会影响 `DataFrame.apply` 输出结果的类型。

* 函数返回的是 `Series` 时,最终输出结果是 `DataFrame`。输出的列与函数返回的 `Series` 索引相匹配。

* 
函数返回其它任意类型时,输出结果是 `Series`。 + +`result_type` 会覆盖默认行为,该参数有三个选项:`reduce`、`broadcast`、`expand`。这些选项决定了列表型返回值是否扩展为 `DataFrame`。 + +用好 [`apply()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply "pandas.DataFrame.apply") 可以了解数据集的很多信息。比如可以提取每列的最大值对应的日期: + +```python +In [148]: tsdf = pd.DataFrame(np.random.randn(1000, 3), columns=['A', 'B', 'C'], + .....: index=pd.date_range('1/1/2000', periods=1000)) + .....: + +In [149]: tsdf.apply(lambda x: x.idxmax()) +Out[149]: +A 2000-08-06 +B 2001-01-18 +C 2001-07-18 +dtype: datetime64[ns] +``` + +还可以向 [`apply()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply "pandas.DataFrame.apply") 方法传递额外的参数与关键字参数。比如下例中要应用的这个函数: + +```python +def subtract_and_divide(x, sub, divide=1): + return (x - sub) / divide +``` + +可以用下列方式应用该函数: + +```python +df.apply(subtract_and_divide, args=(5,), divide=3) +``` + +为每行或每列执行 `Series` 方法的功能也很实用: + +```python +In [150]: tsdf +Out[150]: + A B C +2000-01-01 -0.158131 -0.232466 0.321604 +2000-01-02 -1.810340 -3.105758 0.433834 +2000-01-03 -1.209847 -1.156793 -0.136794 +2000-01-04 NaN NaN NaN +2000-01-05 NaN NaN NaN +2000-01-06 NaN NaN NaN +2000-01-07 NaN NaN NaN +2000-01-08 -0.653602 0.178875 1.008298 +2000-01-09 1.007996 0.462824 0.254472 +2000-01-10 0.307473 0.600337 1.643950 + +In [151]: tsdf.apply(pd.Series.interpolate) +Out[151]: + A B C +2000-01-01 -0.158131 -0.232466 0.321604 +2000-01-02 -1.810340 -3.105758 0.433834 +2000-01-03 -1.209847 -1.156793 -0.136794 +2000-01-04 -1.098598 -0.889659 0.092225 +2000-01-05 -0.987349 -0.622526 0.321243 +2000-01-06 -0.876100 -0.355392 0.550262 +2000-01-07 -0.764851 -0.088259 0.779280 +2000-01-08 -0.653602 0.178875 1.008298 +2000-01-09 1.007996 0.462824 0.254472 +2000-01-10 0.307473 0.600337 1.643950 +``` +[`apply()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply 
"pandas.DataFrame.apply") 有一个参数 `raw`,默认值为 `False`,在应用函数前,使用该参数可以将每行或列转换为 `Series`。该参数为 `True` 时,传递的函数接收 ndarray 对象,若不需要索引功能,这种操作能显著提高性能。 + +### 聚合 API + +*0.20.0 版新增*。 + +聚合 API 可以快速、简洁地执行多个聚合操作。Pandas 对象支持多个类似的 API,如 [groupby API](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#groupby-aggregate)、[window functions API](https://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html#stats-aggregate)、[resample API](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-aggregate)。聚合函数为[`DataFrame.aggregate()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.aggregate.html#pandas.DataFrame.aggregate "pandas.DataFrame.aggregate"),它的别名是 [`DataFrame.agg()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html#pandas.DataFrame.agg "pandas.DataFrame.agg")。 + +此处用与上例类似的 `DataFrame`: + +```python +In [152]: tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'], + .....: index=pd.date_range('1/1/2000', periods=10)) + .....: + +In [153]: tsdf.iloc[3:7] = np.nan + +In [154]: tsdf +Out[154]: + A B C +2000-01-01 1.257606 1.004194 0.167574 +2000-01-02 -0.749892 0.288112 -0.757304 +2000-01-03 -0.207550 -0.298599 0.116018 +2000-01-04 NaN NaN NaN +2000-01-05 NaN NaN NaN +2000-01-06 NaN NaN NaN +2000-01-07 NaN NaN NaN +2000-01-08 0.814347 -0.257623 0.869226 +2000-01-09 -0.250663 -1.206601 0.896839 +2000-01-10 2.169758 -1.333363 0.283157 +``` + +应用单个函数时,该操作与 [`apply()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply "pandas.DataFrame.apply") 等效,这里也可以用字符串表示聚合函数名。下面的聚合函数输出的结果为 `Series`: + +```python +In [155]: tsdf.agg(np.sum) +Out[155]: +A 3.033606 +B -1.803879 +C 1.575510 +dtype: float64 + +In [156]: tsdf.agg('sum') +Out[156]: +A 3.033606 +B -1.803879 +C 1.575510 +dtype: float64 + +# 因为应用的是单个函数,该操作与`.sum()` 是等效的 +In [157]: tsdf.sum() +Out[157]: +A 3.033606 +B -1.803879 +C 1.575510 +dtype: 
float64
```

`Series` 单个聚合操作返回标量值:

```python
In [158]: tsdf.A.agg('sum')
Out[158]: 3.033606102414146
```

### 多函数聚合

还可以用列表形式传递多个聚合函数。每个函数在输出结果 `DataFrame` 里以行的形式显示,行名是每个聚合函数的函数名。

```python
In [159]: tsdf.agg(['sum'])
Out[159]: 
            A         B        C
sum  3.033606 -1.803879  1.57551
```

多个函数输出多行:

```python
In [160]: tsdf.agg(['sum', 'mean'])
Out[160]: 
             A         B         C
sum   3.033606 -1.803879  1.575510
mean  0.505601 -0.300647  0.262585
```

`Series` 聚合多函数返回结果还是 `Series`,索引为函数名:

```python
In [161]: tsdf.A.agg(['sum', 'mean'])
Out[161]: 
sum     3.033606
mean    0.505601
Name: A, dtype: float64
```

传递 `lambda` 函数时,输出名为 `<lambda>` 的行:

```python
In [162]: tsdf.A.agg(['sum', lambda x: x.mean()])
Out[162]: 
sum         3.033606
<lambda>    0.505601
Name: A, dtype: float64
```

应用自定义函数时,该函数名为输出结果的行名:

```python
In [163]: def mymean(x):
   .....:     return x.mean()
   .....: 

In [164]: tsdf.A.agg(['sum', mymean])
Out[164]: 
sum       3.033606
mymean    0.505601
Name: A, dtype: float64
```

### 用字典实现聚合

指定为哪些列应用哪些聚合函数时,需要把包含列名与标量(或标量列表)的字典传递给 `DataFrame.agg`。

注意:这里输出结果的顺序不是固定的,要想让输出顺序与输入顺序一致,请使用 `OrderedDict`。

```python
In [165]: tsdf.agg({'A': 'mean', 'B': 'sum'})
Out[165]: 
A    0.505601
B   -1.803879
dtype: float64
```

输入的参数是列表时,输出结果为 `DataFrame`,并以矩阵形式显示所有聚合函数的计算结果,且输出结果由所有唯一函数组成。未执行聚合操作的列输出结果为 `NaN` 值:

```python
In [166]: tsdf.agg({'A': ['mean', 'min'], 'B': 'sum'})
Out[166]: 
             A         B
mean  0.505601       NaN
min  -0.749892       NaN
sum        NaN -1.803879
```

### 多种数据类型(Dtype)

与 `groupby` 的 `.agg` 操作类似,DataFrame 含不能执行聚合的数据类型时,`.agg` 只计算可聚合的列:

```python
In [167]: mdf = pd.DataFrame({'A': [1, 2, 3],
   .....:                     'B': [1., 2., 3.],
   .....:                     'C': ['foo', 'bar', 'baz'],
   .....:                     'D': pd.date_range('20130101', periods=3)})
   .....: 

In [168]: mdf.dtypes
Out[168]: 
A             int64
B           float64
C            object
D    datetime64[ns]
dtype: object
```

```python
In [169]: mdf.agg(['min', 'sum'])
Out[169]: 
     A    B          C          D
min  1  1.0        bar 2013-01-01
sum  6  6.0  foobarbaz        NaT
```

### 自定义 Describe

`.agg()` 
可以创建类似于内置 [describe 函数](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-describe) 的自定义 describe 函数。 + +```python +In [170]: from functools import partial + +In [171]: q_25 = partial(pd.Series.quantile, q=0.25) + +In [172]: q_25.__name__ = '25%' + +In [173]: q_75 = partial(pd.Series.quantile, q=0.75) + +In [174]: q_75.__name__ = '75%' + +In [175]: tsdf.agg(['count', 'mean', 'std', 'min', q_25, 'median', q_75, 'max']) +Out[175]: + A B C +count 6.000000 6.000000 6.000000 +mean 0.505601 -0.300647 0.262585 +std 1.103362 0.887508 0.606860 +min -0.749892 -1.333363 -0.757304 +25% -0.239885 -0.979600 0.128907 +median 0.303398 -0.278111 0.225365 +75% 1.146791 0.151678 0.722709 +max 2.169758 1.004194 0.896839 +``` +### Transform API + +*0.20.0 版新增*。 + +[`transform()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transform.html#pandas.DataFrame.transform "pandas.DataFrame.transform") 方法的返回结果与原始数据的索引相同,大小相同。与 `.agg` API 类似,该 API 支持同时处理多种操作,不用一个一个操作。 + +下面,先创建一个 DataFrame: + +```python +In [176]: tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'], + .....: index=pd.date_range('1/1/2000', periods=10)) + .....: + +In [177]: tsdf.iloc[3:7] = np.nan + +In [178]: tsdf +Out[178]: + A B C +2000-01-01 -0.428759 -0.864890 -0.675341 +2000-01-02 -0.168731 1.338144 -1.279321 +2000-01-03 -1.621034 0.438107 0.903794 +2000-01-04 NaN NaN NaN +2000-01-05 NaN NaN NaN +2000-01-06 NaN NaN NaN +2000-01-07 NaN NaN NaN +2000-01-08 0.254374 -1.240447 -0.201052 +2000-01-09 -0.157795 0.791197 -1.144209 +2000-01-10 -0.030876 0.371900 0.061932 +``` + +这里转换的是整个 DataFrame。`.transform()` 支持 NumPy 函数、字符串函数及自定义函数。 + +```python +In [179]: tsdf.transform(np.abs) +Out[179]: + A B C +2000-01-01 0.428759 0.864890 0.675341 +2000-01-02 0.168731 1.338144 1.279321 +2000-01-03 1.621034 0.438107 0.903794 +2000-01-04 NaN NaN NaN +2000-01-05 NaN NaN NaN +2000-01-06 NaN NaN NaN +2000-01-07 NaN NaN NaN +2000-01-08 0.254374 1.240447 0.201052 
+2000-01-09  0.157795  0.791197  1.144209
+2000-01-10  0.030876  0.371900  0.061932
+
+In [180]: tsdf.transform('abs')
+Out[180]:
+                   A         B         C
+2000-01-01  0.428759  0.864890  0.675341
+2000-01-02  0.168731  1.338144  1.279321
+2000-01-03  1.621034  0.438107  0.903794
+2000-01-04       NaN       NaN       NaN
+2000-01-05       NaN       NaN       NaN
+2000-01-06       NaN       NaN       NaN
+2000-01-07       NaN       NaN       NaN
+2000-01-08  0.254374  1.240447  0.201052
+2000-01-09  0.157795  0.791197  1.144209
+2000-01-10  0.030876  0.371900  0.061932
+
+In [181]: tsdf.transform(lambda x: x.abs())
+Out[181]:
+                   A         B         C
+2000-01-01  0.428759  0.864890  0.675341
+2000-01-02  0.168731  1.338144  1.279321
+2000-01-03  1.621034  0.438107  0.903794
+2000-01-04       NaN       NaN       NaN
+2000-01-05       NaN       NaN       NaN
+2000-01-06       NaN       NaN       NaN
+2000-01-07       NaN       NaN       NaN
+2000-01-08  0.254374  1.240447  0.201052
+2000-01-09  0.157795  0.791197  1.144209
+2000-01-10  0.030876  0.371900  0.061932
+```
+
+这里的 [`transform()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transform.html#pandas.DataFrame.transform "pandas.DataFrame.transform") 接受的是单个函数,等效于应用 ufunc:
+
+```python
+In [182]: np.abs(tsdf)
+Out[182]:
+                   A         B         C
+2000-01-01  0.428759  0.864890  0.675341
+2000-01-02  0.168731  1.338144  1.279321
+2000-01-03  1.621034  0.438107  0.903794
+2000-01-04       NaN       NaN       NaN
+2000-01-05       NaN       NaN       NaN
+2000-01-06       NaN       NaN       NaN
+2000-01-07       NaN       NaN       NaN
+2000-01-08  0.254374  1.240447  0.201052
+2000-01-09  0.157795  0.791197  1.144209
+2000-01-10  0.030876  0.371900  0.061932
+```
+
+`.transform()` 向 `Series` 传递单个函数时,返回的结果也是单个 `Series`。
+
+```python
+In [183]: tsdf.A.transform(np.abs)
+Out[183]:
+2000-01-01    0.428759
+2000-01-02    0.168731
+2000-01-03    1.621034
+2000-01-04         NaN
+2000-01-05         NaN
+2000-01-06         NaN
+2000-01-07         NaN
+2000-01-08    0.254374
+2000-01-09    0.157795
+2000-01-10    0.030876
+Freq: D, Name: A, dtype: float64
+```
+
+### 多函数 Transform
+
+`transform()` 调用多个函数时,生成多层索引 DataFrame:第一层是原始数据集的列名,第二层是 `transform()` 调用的函数名。
+
+```python
+In [184]: tsdf.transform([np.abs, lambda x: x + 1])
+Out[184]:
+                   A                   B                   C
+            absolute  <lambda>  absolute  <lambda>  absolute  <lambda>
+2000-01-01  0.428759  0.571241  0.864890  0.135110  0.675341  0.324659
+2000-01-02  0.168731  0.831269  1.338144  2.338144  1.279321 -0.279321
+2000-01-03  1.621034 -0.621034  0.438107  1.438107  0.903794  1.903794
+2000-01-04       NaN       NaN       NaN       NaN       NaN       NaN
+2000-01-05       NaN       NaN       NaN       NaN       NaN       NaN
+2000-01-06       NaN       NaN       NaN       NaN       NaN       NaN
+2000-01-07       NaN       NaN       NaN       NaN       NaN       NaN
+2000-01-08  0.254374  1.254374  1.240447 -0.240447  0.201052  0.798948
+2000-01-09  0.157795  0.842205  0.791197  1.791197  1.144209 -0.144209
+2000-01-10  0.030876  0.969124  0.371900  1.371900  0.061932  1.061932
+```
+
+为 Series 应用多个函数时,输出结果是 DataFrame,列名是 `transform()` 调用的函数名。
+
+```python
+In [185]: tsdf.A.transform([np.abs, lambda x: x + 1])
+Out[185]:
+            absolute  <lambda>
+2000-01-01  0.428759  0.571241
+2000-01-02  0.168731  0.831269
+2000-01-03  1.621034 -0.621034
+2000-01-04       NaN       NaN
+2000-01-05       NaN       NaN
+2000-01-06       NaN       NaN
+2000-01-07       NaN       NaN
+2000-01-08  0.254374  1.254374
+2000-01-09  0.157795  0.842205
+2000-01-10  0.030876  0.969124
+```
+
+### 用字典执行 `transform` 操作
+
+函数字典可以为每列指定要执行的 `transform()` 操作。
+
+```python
+In [186]: tsdf.transform({'A': np.abs, 'B': lambda x: x + 1})
+Out[186]:
+                   A         B
+2000-01-01  0.428759  0.135110
+2000-01-02  0.168731  2.338144
+2000-01-03  1.621034  1.438107
+2000-01-04       NaN       NaN
+2000-01-05       NaN       NaN
+2000-01-06       NaN       NaN
+2000-01-07       NaN       NaN
+2000-01-08  0.254374 -0.240447
+2000-01-09  0.157795  1.791197
+2000-01-10  0.030876  1.371900
+```
+
+`transform()` 的参数是列表字典时,生成的是以 `transform()` 调用的函数为名的多层索引 DataFrame。
+
+```python
+In [187]: tsdf.transform({'A': np.abs, 'B': [lambda x: x + 1, 'sqrt']})
+Out[187]:
+                   A         B
+            absolute  <lambda>      sqrt
+2000-01-01  0.428759  0.135110       NaN
+2000-01-02  0.168731  2.338144  1.156782
+2000-01-03  1.621034  1.438107  0.661897
+2000-01-04       NaN       NaN       NaN
+2000-01-05       NaN       NaN       NaN
+2000-01-06       NaN       NaN       NaN
+2000-01-07       NaN       NaN       NaN
+2000-01-08  0.254374 -0.240447       NaN
+2000-01-09  0.157795  1.791197  0.889493
+2000-01-10  0.030876  1.371900  0.609836
+```
+
+### 元素级函数应用
+
+并非所有函数都能矢量化,即接受 NumPy 数组,返回另一个数组或值,DataFrame 的 
[`applymap()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.applymap.html#pandas.DataFrame.applymap "pandas.DataFrame.applymap") 及 Series 的 [`map()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html#pandas.Series.map "pandas.Series.map") ,支持任何接收单个值并返回单个值的 Python 函数。 + +示例如下: + +```python +In [188]: df4 +Out[188]: + one two three +a 1.394981 1.772517 NaN +b 0.343054 1.912123 -0.050390 +c 0.695246 1.478369 1.227435 +d NaN 0.279344 -0.613172 + +In [189]: def f(x): + .....: return len(str(x)) + .....: + +In [190]: df4['one'].map(f) +Out[190]: +a 18 +b 19 +c 18 +d 3 +Name: one, dtype: int64 + +In [191]: df4.applymap(f) +Out[191]: + one two three +a 18 17 3 +b 19 18 20 +c 18 18 16 +d 3 19 19 +``` + +[`Series.map()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html#pandas.Series.map "pandas.Series.map") 还有个功能,可以“连接”或“映射”第二个 Series 定义的值。这与 [merging / joining 功能](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#merging)联系非常紧密: + +```python +In [192]: s = pd.Series(['six', 'seven', 'six', 'seven', 'six'], + .....: index=['a', 'b', 'c', 'd', 'e']) + .....: + +In [193]: t = pd.Series({'six': 6., 'seven': 7.}) + +In [194]: s +Out[194]: +a six +b seven +c six +d seven +e six +dtype: object + +In [195]: s.map(t) +Out[195]: +a 6.0 +b 7.0 +c 6.0 +d 7.0 +e 6.0 +dtype: float64 +``` + +## 重置索引与更换标签 + +[`reindex()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.reindex.html#pandas.Series.reindex "pandas.Series.reindex") 是 Pandas 里实现数据对齐的基本方法,该方法执行几乎所有功能都要用到的标签对齐功能。 `reindex` 指的是沿着指定轴,让数据与给定的一组标签进行匹配。该功能完成以下几项操作: + +* 让现有数据匹配一组新标签,并重新排序; +* 在无数据但有标签的位置插入缺失值(`NA`)标记; +* 如果指定,则按逻辑**填充**无标签的数据,该操作多见于时间序列数据。 + +示例如下: + +```python +In [196]: s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e']) + +In [197]: s +Out[197]: +a 1.695148 +b 1.328614 +c 1.234686 +d -0.385845 +e -1.326508 +dtype: float64 + +In [198]: s.reindex(['e', 'b', 'f', 
'd']) +Out[198]: +e -1.326508 +b 1.328614 +f NaN +d -0.385845 +dtype: float64 +``` + +本例中,原 Series 里没有标签 `f` ,因此,输出结果里 `f` 对应的值为 `NaN`。 + +DataFrame 支持同时 `reindex` 索引与列: + +```python +In [199]: df +Out[199]: + one two three +a 1.394981 1.772517 NaN +b 0.343054 1.912123 -0.050390 +c 0.695246 1.478369 1.227435 +d NaN 0.279344 -0.613172 + +In [200]: df.reindex(index=['c', 'f', 'b'], columns=['three', 'two', 'one']) +Out[200]: + three two one +c 1.227435 1.478369 0.695246 +f NaN NaN NaN +b -0.050390 1.912123 0.343054 +``` + +`reindex` 还支持 `axis` 关键字: + +```python +In [201]: df.reindex(['c', 'f', 'b'], axis='index') +Out[201]: + one two three +c 0.695246 1.478369 1.227435 +f NaN NaN NaN +b 0.343054 1.912123 -0.050390 +``` + +注意:不同对象可以**共享** `Index` 包含的轴标签。比如,有一个 Series,还有一个 DataFrame,可以执行下列操作: + +```python +In [202]: rs = s.reindex(df.index) + +In [203]: rs +Out[203]: +a 1.695148 +b 1.328614 +c 1.234686 +d -0.385845 +dtype: float64 + +In [204]: rs.index is df.index +Out[204]: True +``` + +这里指的是,重置后,Series 的索引与 DataFrame 的索引是同一个 Python 对象。 + +*0.21.0 版新增*。 + +[`DataFrame.reindex()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html#pandas.DataFrame.reindex "pandas.DataFrame.reindex") 还支持 “轴样式”调用习语,可以指定单个 `labels` 参数,并指定应用于哪个 `axis`。 + +```python +In [205]: df.reindex(['c', 'f', 'b'], axis='index') +Out[205]: + one two three +c 0.695246 1.478369 1.227435 +f NaN NaN NaN +b 0.343054 1.912123 -0.050390 + +In [206]: df.reindex(['three', 'two', 'one'], axis='columns') +Out[206]: + three two one +a NaN 1.772517 1.394981 +b -0.050390 1.912123 0.343054 +c 1.227435 1.478369 0.695246 +d -0.613172 0.279344 NaN +``` +::: tip 注意 + +[多层索引与高级索引](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#advanced)介绍了怎样用更简洁的方式重置索引。 + +::: + +::: tip 注意 + +编写注重性能的代码时,最好花些时间深入理解 `reindex`:**预对齐数据后,操作会更快**。两个未对齐的 DataFrame 相加,后台操作会执行 `reindex`。探索性分析时很难注意到这点有什么不同,这是因为 `reindex` 已经进行了高度优化,但需要注重 CPU 周期时,显式调用 `reindex` 还是有一些影响的。 + +::: + +### 
重置索引,并与其它对象对齐 + +提取一个对象,并用另一个具有相同标签的对象 `reindex` 该对象的轴。这种操作的语法虽然简单,但未免有些啰嗦。这时,最好用 [`reindex_like()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex_like.html#pandas.DataFrame.reindex_like "pandas.DataFrame.reindex_like") 方法,这是一种既有效,又简单的方式: + +```python +In [207]: df2 +Out[207]: + one two +a 1.394981 1.772517 +b 0.343054 1.912123 +c 0.695246 1.478369 + +In [208]: df3 +Out[208]: + one two +a 0.583888 0.051514 +b -0.468040 0.191120 +c -0.115848 -0.242634 + +In [209]: df.reindex_like(df2) +Out[209]: + one two +a 1.394981 1.772517 +b 0.343054 1.912123 +c 0.695246 1.478369 +``` + +### 用 `align` 对齐多个对象 + +[`align()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.align.html#pandas.Series.align "pandas.Series.align") 方法是对齐两个对象最快的方式,该方法支持 `join` 参数(请参阅 [joining 与 merging](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#merging)): + +* `join='outer'`:使用两个对象索引的合集,默认值 +* `join='left'`:使用左侧调用对象的索引 +* `join='right'`:使用右侧传递对象的索引 +* `join='inner'`:使用两个对象索引的交集 + +该方法返回重置索引后的两个 Series 元组: + +```python +In [210]: s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e']) + +In [211]: s1 = s[:4] + +In [212]: s2 = s[1:] + +In [213]: s1.align(s2) +Out[213]: +(a -0.186646 + b -1.692424 + c -0.303893 + d -1.425662 + e NaN + dtype: float64, a NaN + b -1.692424 + c -0.303893 + d -1.425662 + e 1.114285 + dtype: float64) + +In [214]: s1.align(s2, join='inner') +Out[214]: +(b -1.692424 + c -0.303893 + d -1.425662 + dtype: float64, b -1.692424 + c -0.303893 + d -1.425662 + dtype: float64) + +In [215]: s1.align(s2, join='left') +Out[215]: +(a -0.186646 + b -1.692424 + c -0.303893 + d -1.425662 + dtype: float64, a NaN + b -1.692424 + c -0.303893 + d -1.425662 + dtype: float64) +``` + +默认条件下, `join` 方法既应用于索引,也应用于列: + +```python +In [216]: df.align(df2, join='inner') +Out[216]: +( one two + a 1.394981 1.772517 + b 0.343054 1.912123 + c 0.695246 1.478369, one two + a 1.394981 1.772517 + b 0.343054 1.912123 + 
c 0.695246 1.478369) +``` + +`align` 方法还支持 `axis` 选项,用来指定要对齐的轴: + +```python +In [217]: df.align(df2, join='inner', axis=0) +Out[217]: +( one two three + a 1.394981 1.772517 NaN + b 0.343054 1.912123 -0.050390 + c 0.695246 1.478369 1.227435, one two + a 1.394981 1.772517 + b 0.343054 1.912123 + c 0.695246 1.478369) +``` + +如果把 Series 传递给 [`DataFrame.align()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.align.html#pandas.DataFrame.align "pandas.DataFrame.align"),可以用 `axis` 参数选择是在 DataFrame 的索引,还是列上对齐两个对象: + + +```python +In [218]: df.align(df2.iloc[0], axis=1) +Out[218]: +( one three two + a 1.394981 NaN 1.772517 + b 0.343054 -0.050390 1.912123 + c 0.695246 1.227435 1.478369 + d NaN -0.613172 0.279344, one 1.394981 + three NaN + two 1.772517 + Name: a, dtype: float64) +``` + +| 方法 | 动作 | +| :--------------- | :----------------- | +| pad / ffill | 先前填充 | +| bfill / backfill | 向后填充 | +| nearest | 从最近的索引值填充 | + +下面用一个简单的 Series 展示 `fill` 方法: + +```python +In [219]: rng = pd.date_range('1/3/2000', periods=8) + +In [220]: ts = pd.Series(np.random.randn(8), index=rng) + +In [221]: ts2 = ts[[0, 3, 6]] + +In [222]: ts +Out[222]: +2000-01-03 0.183051 +2000-01-04 0.400528 +2000-01-05 -0.015083 +2000-01-06 2.395489 +2000-01-07 1.414806 +2000-01-08 0.118428 +2000-01-09 0.733639 +2000-01-10 -0.936077 +Freq: D, dtype: float64 + +In [223]: ts2 +Out[223]: +2000-01-03 0.183051 +2000-01-06 2.395489 +2000-01-09 0.733639 +dtype: float64 + +In [224]: ts2.reindex(ts.index) +Out[224]: +2000-01-03 0.183051 +2000-01-04 NaN +2000-01-05 NaN +2000-01-06 2.395489 +2000-01-07 NaN +2000-01-08 NaN +2000-01-09 0.733639 +2000-01-10 NaN +Freq: D, dtype: float64 + +In [225]: ts2.reindex(ts.index, method='ffill') +Out[225]: +2000-01-03 0.183051 +2000-01-04 0.183051 +2000-01-05 0.183051 +2000-01-06 2.395489 +2000-01-07 2.395489 +2000-01-08 2.395489 +2000-01-09 0.733639 +2000-01-10 0.733639 +Freq: D, dtype: float64 + +In [226]: ts2.reindex(ts.index, method='bfill') 
+Out[226]: +2000-01-03 0.183051 +2000-01-04 2.395489 +2000-01-05 2.395489 +2000-01-06 2.395489 +2000-01-07 0.733639 +2000-01-08 0.733639 +2000-01-09 0.733639 +2000-01-10 NaN +Freq: D, dtype: float64 + +In [227]: ts2.reindex(ts.index, method='nearest') +Out[227]: +2000-01-03 0.183051 +2000-01-04 0.183051 +2000-01-05 2.395489 +2000-01-06 2.395489 +2000-01-07 2.395489 +2000-01-08 0.733639 +2000-01-09 0.733639 +2000-01-10 0.733639 +Freq: D, dtype: float64 +``` + +上述操作要求索引按递增或递减**排序**。 + +注意:除了 `method='nearest'`,用 [`fillna`](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#missing-data-fillna) 或 [`interpolate`](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#missing-data-interpolate) 也能实现同样的效果: + +```python +In [228]: ts2.reindex(ts.index).fillna(method='ffill') +Out[228]: +2000-01-03 0.183051 +2000-01-04 0.183051 +2000-01-05 0.183051 +2000-01-06 2.395489 +2000-01-07 2.395489 +2000-01-08 2.395489 +2000-01-09 0.733639 +2000-01-10 0.733639 +Freq: D, dtype: float64 +``` + +如果索引不是按递增或递减排序,[`reindex()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.reindex.html#pandas.Series.reindex "pandas.Series.reindex") 会触发 ValueError 错误。[`fillna()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.fillna.html#pandas.Series.fillna "pandas.Series.fillna") 与 [`interpolate()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.interpolate.html#pandas.Series.interpolate "pandas.Series.interpolate") 则不检查索引的排序。 + +### 重置索引填充的限制 + +`limit` 与 `tolerance` 参数可以控制 `reindex` 的填充操作。`limit` 限定了连续匹配的最大数量: + +```python +In [229]: ts2.reindex(ts.index, method='ffill', limit=1) +Out[229]: +2000-01-03 0.183051 +2000-01-04 0.183051 +2000-01-05 NaN +2000-01-06 2.395489 +2000-01-07 2.395489 +2000-01-08 NaN +2000-01-09 0.733639 +2000-01-10 0.733639 +Freq: D, dtype: float64 +``` + +反之,`tolerance` 限定了索引与索引器值之间的最大距离: + +```python +In [230]: ts2.reindex(ts.index, method='ffill', tolerance='1 
day') +Out[230]: +2000-01-03 0.183051 +2000-01-04 0.183051 +2000-01-05 NaN +2000-01-06 2.395489 +2000-01-07 2.395489 +2000-01-08 NaN +2000-01-09 0.733639 +2000-01-10 0.733639 +Freq: D, dtype: float64 +``` + +注意:索引为 `DatetimeIndex`、`TimedeltaIndex` 或 `PeriodIndex` 时,`tolerance` 会尽可能将这些索引强制转换为 `Timedelta`,这里要求用户用恰当的字符串设定 `tolerance` 参数。 + +### 去掉轴上的标签 + +[`drop()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html#pandas.DataFrame.drop "pandas.DataFrame.drop") 函数与 `reindex` 经常配合使用,该函数用于删除轴上的一组标签: + +```python +In [231]: df +Out[231]: + one two three +a 1.394981 1.772517 NaN +b 0.343054 1.912123 -0.050390 +c 0.695246 1.478369 1.227435 +d NaN 0.279344 -0.613172 + +In [232]: df.drop(['a', 'd'], axis=0) +Out[232]: + one two three +b 0.343054 1.912123 -0.050390 +c 0.695246 1.478369 1.227435 + +In [233]: df.drop(['one'], axis=1) +Out[233]: + two three +a 1.772517 NaN +b 1.912123 -0.050390 +c 1.478369 1.227435 +d 0.279344 -0.613172 +``` + +注意:下面的代码可以运行,但不够清晰: + +```python +In [234]: df.reindex(df.index.difference(['a', 'd'])) +Out[234]: + one two three +b 0.343054 1.912123 -0.050390 +c 0.695246 1.478369 1.227435 +``` + +### 重命名或映射标签 + +[`rename()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html#pandas.DataFrame.rename "pandas.DataFrame.rename") 方法支持按不同的轴基于映射(字典或 Series)调整标签。 + +```python +In [235]: s +Out[235]: +a -0.186646 +b -1.692424 +c -0.303893 +d -1.425662 +e 1.114285 +dtype: float64 + +In [236]: s.rename(str.upper) +Out[236]: +A -0.186646 +B -1.692424 +C -0.303893 +D -1.425662 +E 1.114285 +dtype: float64 +``` + +如果调用的是函数,该函数在处理标签时,必须返回一个值,而且生成的必须是一组唯一值。此外,`rename()` 还可以调用字典或 Series。 + +```python +In [237]: df.rename(columns={'one': 'foo', 'two': 'bar'}, + .....: index={'a': 'apple', 'b': 'banana', 'd': 'durian'}) + .....: +Out[237]: + foo bar three +apple 1.394981 1.772517 NaN +banana 0.343054 1.912123 -0.050390 +c 0.695246 1.478369 1.227435 +durian NaN 0.279344 -0.613172 +``` + +Pandas 
不会重命名标签未包含在映射里的列或索引。注意,映射里多出的标签不会触发错误。 + +*0.21.0 版新增*。 + +[`DataFrame.rename()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html#pandas.DataFrame.rename "pandas.DataFrame.rename") 还支持“轴式”习语,用这种方式可以指定单个 `mapper`,及执行映射的 `axis`。 + +```python +In [238]: df.rename({'one': 'foo', 'two': 'bar'}, axis='columns') +Out[238]: + foo bar three +a 1.394981 1.772517 NaN +b 0.343054 1.912123 -0.050390 +c 0.695246 1.478369 1.227435 +d NaN 0.279344 -0.613172 + +In [239]: df.rename({'a': 'apple', 'b': 'banana', 'd': 'durian'}, axis='index') +Out[239]: + one two three +apple 1.394981 1.772517 NaN +banana 0.343054 1.912123 -0.050390 +c 0.695246 1.478369 1.227435 +durian NaN 0.279344 -0.613172 +``` + +[`rename()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.rename.html#pandas.Series.rename "pandas.Series.rename") 方法还提供了 `inplace` 命名参数,默认为 `False`,并会复制底层数据。`inplace=True` 时,会直接在原数据上重命名。 + +*0.18.0 版新增*。 + +[`rename()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.rename.html#pandas.Series.rename "pandas.Series.rename") 还支持用标量或列表更改 `Series.name` 属性。 + +```python +In [240]: s.rename("scalar-name") +Out[240]: +a -0.186646 +b -1.692424 +c -0.303893 +d -1.425662 +e 1.114285 +Name: scalar-name, dtype: float64 +``` + +*0.24.0 版新增*。 + +[`rename_axis()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.rename_axis.html#pandas.Series.rename_axis "pandas.Series.rename_axis") 方法支持指定 `多层索引` 名称,与标签相对应。 + +```python +In [241]: df = pd.DataFrame({'x': [1, 2, 3, 4, 5, 6], + .....: 'y': [10, 20, 30, 40, 50, 60]}, + .....: index=pd.MultiIndex.from_product([['a', 'b', 'c'], [1, 2]], + .....: names=['let', 'num'])) + .....: + +In [242]: df +Out[242]: + x y +let num +a 1 1 10 + 2 2 20 +b 1 3 30 + 2 4 40 +c 1 5 50 + 2 6 60 + +In [243]: df.rename_axis(index={'let': 'abc'}) +Out[243]: + x y +abc num +a 1 1 10 + 2 2 20 +b 1 3 30 + 2 4 40 +c 1 5 50 + 2 6 60 + +In [244]: 
df.rename_axis(index=str.upper) +Out[244]: + x y +LET NUM +a 1 1 10 + 2 2 20 +b 1 3 30 + 2 4 40 +c 1 5 50 + 2 6 60 +``` + +## 迭代 + +Pandas 对象基于类型进行迭代操作。Series 迭代时被视为数组,基础迭代生成值。DataFrame 则遵循字典式习语,用对象的 `key` 实现迭代操作。 + +简言之,基础迭代(`for i in object`)生成: + +* **Series** :值 +* **DataFrame**:列标签 + +例如,DataFrame 迭代时输出列名: + +```python +In [245]: df = pd.DataFrame({'col1': np.random.randn(3), + .....: 'col2': np.random.randn(3)}, index=['a', 'b', 'c']) + .....: + +In [246]: for col in df: + .....: print(col) + .....: +col1 +col2 +``` + +Pandas 对象还支持字典式的 [`items()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.items.html#pandas.DataFrame.items "pandas.DataFrame.items") 方法,通过键值对迭代。 + +用下列方法可以迭代 DataFrame 里的行: + +* [`iterrows()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html#pandas.DataFrame.iterrows "pandas.DataFrame.iterrows"):把 DataFrame 里的行当作 (index, Series)对进行迭代。该操作把行转为 Series,同时改变数据类型,并对性能有影响。 + +* [`itertuples()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.itertuples.html#pandas.DataFrame.itertuples "pandas.DataFrame.itertuples") 把 DataFrame 的行当作值的命名元组进行迭代。该操作比 [`iterrows()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html#pandas.DataFrame.iterrows "pandas.DataFrame.iterrows") 快的多,建议尽量用这种方法迭代 DataFrame 的值。 + +::: danger 警告 + +Pandas 对象迭代的速度较慢。大部分情况下,没必要对行执行迭代操作,建议用以下几种替代方式: + +* 矢量化:很多操作可以用内置方法或 NumPy 函数,布尔索引…… +* 调用的函数不能在完整的 DataFrame / Series 上运行时,最好用 [`apply()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply "pandas.DataFrame.apply"),不要对值进行迭代操作。请参阅[函数应用](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-apply)文档。 +* 如果必须对值进行迭代,请务必注意代码的性能,建议在 cython 或 numba 环境下实现内循环。参阅[性能优化](https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html#enhancingperf)一节,查看这种操作方法的示例。 + +::: + +::: danger 警告 + 
+
+**永远不要修改**正在迭代的内容,这种方式不能确保所有操作都能正常运作。基于数据类型,迭代器返回的是副本(copy),不是视图(view),这种写入可能不会生效!
+
+下例中的赋值就不会生效:
+
+```python
+In [247]: df = pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']})
+
+In [248]: for index, row in df.iterrows():
+   .....:     row['a'] = 10
+   .....:
+
+In [249]: df
+Out[249]:
+   a  b
+0  1  a
+1  2  b
+2  3  c
+```
+
+:::
+
+### 项目(items)
+
+与字典型接口类似,[`items()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.items.html#pandas.DataFrame.items "pandas.DataFrame.items") 通过键值对进行迭代:
+
+* **Series**:(Index,标量值)对
+* **DataFrame**:(列,Series)对
+
+示例如下:
+
+```python
+In [250]: for label, ser in df.items():
+   .....:     print(label)
+   .....:     print(ser)
+   .....:
+a
+0    1
+1    2
+2    3
+Name: a, dtype: int64
+b
+0    a
+1    b
+2    c
+Name: b, dtype: object
+```
+
+### iterrows
+
+[`iterrows()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html#pandas.DataFrame.iterrows "pandas.DataFrame.iterrows") 迭代 DataFrame 或 Series 里的每一行数据。这个操作返回一个迭代器,生成索引值及包含每行数据的 Series:
+
+```python
+In [251]: for row_index, row in df.iterrows():
+   .....:     print(row_index, row, sep='\n')
+   .....:
+0
+a    1
+b    a
+Name: 0, dtype: object
+1
+a    2
+b    b
+Name: 1, dtype: object
+2
+a    3
+b    c
+Name: 2, dtype: object
+```
+
+::: tip 注意
+
+[`iterrows()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html#pandas.DataFrame.iterrows "pandas.DataFrame.iterrows") 返回的是 Series 里的每一行数据,该操作**不**保留每行数据的数据类型,因为数据类型是通过 DataFrame 的列界定的。
+
+示例如下:
+
+```python
+In [252]: df_orig = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])
+
+In [253]: df_orig.dtypes
+Out[253]:
+int        int64
+float    float64
+dtype: object
+
+In [254]: row = next(df_orig.iterrows())[1]
+
+In [255]: row
+Out[255]:
+int      1.0
+float    1.5
+Name: 0, dtype: float64
+```
+
+`row` 里的值以 Series 形式返回,并被向上转换为浮点数,而原始 DataFrame 里 `int` 列的数据类型保持不变:
+
+```python
+In [256]: row['int'].dtype
+Out[256]: dtype('float64')
+
+In [257]: df_orig['int'].dtype
+Out[257]: dtype('int64')
+```
+
+要想在行迭代时保存数据类型,最好用 
[`itertuples()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.itertuples.html#pandas.DataFrame.itertuples "pandas.DataFrame.itertuples"),这个函数返回值的命名元组,总的来说,该操作比 [`iterrows()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html#pandas.DataFrame.iterrows "pandas.DataFrame.iterrows") 速度更快。 + +::: + +下例展示了怎样转置 DataFrame: + +```python +In [258]: df2 = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]}) + +In [259]: print(df2) + x y +0 1 4 +1 2 5 +2 3 6 + +In [260]: print(df2.T) + 0 1 2 +x 1 2 3 +y 4 5 6 + +In [261]: df2_t = pd.DataFrame({idx: values for idx, values in df2.iterrows()}) + +In [262]: print(df2_t) + 0 1 2 +x 1 2 3 +y 4 5 6 +``` + +### itertuples + +[`itertuples()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.itertuples.html#pandas.DataFrame.itertuples "pandas.DataFrame.itertuples") 方法返回为 DataFrame 里每行数据生成命名元组的迭代器。该元组的第一个元素是行的索引值,其余的值则是行的值。 + +示例如下: + +```python +In [263]: for row in df.itertuples(): + .....: print(row) + .....: +Pandas(Index=0, a=1, b='a') +Pandas(Index=1, a=2, b='b') +Pandas(Index=2, a=3, b='c') +``` + +该方法不会把行转换为 Series,只是返回命名元组里的值。[`itertuples()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.itertuples.html#pandas.DataFrame.itertuples "pandas.DataFrame.itertuples") 保存值的数据类型,而且比 [`iterrows()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html#pandas.DataFrame.iterrows "pandas.DataFrame.iterrows") 快。 + +::: tip 注意 + +包含无效 Python 识别符的列名、重复的列名及以下划线开头的列名,会被重命名为位置名称。如果列数较大,比如大于 255 列,则返回正则元组。 + +::: + +## .dt 访问器 + +`Series` 提供一个可以简单、快捷地返回 `datetime` 属性值的访问器。这个访问器返回的也是 Series,索引与现有的 Series 一样。 + +```python +# datetime +In [264]: s = pd.Series(pd.date_range('20130101 09:10:12', periods=4)) + +In [265]: s +Out[265]: +0 2013-01-01 09:10:12 +1 2013-01-02 09:10:12 +2 2013-01-03 09:10:12 +3 2013-01-04 09:10:12 +dtype: datetime64[ns] + +In [266]: s.dt.hour +Out[266]: +0 9 +1 9 +2 9 +3 9 
+dtype: int64
+
+In [267]: s.dt.second
+Out[267]:
+0    12
+1    12
+2    12
+3    12
+dtype: int64
+
+In [268]: s.dt.day
+Out[268]:
+0    1
+1    2
+2    3
+3    4
+dtype: int64
+```
+
+用下列表达式进行筛选非常方便:
+
+```python
+In [269]: s[s.dt.day == 2]
+Out[269]:
+1   2013-01-02 09:10:12
+dtype: datetime64[ns]
+```
+
+时区转换也很轻松:
+
+```python
+In [270]: stz = s.dt.tz_localize('US/Eastern')
+
+In [271]: stz
+Out[271]:
+0   2013-01-01 09:10:12-05:00
+1   2013-01-02 09:10:12-05:00
+2   2013-01-03 09:10:12-05:00
+3   2013-01-04 09:10:12-05:00
+dtype: datetime64[ns, US/Eastern]
+
+In [272]: stz.dt.tz
+Out[272]: <DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>
+```
+
+可以把这些操作连在一起:
+
+```python
+In [273]: s.dt.tz_localize('UTC').dt.tz_convert('US/Eastern')
+Out[273]:
+0   2013-01-01 04:10:12-05:00
+1   2013-01-02 04:10:12-05:00
+2   2013-01-03 04:10:12-05:00
+3   2013-01-04 04:10:12-05:00
+dtype: datetime64[ns, US/Eastern]
+```
+
+还可以用 [`Series.dt.strftime()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.strftime.html#pandas.Series.dt.strftime "pandas.Series.dt.strftime") 把 `datetime` 的值当成字符串进行格式化,支持与标准 [`strftime()`](https://docs.python.org/3/library/datetime.html#datetime.datetime.strftime "(in Python v3.7)") 同样的格式。
+
+```python
+# DatetimeIndex
+In [274]: s = pd.Series(pd.date_range('20130101', periods=4))
+
+In [275]: s
+Out[275]:
+0   2013-01-01
+1   2013-01-02
+2   2013-01-03
+3   2013-01-04
+dtype: datetime64[ns]
+
+In [276]: s.dt.strftime('%Y/%m/%d')
+Out[276]:
+0    2013/01/01
+1    2013/01/02
+2    2013/01/03
+3    2013/01/04
+dtype: object
+```
+
+```python
+# PeriodIndex
+In [277]: s = pd.Series(pd.period_range('20130101', periods=4))
+
+In [278]: s
+Out[278]:
+0    2013-01-01
+1    2013-01-02
+2    2013-01-03
+3    2013-01-04
+dtype: period[D]
+
+In [279]: s.dt.strftime('%Y/%m/%d')
+Out[279]:
+0    2013/01/01
+1    2013/01/02
+2    2013/01/03
+3    2013/01/04
+dtype: object
+```
+
+`.dt` 访问器还支持 `period` 与 `timedelta`。
+
+```python
+# period
+In [280]: s = pd.Series(pd.period_range('20130101', periods=4, freq='D'))
+
+In [281]: s
+Out[281]:
+0    2013-01-01
+1    2013-01-02
+2 
2013-01-03
+3    2013-01-04
+dtype: period[D]
+
+In [282]: s.dt.year
+Out[282]:
+0    2013
+1    2013
+2    2013
+3    2013
+dtype: int64
+
+In [283]: s.dt.day
+Out[283]:
+0    1
+1    2
+2    3
+3    4
+dtype: int64
+```
+
+```python
+# timedelta
+In [284]: s = pd.Series(pd.timedelta_range('1 day 00:00:05', periods=4, freq='s'))
+
+In [285]: s
+Out[285]:
+0   1 days 00:00:05
+1   1 days 00:00:06
+2   1 days 00:00:07
+3   1 days 00:00:08
+dtype: timedelta64[ns]
+
+In [286]: s.dt.days
+Out[286]:
+0    1
+1    1
+2    1
+3    1
+dtype: int64
+
+In [287]: s.dt.seconds
+Out[287]:
+0    5
+1    6
+2    7
+3    8
+dtype: int64
+
+In [288]: s.dt.components
+Out[288]:
+   days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
+0     1      0        0        5             0             0            0
+1     1      0        0        6             0             0            0
+2     1      0        0        7             0             0            0
+3     1      0        0        8             0             0            0
+```
+
+::: tip 注意
+
+用这个访问器处理不是 `datetime` 类型的值时,`Series.dt` 会触发 `TypeError` 错误。
+
+:::
+
+## 矢量化字符串方法
+
+Series 支持字符串处理方法,可以非常方便地操作数组里的每个元素。这些方法会自动排除缺失值与空值,这也许是其最重要的特性。这些方法通过 Series 的 `str` 属性访问,一般情况下,这些操作的名称与内置的字符串方法一致。示例如下:
+
+```python
+In [289]: s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
+
+In [290]: s.str.lower()
+Out[290]:
+0       a
+1       b
+2       c
+3    aaba
+4    baca
+5     NaN
+6    caba
+7     dog
+8     cat
+dtype: object
+```
+
+这里还提供了强大的模式匹配方法,但要注意,模式匹配方法默认使用[正则表达式](https://docs.python.org/3/library/re.html)。
+
+参阅[矢量化字符串方法](https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html#text-string-methods),了解完整内容。
+
+## 排序
+
+Pandas 支持三种排序方式:按索引标签排序、按列里的值排序,以及按两种方式混合排序。
+
+### 按索引排序
+
+[`Series.sort_index()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.sort_index.html#pandas.Series.sort_index "pandas.Series.sort_index") 与 [`DataFrame.sort_index()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_index.html#pandas.DataFrame.sort_index "pandas.DataFrame.sort_index") 方法用于按索引层级对 Pandas 对象排序。
+
+```python
+In [291]: df = pd.DataFrame({
+   .....:     'one': pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
+   .....:     'two': pd.Series(np.random.randn(4), 
index=['a', 'b', 'c', 'd']), + .....: 'three': pd.Series(np.random.randn(3), index=['b', 'c', 'd'])}) + .....: + +In [292]: unsorted_df = df.reindex(index=['a', 'd', 'c', 'b'], + .....: columns=['three', 'two', 'one']) + .....: + +In [293]: unsorted_df +Out[293]: + three two one +a NaN -1.152244 0.562973 +d -0.252916 -0.109597 NaN +c 1.273388 -0.167123 0.640382 +b -0.098217 0.009797 -1.299504 + +# DataFrame +In [294]: unsorted_df.sort_index() +Out[294]: + three two one +a NaN -1.152244 0.562973 +b -0.098217 0.009797 -1.299504 +c 1.273388 -0.167123 0.640382 +d -0.252916 -0.109597 NaN + +In [295]: unsorted_df.sort_index(ascending=False) +Out[295]: + three two one +d -0.252916 -0.109597 NaN +c 1.273388 -0.167123 0.640382 +b -0.098217 0.009797 -1.299504 +a NaN -1.152244 0.562973 + +In [296]: unsorted_df.sort_index(axis=1) +Out[296]: + one three two +a 0.562973 NaN -1.152244 +d NaN -0.252916 -0.109597 +c 0.640382 1.273388 -0.167123 +b -1.299504 -0.098217 0.009797 + +# Series +In [297]: unsorted_df['three'].sort_index() +Out[297]: +a NaN +b -0.098217 +c 1.273388 +d -0.252916 +Name: three, dtype: float64 +``` + +### 按值排序 + +[`Series.sort_values()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.sort_values.html#pandas.Series.sort_values "pandas.Series.sort_values") 方法用于按值对 Series 排序。[`DataFrame.sort_values()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html#pandas.DataFrame.sort_values "pandas.DataFrame.sort_values") 方法用于按行列的值对 DataFrame 排序。[`DataFrame.sort_values()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html#pandas.DataFrame.sort_values "pandas.DataFrame.sort_values") 的可选参数 `by` 用于指定按哪列排序,该参数的值可以是一列或多列数据。 + +```python +In [298]: df1 = pd.DataFrame({'one': [2, 1, 1, 1], + .....: 'two': [1, 3, 2, 4], + .....: 'three': [5, 4, 3, 2]}) + .....: + +In [299]: df1.sort_values(by='two') +Out[299]: + one two three +0 2 1 5 +2 1 2 3 +1 1 3 4 +3 1 4 2 +``` + +参数 
`by` 支持列名列表,示例如下: + +```python +In [300]: df1[['one', 'two', 'three']].sort_values(by=['one', 'two']) +Out[300]: + one two three +2 1 2 3 +1 1 3 4 +3 1 4 2 +0 2 1 5 +``` + +这些方法支持用 `na_position` 参数处理空值。 + +```python +In [301]: s[2] = np.nan + +In [302]: s.sort_values() +Out[302]: +0 A +3 Aaba +1 B +4 Baca +6 CABA +8 cat +7 dog +2 NaN +5 NaN +dtype: object + +In [303]: s.sort_values(na_position='first') +Out[303]: +2 NaN +5 NaN +0 A +3 Aaba +1 B +4 Baca +6 CABA +8 cat +7 dog +dtype: object +``` + +### 按索引与值排序 + +*0.23.0 版新增*。 + +通过参数 `by` 传递给 [`DataFrame.sort_values()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html#pandas.DataFrame.sort_values "pandas.DataFrame.sort_values") 的字符串可以引用列或索引层名。 + +```python +# 创建 MultiIndex +In [304]: idx = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('a', 2), + .....: ('b', 2), ('b', 1), ('b', 1)]) + .....: + +In [305]: idx.names = ['first', 'second'] + +# 创建 DataFrame +In [306]: df_multi = pd.DataFrame({'A': np.arange(6, 0, -1)}, + .....: index=idx) + .....: + +In [307]: df_multi +Out[307]: + A +first second +a 1 6 + 2 5 + 2 4 +b 2 3 + 1 2 + 1 1 +``` + +按 `second`(索引)与 `A`(列)排序。 + +```python +In [308]: df_multi.sort_values(by=['second', 'A']) +Out[308]: + A +first second +b 1 1 + 1 2 +a 1 6 +b 2 3 +a 2 4 + 2 5 +``` + +::: tip 注意 + +字符串、列名、索引层名重名时,会触发警告提示,并以列名为准。后期版本中,这种情况将会触发模糊错误。 + +::: + +### 搜索排序 + +Series 支持 [`searchsorted()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.searchsorted.html#pandas.Series.searchsorted "pandas.Series.searchsorted") 方法,这与[`numpy.ndarray.searchsorted()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.searchsorted.html#numpy.ndarray.searchsorted "(in NumPy v1.17)") 的操作方式类似。 + +```python +In [309]: ser = pd.Series([1, 2, 3]) + +In [310]: ser.searchsorted([0, 3]) +Out[310]: array([0, 2]) + +In [311]: ser.searchsorted([0, 4]) +Out[311]: array([0, 3]) + +In [312]: ser.searchsorted([1, 3], side='right') 
+Out[312]: array([1, 3]) + +In [313]: ser.searchsorted([1, 3], side='left') +Out[313]: array([0, 2]) + +In [314]: ser = pd.Series([3, 1, 2]) + +In [315]: ser.searchsorted([0, 3], sorter=np.argsort(ser)) +Out[315]: array([0, 2]) +``` + +### 最大值与最小值 + +Series 支持 [`nsmallest()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.nsmallest.html#pandas.Series.nsmallest "pandas.Series.nsmallest") 与 [`nlargest()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.nlargest.html#pandas.Series.nlargest "pandas.Series.nlargest") 方法,本方法返回 N 个最大或最小的值。对于数据量大的 `Series` 来说,该方法比先为整个 Series 排序,再调用 `head(n)` 这种方式的速度要快得多。 + +```python +In [316]: s = pd.Series(np.random.permutation(10)) + +In [317]: s +Out[317]: +0 2 +1 0 +2 3 +3 7 +4 1 +5 5 +6 9 +7 6 +8 8 +9 4 +dtype: int64 + +In [318]: s.sort_values() +Out[318]: +1 0 +4 1 +0 2 +2 3 +9 4 +5 5 +7 6 +3 7 +8 8 +6 9 +dtype: int64 + +In [319]: s.nsmallest(3) +Out[319]: +1 0 +4 1 +0 2 +dtype: int64 + +In [320]: s.nlargest(3) +Out[320]: +6 9 +8 8 +3 7 +dtype: int64 +``` + +`DataFrame` 也支持 `nlargest` 与 `nsmallest` 方法。 + +```python +In [321]: df = pd.DataFrame({'a': [-2, -1, 1, 10, 8, 11, -1], + .....: 'b': list('abdceff'), + .....: 'c': [1.0, 2.0, 4.0, 3.2, np.nan, 3.0, 4.0]}) + .....: + +In [322]: df.nlargest(3, 'a') +Out[322]: + a b c +5 11 f 3.0 +3 10 c 3.2 +4 8 e NaN + +In [323]: df.nlargest(5, ['a', 'c']) +Out[323]: + a b c +5 11 f 3.0 +3 10 c 3.2 +4 8 e NaN +2 1 d 4.0 +6 -1 f 4.0 + +In [324]: df.nsmallest(3, 'a') +Out[324]: + a b c +0 -2 a 1.0 +1 -1 b 2.0 +6 -1 f 4.0 + +In [325]: df.nsmallest(5, ['a', 'c']) +Out[325]: + a b c +0 -2 a 1.0 +1 -1 b 2.0 +6 -1 f 4.0 +2 1 d 4.0 +4 8 e NaN +``` + +### 用多层索引的列排序 + +列为多层索引时,可以显式排序,用 `by` 指定所有层级。 + +```python +In [326]: df1.columns = pd.MultiIndex.from_tuples([('a', 'one'), + .....: ('a', 'two'), + .....: ('b', 'three')]) + .....: + +In [327]: df1.sort_values(by=('a', 'two')) +Out[327]: + a b + one two three +0 2 1 5 +2 1 2 3 +1 1 3 4 +3 1 4 2 +``` + 
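作为补充示例(非本文档原有内容):`sort_values` 的 `ascending` 参数也可以接收与 `by` 等长的布尔值列表,为每一列分别指定排序方向,例如先按 `one` 升序、再按 `two` 降序:

```python
import pandas as pd

df1 = pd.DataFrame({'one': [2, 1, 1, 1],
                    'two': [1, 3, 2, 4],
                    'three': [5, 4, 3, 2]})

# ascending 接收与 by 等长的布尔值列表,逐列指定升/降序
result = df1.sort_values(by=['one', 'two'], ascending=[True, False])
print(result)
```

注意,此处 `df1` 的列为普通索引;如果列是多层索引,`by` 中的元素则应为元组,如上文的 `('a', 'two')`。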
+## 复制 + +在 Pandas 对象上执行 [`copy()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.copy.html#pandas.DataFrame.copy "pandas.DataFrame.copy") 方法,将复制底层数据(但不包括轴索引,因为轴索引不可变),并返回一个新的对象。注意,**复制对象这种操作一般来说不是必须的**。比如说,以下几种方式可以***就地(inplace)*** 改变 DataFrame: + +* 插入、删除、修改列 +* 为 `index` 或 `columns` 属性赋值 +* 对于同质数据,用 `values` 属性或高级索引即可直接修改值 + +注意,用 Pandas 方法修改数据不会带来任何副作用,几乎所有方法都返回新的对象,不会修改原始数据对象。如果原始数据有所改动,唯一的可能就是用户显式指定了要修改原始数据。 + +## 数据类型 + +大多数情况下,Pandas 使用 NumPy 数组、Series 或 DataFrame 里某列的数据类型。NumPy 支持 `float`、`int`、`bool`、`timedelta[ns]`、`datetime64[ns]`,注意,NumPy 不支持带时区信息的 `datetime`。 + +Pandas 与第三方支持库扩充了 NumPy 类型系统,本节只介绍 Pandas 的内部扩展。如需了解如何编写与 Pandas 扩展类型,请参阅[扩展类型](https://pandas.pydata.org/pandas-docs/stable/development/extending.html#extending-extension-types),参阅[扩展数据类型](https://pandas.pydata.org/pandas-docs/stable/ecosystem.html#ecosystem-extensions)了解第三方支持库提供的扩展类型。 + +下表列出了 Pandas 扩展类型,参阅列出的文档内容,查看每种类型的详细说明。 + +| 数据种类 | 数据类型 | 标量 | 数组 | 文档 | +| :-----------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | +| tz-aware datetime | [`DatetimeTZDtype`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeTZDtype.html#pandas.DatetimeTZDtype) | [`Timestamp`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html#pandas.Timestamp) | [`arrays.DatetimeArray`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.arrays.DatetimeArray.html#pandas.arrays.DatetimeArray) | [Time zone handling](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-timezone) | +| Categorical | [`CategoricalDtype`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.CategoricalDtype.html#pandas.CategoricalDtype) | (无) | 
[`Categorical`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Categorical.html#pandas.Categorical) | [Categorical data](https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html#categorical) | +| period (time spans) | [`PeriodDtype`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.PeriodDtype.html#pandas.PeriodDtype) | [`Period`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Period.html#pandas.Period) | [`arrays.PeriodArray`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.arrays.PeriodArray.html#pandas.arrays.PeriodArray) | [Time span representation](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-periods) | +| sparse | [`SparseDtype`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.SparseDtype.html#pandas.SparseDtype) | (无) | `arrays.SparseArray` | [Sparse data structures](https://pandas.pydata.org/pandas-docs/stable/user_guide/sparse.html#sparse) | +| intervals | [`IntervalDtype`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.IntervalDtype.html#pandas.IntervalDtype) | [`Interval`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Interval.html#pandas.Interval) | [`arrays.IntervalArray`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.arrays.IntervalArray.html#pandas.arrays.IntervalArray) | [IntervalIndex](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#advanced-intervalindex) | +| nullable integer | [`Int64Dtype`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Int64Dtype.html#pandas.Int64Dtype), … | (无) | [`arrays.IntegerArray`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.arrays.IntegerArray.html#pandas.arrays.IntegerArray) | [Nullable integer data type](https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html#integer-na) | + +Pandas 用 `object` 存储字符串。 + +虽然, `object` 
数据类型能够存储任何对象,但应尽量避免这种操作,要了解与其它支持库与方法的性能与交互操作,参阅 [对象转换](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-object-conversion)。 + +DataFrame 的 [`dtypes`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dtypes.html#pandas.DataFrame.dtypes "pandas.DataFrame.dtypes") 属性用起来很方便,以 Series 形式返回每列的数据类型。 + +```python +In [328]: dft = pd.DataFrame({'A': np.random.rand(3), + .....: 'B': 1, + .....: 'C': 'foo', + .....: 'D': pd.Timestamp('20010102'), + .....: 'E': pd.Series([1.0] * 3).astype('float32'), + .....: 'F': False, + .....: 'G': pd.Series([1] * 3, dtype='int8')}) + .....: + +In [329]: dft +Out[329]: + A B C D E F G +0 0.035962 1 foo 2001-01-02 1.0 False 1 +1 0.701379 1 foo 2001-01-02 1.0 False 1 +2 0.281885 1 foo 2001-01-02 1.0 False 1 + +In [330]: dft.dtypes +Out[330]: +A float64 +B int64 +C object +D datetime64[ns] +E float32 +F bool +G int8 +dtype: object +``` + +要查看 `Series` 的数据类型,用 [`dtype`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dtype.html#pandas.Series.dtype "pandas.Series.dtype") 属性。 + +```python +In [331]: dft['A'].dtype +Out[331]: dtype('float64') +``` + +Pandas 对象单列中含多种类型的数据时,该列的数据类型为可适配于各类数据的数据类型,通常为 `object`。 + +```python +# 整数被强制转换为浮点数 +In [332]: pd.Series([1, 2, 3, 4, 5, 6.]) +Out[332]: +0 1.0 +1 2.0 +2 3.0 +3 4.0 +4 5.0 +5 6.0 +dtype: float64 + +# 字符串数据决定了该 Series 的数据类型为 ``object`` +In [333]: pd.Series([1, 2, 3, 6., 'foo']) +Out[333]: +0 1 +1 2 +2 3 +3 6 +4 foo +dtype: object +``` + +`DataFrame.dtypes.value_counts()` 用于统计 DataFrame 里不同数据类型的列数。 + +```python +In [334]: dft.dtypes.value_counts() +Out[334]: +float32 1 +object 1 +bool 1 +int8 1 +float64 1 +datetime64[ns] 1 +int64 1 +dtype: int64 +``` + +多种数值型数据类型可以在 DataFrame 里共存。如果只传递一种数据类型,不论是通过 `dtype` 关键字直接传递,还是通过 `ndarray` 或 `Series` 传递,都会保存至 DataFrame 操作。此外,不同数值型数据类型**不会**合并。示例如下: + +```python +In [335]: df1 = pd.DataFrame(np.random.randn(8, 1), columns=['A'], dtype='float32') + +In [336]: df1 +Out[336]: + A +0 
0.224364 +1 1.890546 +2 0.182879 +3 0.787847 +4 -0.188449 +5 0.667715 +6 -0.011736 +7 -0.399073 + +In [337]: df1.dtypes +Out[337]: +A float32 +dtype: object + +In [338]: df2 = pd.DataFrame({'A': pd.Series(np.random.randn(8), dtype='float16'), + .....: 'B': pd.Series(np.random.randn(8)), + .....: 'C': pd.Series(np.array(np.random.randn(8), + .....: dtype='uint8'))}) + .....: + +In [339]: df2 +Out[339]: + A B C +0 0.823242 0.256090 0 +1 1.607422 1.426469 0 +2 -0.333740 -0.416203 255 +3 -0.063477 1.139976 0 +4 -1.014648 -1.193477 0 +5 0.678711 0.096706 0 +6 -0.040863 -1.956850 1 +7 -0.357422 -0.714337 0 + +In [340]: df2.dtypes +Out[340]: +A float16 +B float64 +C uint8 +dtype: object +``` + +### 默认值 + +整数的默认类型为 `int64`,浮点数的默认类型为 `float64`,这里的默认值与系统平台无关,不管是 32 位系统,还是 64 位系统都是一样的。下列代码返回的结果都是 `int64`: + +```python +In [341]: pd.DataFrame([1, 2], columns=['a']).dtypes +Out[341]: +a int64 +dtype: object + +In [342]: pd.DataFrame({'a': [1, 2]}).dtypes +Out[342]: +a int64 +dtype: object + +In [343]: pd.DataFrame({'a': 1}, index=list(range(2))).dtypes +Out[343]: +a int64 +dtype: object +``` + +注意,NumPy 创建数组时,会根据系统选择类型。下列代码在 32 位系统上**将**返回 `int32`。 + +```python +In [344]: frame = pd.DataFrame(np.array([1, 2])) +``` + +### 向上转型 + +与其它类型合并时,用的是向上转型,指的是从现有类型转换为另一种类型,如`int` 变为 `float`。 + +```python +In [345]: df3 = df1.reindex_like(df2).fillna(value=0.0) + df2 + +In [346]: df3 +Out[346]: + A B C +0 1.047606 0.256090 0.0 +1 3.497968 1.426469 0.0 +2 -0.150862 -0.416203 255.0 +3 0.724370 1.139976 0.0 +4 -1.203098 -1.193477 0.0 +5 1.346426 0.096706 0.0 +6 -0.052599 -1.956850 1.0 +7 -0.756495 -0.714337 0.0 + +In [347]: df3.dtypes +Out[347]: +A float32 +B float64 +C float64 +dtype: object +``` + +[`DataFrame.to_numpy()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy "pandas.DataFrame.to_numpy") 返回多个数据类型里**用得最多的数据类型**,这里指的是,输出结果的数据类型,适用于所有同构 NumPy 数组的数据类型。此处强制执行**向上转型**。 + +```python +In [348]: df3.to_numpy().dtype 
+Out[348]: dtype('float64') +``` + +### astype + +[`astype()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html#pandas.DataFrame.astype "pandas.DataFrame.astype") 方法显式地把一种数据类型转换为另一种,默认操作为复制数据,就算数据类型没有改变也会复制数据,`copy=False` 改变默认操作模式。此外,`astype` 无效时,会触发异常。 + +向上转型一般都遵循 **NumPy** 规则。操作中含有两种不同类型的数据时,返回更为通用的那种数据类型。 + +```python +In [349]: df3 +Out[349]: + A B C +0 1.047606 0.256090 0.0 +1 3.497968 1.426469 0.0 +2 -0.150862 -0.416203 255.0 +3 0.724370 1.139976 0.0 +4 -1.203098 -1.193477 0.0 +5 1.346426 0.096706 0.0 +6 -0.052599 -1.956850 1.0 +7 -0.756495 -0.714337 0.0 + +In [350]: df3.dtypes +Out[350]: +A float32 +B float64 +C float64 +dtype: object + +# 转换数据类型 +In [351]: df3.astype('float32').dtypes +Out[351]: +A float32 +B float32 +C float32 +dtype: object +``` + +用 [`astype()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html#pandas.DataFrame.astype "pandas.DataFrame.astype") 把一列或多列转换为指定类型 。 + +```python +In [352]: dft = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]}) + +In [353]: dft[['a', 'b']] = dft[['a', 'b']].astype(np.uint8) + +In [354]: dft +Out[354]: + a b c +0 1 4 7 +1 2 5 8 +2 3 6 9 + +In [355]: dft.dtypes +Out[355]: +a uint8 +b uint8 +c int64 +dtype: object +``` +*0.19.0 版新增。* + +[`astype()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html#pandas.DataFrame.astype "pandas.DataFrame.astype") 通过字典指定哪些列转换为哪些类型。 + +```python +In [356]: dft1 = pd.DataFrame({'a': [1, 0, 1], 'b': [4, 5, 6], 'c': [7, 8, 9]}) + +In [357]: dft1 = dft1.astype({'a': np.bool, 'c': np.float64}) + +In [358]: dft1 +Out[358]: + a b c +0 True 4 7.0 +1 False 5 8.0 +2 True 6 9.0 + +In [359]: dft1.dtypes +Out[359]: +a bool +b int64 +c float64 +dtype: object +``` + +::: tip 注意 + +用 [`astype()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html#pandas.DataFrame.astype "pandas.DataFrame.astype") 与 
[`loc()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html#pandas.DataFrame.loc "pandas.DataFrame.loc") 为部分列转换指定类型时,会发生向上转型。 + +[`loc()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html#pandas.DataFrame.loc "pandas.DataFrame.loc") 尝试分配当前的数据类型,而 `[]` 则会从右方获取数据类型并进行覆盖。因此,下列代码会产出意料之外的结果: + +```python +In [360]: dft = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]}) + +In [361]: dft.loc[:, ['a', 'b']].astype(np.uint8).dtypes +Out[361]: +a uint8 +b uint8 +dtype: object + +In [362]: dft.loc[:, ['a', 'b']] = dft.loc[:, ['a', 'b']].astype(np.uint8) + +In [363]: dft.dtypes +Out[363]: +a int64 +b int64 +c int64 +dtype: object +``` + +::: + +### 对象转换 + +Pandas 提供了多种函数可以把 `object` 从一种类型强制转为另一种类型。这是因为,数据有时存储的是正确类型,但在保存时却存成了 `object` 类型,此时,用 [`DataFrame.infer_objects()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.infer_objects.html#pandas.DataFrame.infer_objects "pandas.DataFrame.infer_objects") 与 [`Series.infer_objects()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.infer_objects.html#pandas.Series.infer_objects "pandas.Series.infer_objects") 方法即可把数据**软**转换为正确的类型。 + +```python +In [364]: import datetime + +In [365]: df = pd.DataFrame([[1, 2], + .....: ['a', 'b'], + .....: [datetime.datetime(2016, 3, 2), + .....: datetime.datetime(2016, 3, 2)]]) + .....: + +In [366]: df = df.T + +In [367]: df +Out[367]: + 0 1 2 +0 1 a 2016-03-02 +1 2 b 2016-03-02 + +In [368]: df.dtypes +Out[368]: +0 object +1 object +2 datetime64[ns] +dtype: object +``` + +因为数据被转置,所以把原始列的数据类型改成了 `object`,但使用 `infer_objects` 后就变正确了。 + +```python +In [369]: df.infer_objects().dtypes +Out[369]: +0 int64 +1 object +2 datetime64[ns] +dtype: object +``` + +下列函数可以应用于一维数组与标量,执行硬转换,把对象转换为指定类型。 + +* [`to_numeric()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_numeric.html#pandas.to_numeric "pandas.to_numeric"),转换为数值型 + + +```python +In [370]: m = 
['1.1', 2, 3]
+
+In [371]: pd.to_numeric(m)
+Out[371]: array([1.1, 2. , 3. ])
+```
+* [`to_datetime()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html#pandas.to_datetime "pandas.to_datetime"),转换为 `datetime` 对象
+
+```python
+In [372]: import datetime
+
+In [373]: m = ['2016-07-09', datetime.datetime(2016, 3, 2)]
+
+In [374]: pd.to_datetime(m)
+Out[374]: DatetimeIndex(['2016-07-09', '2016-03-02'], dtype='datetime64[ns]', freq=None)
+```
+
+* [`to_timedelta()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_timedelta.html#pandas.to_timedelta "pandas.to_timedelta"),转换为 `timedelta` 对象。
+
+
+```python
+In [375]: m = ['5us', pd.Timedelta('1day')]
+
+In [376]: pd.to_timedelta(m)
+Out[376]: TimedeltaIndex(['0 days 00:00:00.000005', '1 days 00:00:00'], dtype='timedelta64[ns]', freq=None)
+```
+
+如需强制转换,则要加入 `errors` 参数,指定 Pandas 怎样处理不能转换为预期类型或对象的数据。`errors` 参数的默认值为 `'raise'`,指的是在转换过程中,遇到任何问题都触发错误。设置为 `errors='coerce'` 时,pandas 会忽略错误,强制把问题数据转换为 `pd.NaT`(`datetime` 与 `timedelta`),或 `np.nan`(数值型)。读取数据时,如果大部分要转换的数据是数值型或 `datetime`,这种操作非常有用,但数据里偶尔也会混有一些非规范数据,需要把它们显示为缺失值:
+```python
+In [377]: import datetime
+
+In [378]: m = ['apple', datetime.datetime(2016, 3, 2)]
+
+In [379]: pd.to_datetime(m, errors='coerce')
+Out[379]: DatetimeIndex(['NaT', '2016-03-02'], dtype='datetime64[ns]', freq=None)
+
+In [380]: m = ['apple', 2, 3]
+
+In [381]: pd.to_numeric(m, errors='coerce')
+Out[381]: array([nan, 2., 3.])
+
+In [382]: m = ['apple', pd.Timedelta('1day')]
+
+In [383]: pd.to_timedelta(m, errors='coerce')
+Out[383]: TimedeltaIndex([NaT, '1 days'], dtype='timedelta64[ns]', freq=None)
+```
+
+`errors` 参数还有第三个选项,`errors='ignore'`。转换数据时会忽略错误,直接输出问题数据:
+
+```python
+In [384]: import datetime
+
+In [385]: m = ['apple', datetime.datetime(2016, 3, 2)]
+
+In [386]: pd.to_datetime(m, errors='ignore')
+Out[386]: Index(['apple', 2016-03-02 00:00:00], dtype='object')
+
+In [387]: m = ['apple', 2, 3]
+
+In [388]: pd.to_numeric(m, errors='ignore') 
+Out[388]: array(['apple', 2, 3], dtype=object) + +In [389]: m = ['apple', pd.Timedelta('1day')] + +In [390]: pd.to_timedelta(m, errors='ignore') +Out[390]: array(['apple', Timedelta('1 days 00:00:00')], dtype=object) +``` + +执行转换操作时,[`to_numeric()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_numeric.html#pandas.to_numeric "pandas.to_numeric") 还有一个参数,`downcast`,即向下转型,可以把数值型转换为减少内存占用的数据类型: + +```python +In [391]: m = ['1', 2, 3] + +In [392]: pd.to_numeric(m, downcast='integer') # smallest signed int dtype +Out[392]: array([1, 2, 3], dtype=int8) + +In [393]: pd.to_numeric(m, downcast='signed') # same as 'integer' +Out[393]: array([1, 2, 3], dtype=int8) + +In [394]: pd.to_numeric(m, downcast='unsigned') # smallest unsigned int dtype +Out[394]: array([1, 2, 3], dtype=uint8) + +In [395]: pd.to_numeric(m, downcast='float') # smallest float dtype +Out[395]: array([1., 2., 3.], dtype=float32) +``` + +上述方法仅能应用于一维数组、列表或标量;不能直接用于 DataFrame 等多维对象。不过,用 [`apply()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply "pandas.DataFrame.apply"),可以快速为每列应用函数: + +```python +In [396]: import datetime + +In [397]: df = pd.DataFrame([ + .....: ['2016-07-09', datetime.datetime(2016, 3, 2)]] * 2, dtype='O') + .....: + +In [398]: df +Out[398]: + 0 1 +0 2016-07-09 2016-03-02 00:00:00 +1 2016-07-09 2016-03-02 00:00:00 + +In [399]: df.apply(pd.to_datetime) +Out[399]: + 0 1 +0 2016-07-09 2016-03-02 +1 2016-07-09 2016-03-02 + +In [400]: df = pd.DataFrame([['1.1', 2, 3]] * 2, dtype='O') + +In [401]: df +Out[401]: + 0 1 2 +0 1.1 2 3 +1 1.1 2 3 + +In [402]: df.apply(pd.to_numeric) +Out[402]: + 0 1 2 +0 1.1 2 3 +1 1.1 2 3 + +In [403]: df = pd.DataFrame([['5us', pd.Timedelta('1day')]] * 2, dtype='O') + +In [404]: df +Out[404]: + 0 1 +0 5us 1 days 00:00:00 +1 5us 1 days 00:00:00 + +In [405]: df.apply(pd.to_timedelta) +Out[405]: + 0 1 +0 00:00:00.000005 1 days +1 00:00:00.000005 1 days +``` + +### 各种坑 + +对 `integer` 
数据执行选择操作时,很容易就会把数据向上转换为 `floating` 类型。在不引入 `nans` 的情况下,输入数据的数据类型会被保留。参阅 [对整数 NA 空值的支持](https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#gotchas-intna)。
+
+```python
+In [406]: dfi = df3.astype('int32')
+
+In [407]: dfi['E'] = 1
+
+In [408]: dfi
+Out[408]:
+ A B C E
+0 1 0 0 1
+1 3 1 0 1
+2 0 0 255 1
+3 0 1 0 1
+4 -1 -1 0 1
+5 1 0 0 1
+6 0 -1 1 1
+7 0 0 0 1
+
+In [409]: dfi.dtypes
+Out[409]:
+A int32
+B int32
+C int32
+E int64
+dtype: object
+
+In [410]: casted = dfi[dfi > 0]
+
+In [411]: casted
+Out[411]:
+ A B C E
+0 1.0 NaN NaN 1
+1 3.0 1.0 NaN 1
+2 NaN NaN 255.0 1
+3 NaN 1.0 NaN 1
+4 NaN NaN NaN 1
+5 1.0 NaN NaN 1
+6 NaN NaN 1.0 1
+7 NaN NaN NaN 1
+
+In [412]: casted.dtypes
+Out[412]:
+A float64
+B float64
+C float64
+E int64
+dtype: object
+```
+
+浮点数类型未改变。
+
+```python
+In [413]: dfa = df3.copy()
+
+In [414]: dfa['A'] = dfa['A'].astype('float32')
+
+In [415]: dfa.dtypes
+Out[415]:
+A float32
+B float64
+C float64
+dtype: object
+
+In [416]: casted = dfa[df2 > 0]
+
+In [417]: casted
+Out[417]:
+ A B C
+0 1.047606 0.256090 NaN
+1 3.497968 1.426469 NaN
+2 NaN NaN 255.0
+3 NaN 1.139976 NaN
+4 NaN NaN NaN
+5 1.346426 0.096706 NaN
+6 NaN NaN 1.0
+7 NaN NaN NaN
+
+In [418]: casted.dtypes
+Out[418]:
+A float32
+B float64
+C float64
+dtype: object
+```
+
+## 基于 `dtype` 选择列
+
+[`select_dtypes()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.select_dtypes.html#pandas.DataFrame.select_dtypes "pandas.DataFrame.select_dtypes") 方法基于 `dtype` 选择列。
+
+首先,创建一个由多种数据类型组成的 DataFrame:
+
+
+```python
+In [419]: df = pd.DataFrame({'string': list('abc'),
+ .....: 'int64': list(range(1, 4)),
+ .....: 'uint8': np.arange(3, 6).astype('u1'),
+ .....: 'float64': np.arange(4.0, 7.0),
+ .....: 'bool1': [True, False, True],
+ .....: 'bool2': [False, True, False],
+ .....: 'dates': pd.date_range('now', periods=3),
+ .....: 'category': pd.Series(list("ABC")).astype('category')})
+ .....: 
+
+In [420]: df['tdeltas'] = df.dates.diff()
+
+In 
[421]: df['uint64'] = np.arange(3, 6).astype('u8') + +In [422]: df['other_dates'] = pd.date_range('20130101', periods=3) + +In [423]: df['tz_aware_dates'] = pd.date_range('20130101', periods=3, tz='US/Eastern') + +In [424]: df +Out[424]: + string int64 uint8 float64 bool1 bool2 dates category tdeltas uint64 other_dates tz_aware_dates +0 a 1 3 4.0 True False 2019-08-22 15:49:01.870038 A NaT 3 2013-01-01 2013-01-01 00:00:00-05:00 +1 b 2 4 5.0 False True 2019-08-23 15:49:01.870038 B 1 days 4 2013-01-02 2013-01-02 00:00:00-05:00 +2 c 3 5 6.0 True False 2019-08-24 15:49:01.870038 C 1 days 5 2013-01-03 2013-01-03 00:00:00-05:00 +``` + +该 DataFrame 的数据类型: + +```python +In [425]: df.dtypes +Out[425]: +string object +int64 int64 +uint8 uint8 +float64 float64 +bool1 bool +bool2 bool +dates datetime64[ns] +category category +tdeltas timedelta64[ns] +uint64 uint64 +other_dates datetime64[ns] +tz_aware_dates datetime64[ns, US/Eastern] +dtype: object +``` + +[`select_dtypes()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.select_dtypes.html#pandas.DataFrame.select_dtypes "pandas.DataFrame.select_dtypes") 有两个参数,`include` 与 `exclude`,用于实现“提取这些数据类型的列” (`include`)或 “提取不是这些数据类型的列”(`exclude`)。 + +选择 `bool` 型的列,示例如下: +```python +In [426]: df.select_dtypes(include=[bool]) +Out[426]: + bool1 bool2 +0 True False +1 False True +2 True False +``` + +该方法还支持输入 [NumPy 数据类型](https://docs.scipy.org/doc/numpy/reference/arrays.scalars.html)的名称: + +```python +In [427]: df.select_dtypes(include=['bool']) +Out[427]: + bool1 bool2 +0 True False +1 False True +2 True False +``` +[`select_dtypes()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.select_dtypes.html#pandas.DataFrame.select_dtypes "pandas.DataFrame.select_dtypes") 还支持通用数据类型。 + +比如,选择所有数值型与布尔型的列,同时,排除无符号整数: + +```python +In [428]: df.select_dtypes(include=['number', 'bool'], exclude=['unsignedinteger']) +Out[428]: + int64 float64 bool1 bool2 tdeltas +0 1 4.0 True False NaT +1 2 5.0 
False True 1 days
+2 3 6.0 True False 1 days
+```
+
+选择字符串型的列必须要用 `object`:
+
+```python
+In [429]: df.select_dtypes(include=['object'])
+Out[429]:
+ string
+0 a
+1 b
+2 c
+```
+
+要查看 `numpy.number` 等通用 `dtype` 的所有子类型,可以定义一个函数,返回子类型树:
+
+```python
+In [430]: def subdtypes(dtype):
+ .....: subs = dtype.__subclasses__()
+ .....: if not subs:
+ .....: return dtype
+ .....: return [dtype, [subdtypes(dt) for dt in subs]]
+ .....: 
+```
+
+所有 NumPy 数据类型都是 `numpy.generic` 的子类:
+
+```python
+In [431]: subdtypes(np.generic)
+Out[431]:
+[numpy.generic,
+ [[numpy.number,
+ [[numpy.integer,
+ [[numpy.signedinteger,
+ [numpy.int8,
+ numpy.int16,
+ numpy.int32,
+ numpy.int64,
+ numpy.int64,
+ numpy.timedelta64]],
+ [numpy.unsignedinteger,
+ [numpy.uint8,
+ numpy.uint16,
+ numpy.uint32,
+ numpy.uint64,
+ numpy.uint64]]]],
+ [numpy.inexact,
+ [[numpy.floating,
+ [numpy.float16, numpy.float32, numpy.float64, numpy.float128]],
+ [numpy.complexfloating,
+ [numpy.complex64, numpy.complex128, numpy.complex256]]]]]],
+ [numpy.flexible,
+ [[numpy.character, [numpy.bytes_, numpy.str_]],
+ [numpy.void, [numpy.record]]]],
+ numpy.bool_,
+ numpy.datetime64,
+ numpy.object_]]
+```
+::: tip 注意

+Pandas 支持 `category` 与 `datetime64[ns, tz]` 类型,但这两种类型未整合到 NumPy 架构,因此,上面的函数没有显示。
+
+:::
diff --git a/Python/pandas/getting_started/comparison.md b/Python/pandas/getting_started/comparison.md
new file mode 100644
index 00000000..087f2c65
--- /dev/null
+++ b/Python/pandas/getting_started/comparison.md
@@ -0,0 +1,2962 @@
+# 与其他工具比较
+
+## 与R/R库的比较
+
+由于 ``pandas`` 旨在为人们提供可以替代[R](http://www.r-project.org/)的大量数据操作和分析的功能,因此本节将较为详细地介绍 [R语言](http://en.wikipedia.org/wiki/R_(programming_language)) 及其众多第三方库与 ``pandas`` 库之间的对应关系。在与R和CRAN库的比较中,我们关注以下事项:
+
+- **功能/灵活性**:每个工具可以/不可以做什么
+- **性能**:操作速度有多快。最好能有确切的数字/基准测试
+- **易于使用**:工具是更容易还是更难使用(您可能需要通过并排的代码比较自行判断)
+
+此页面还为这些R包的用户提供了一些翻译指南。
+
+要将 ``DataFrame`` 对象从 ``pandas`` 转换为 R 
的数据类型,有一个选择是采用HDF5文件,请参阅[外部兼容性](https://pandas.pydata.org/pandas-docs/stable/../user_guide/io.html#io-external-compatibility)示例。 + +### 快速参考 + +我们将从快速参考指南开始,将[dplyr](https://cran.r-project.org/package=dplyr)与pandas等效的一些常见R操作配对。 + +#### 查询、过滤、采样 + +R | Pandas +---|--- +dim(df) | df.shape +head(df) | df.head() +slice(df, 1:10) | df.iloc[:9] +filter(df, col1 == 1, col2 == 1) | df.query('col1 == 1 & col2 == 1') +df[df$col1 == 1 & df$col2 == 1,] | df[(df.col1 == 1) & (df.col2 == 1)] +select(df, col1, col2) | df[['col1', 'col2']] +select(df, col1:col3) | df.loc[:, 'col1':'col3'] +select(df, -(col1:col3)) | df.drop(cols_to_drop, axis=1)但是看[[1]](#select-range) +distinct(select(df, col1)) | df[['col1']].drop_duplicates() +distinct(select(df, col1, col2)) | df[['col1', 'col2']].drop_duplicates() +sample_n(df, 10) | df.sample(n=10) +sample_frac(df, 0.01) | df.sample(frac=0.01) + +::: tip Note + +R表示列的子集 ``(select(df,col1:col3)`` 的缩写更接近 Pandas 的写法,如果您有列的列表,例如 ``df[cols[1:3]`` 或 ``df.drop(cols[1:3])``,按列名执行此操作可能会引起混乱。 + +::: + +#### 排序 + +R | Pandas +---|--- +arrange(df, col1, col2) | df.sort_values(['col1', 'col2']) +arrange(df, desc(col1)) | df.sort_values('col1', ascending=False) + +#### 变换 + +R | Pandas +---|--- +select(df, col_one = col1) | df.rename(columns={'col1': 'col_one'})['col_one'] +rename(df, col_one = col1) | df.rename(columns={'col1': 'col_one'}) +mutate(df, c=a-b) | df.assign(c=df.a-df.b) + +#### 分组和组合 + +R | Pandas +---|--- +summary(df) | df.describe() +gdf <- group_by(df, col1) | gdf = df.groupby('col1') +summarise(gdf, avg=mean(col1, na.rm=TRUE)) | df.groupby('col1').agg({'col1': 'mean'}) +summarise(gdf, total=sum(col1)) | df.groupby('col1').sum() + +### 基本的R用法 + +#### 用R``c``方法来进行切片操作 + +R使您可以轻松地按名称访问列(``data.frame``) + +``` r +df <- data.frame(a=rnorm(5), b=rnorm(5), c=rnorm(5), d=rnorm(5), e=rnorm(5)) +df[, c("a", "c", "e")] +``` + +或整数位置 + +``` r +df <- data.frame(matrix(rnorm(1000), ncol=100)) +df[, c(1:10, 25:30, 40, 50:100)] +``` + 
+按名称选择多个``pandas``的列非常简单 + +``` python +In [1]: df = pd.DataFrame(np.random.randn(10, 3), columns=list('abc')) + +In [2]: df[['a', 'c']] +Out[2]: + a c +0 0.469112 -1.509059 +1 -1.135632 -0.173215 +2 0.119209 -0.861849 +3 -2.104569 1.071804 +4 0.721555 -1.039575 +5 0.271860 0.567020 +6 0.276232 -0.673690 +7 0.113648 0.524988 +8 0.404705 -1.715002 +9 -1.039268 -1.157892 + +In [3]: df.loc[:, ['a', 'c']] +Out[3]: + a c +0 0.469112 -1.509059 +1 -1.135632 -0.173215 +2 0.119209 -0.861849 +3 -2.104569 1.071804 +4 0.721555 -1.039575 +5 0.271860 0.567020 +6 0.276232 -0.673690 +7 0.113648 0.524988 +8 0.404705 -1.715002 +9 -1.039268 -1.157892 +``` + +通过整数位置选择多个不连续的列可以通过``iloc``索引器属性和 ``numpy.r_`` 的组合来实现。 + +``` python +In [4]: named = list('abcdefg') + +In [5]: n = 30 + +In [6]: columns = named + np.arange(len(named), n).tolist() + +In [7]: df = pd.DataFrame(np.random.randn(n, n), columns=columns) + +In [8]: df.iloc[:, np.r_[:10, 24:30]] +Out[8]: + a b c d e f g 7 8 9 24 25 26 27 28 29 +0 -1.344312 0.844885 1.075770 -0.109050 1.643563 -1.469388 0.357021 -0.674600 -1.776904 -0.968914 -1.170299 -0.226169 0.410835 0.813850 0.132003 -0.827317 +1 -0.076467 -1.187678 1.130127 -1.436737 -1.413681 1.607920 1.024180 0.569605 0.875906 -2.211372 0.959726 -1.110336 -0.619976 0.149748 -0.732339 0.687738 +2 0.176444 0.403310 -0.154951 0.301624 -2.179861 -1.369849 -0.954208 1.462696 -1.743161 -0.826591 0.084844 0.432390 1.519970 -0.493662 0.600178 0.274230 +3 0.132885 -0.023688 2.410179 1.450520 0.206053 -0.251905 -2.213588 1.063327 1.266143 0.299368 -2.484478 -0.281461 0.030711 0.109121 1.126203 -0.977349 +4 1.474071 -0.064034 -1.282782 0.781836 -1.071357 0.441153 2.353925 0.583787 0.221471 -0.744471 -1.197071 -1.066969 -0.303421 -0.858447 0.306996 -0.028665 +.. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 
+25 1.492125 -0.068190 0.681456 1.221829 -0.434352 1.204815 -0.195612 1.251683 -1.040389 -0.796211 1.944517 0.042344 -0.307904 0.428572 0.880609 0.487645 +26 0.725238 0.624607 -0.141185 -0.143948 -0.328162 2.095086 -0.608888 -0.926422 1.872601 -2.513465 -0.846188 1.190624 0.778507 1.008500 1.424017 0.717110 +27 1.262419 1.950057 0.301038 -0.933858 0.814946 0.181439 -0.110015 -2.364638 -1.584814 0.307941 -1.341814 0.334281 -0.162227 1.007824 2.826008 1.458383 +28 -1.585746 -0.899734 0.921494 -0.211762 -0.059182 0.058308 0.915377 -0.696321 0.150664 -3.060395 0.403620 -0.026602 -0.240481 0.577223 -1.088417 0.326687 +29 -0.986248 0.169729 -1.158091 1.019673 0.646039 0.917399 -0.010435 0.366366 0.922729 0.869610 -1.209247 -0.671466 0.332872 -2.013086 -1.602549 0.333109 + +[30 rows x 16 columns] +``` + +#### ``aggregate`` + +在R中,您可能希望将数据分成几个子集,并计算每个子集的平均值。使用名为``df``的data.frame并将其分成组``by1``和``by2``: + +``` r +df <- data.frame( + v1 = c(1,3,5,7,8,3,5,NA,4,5,7,9), + v2 = c(11,33,55,77,88,33,55,NA,44,55,77,99), + by1 = c("red", "blue", 1, 2, NA, "big", 1, 2, "red", 1, NA, 12), + by2 = c("wet", "dry", 99, 95, NA, "damp", 95, 99, "red", 99, NA, NA)) +aggregate(x=df[, c("v1", "v2")], by=list(mydf2$by1, mydf2$by2), FUN = mean) +``` + +该[``groupby()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.groupby.html#pandas.DataFrame.groupby)方法类似于基本R的 ``aggregate`` +函数。 + +``` python +In [9]: df = pd.DataFrame( + ...: {'v1': [1, 3, 5, 7, 8, 3, 5, np.nan, 4, 5, 7, 9], + ...: 'v2': [11, 33, 55, 77, 88, 33, 55, np.nan, 44, 55, 77, 99], + ...: 'by1': ["red", "blue", 1, 2, np.nan, "big", 1, 2, "red", 1, np.nan, 12], + ...: 'by2': ["wet", "dry", 99, 95, np.nan, "damp", 95, 99, "red", 99, np.nan, + ...: np.nan]}) + ...: + +In [10]: g = df.groupby(['by1', 'by2']) + +In [11]: g[['v1', 'v2']].mean() +Out[11]: + v1 v2 +by1 by2 +1 95 5.0 55.0 + 99 5.0 55.0 +2 95 7.0 77.0 + 99 NaN NaN +big damp 3.0 33.0 +blue dry 3.0 33.0 +red red 4.0 44.0 + wet 1.0 11.0 +``` + 
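补充说明(此示例为译文补充,非原文内容):与 R 的 `aggregate` 一样,pandas 的 `groupby` 默认会把分组键为缺失值的行排除在结果之外,这正是上面 `by1` 或 `by2` 为 `NaN` 的行没有出现在输出中的原因。一个最小示意:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'v1': [1, 3, 5, 7],
                   'by1': ['red', 'blue', np.nan, 'red']})

# 分组键为 NaN 的行(此处 v1 == 5 的那一行)默认不会出现在聚合结果里
means = df.groupby('by1')['v1'].mean()
print(means)
```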
+有关更多详细信息和示例,请参阅[groupby文档](https://pandas.pydata.org/pandas-docs/stable/../user_guide/groupby.html#groupby-split)。 + +#### ``match``/ ``%in%`` + +在R中选择数据的常用方法是使用``%in%``使用该函数定义的数据``match``。运算符``%in%``用于返回指示是否存在匹配的逻辑向量: + +``` r +s <- 0:4 +s %in% c(2,4) +``` + +该[``isin()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.isin.html#pandas.DataFrame.isin)方法类似于R ``%in%``运算符: + +``` python +In [12]: s = pd.Series(np.arange(5), dtype=np.float32) + +In [13]: s.isin([2, 4]) +Out[13]: +0 False +1 False +2 True +3 False +4 True +dtype: bool +``` + +该``match``函数返回其第二个参数匹配位置的向量: + +``` r +s <- 0:4 +match(s, c(2,4)) +``` + +有关更多详细信息和示例,请参阅[重塑文档](https://pandas.pydata.org/pandas-docs/stable/../user_guide/indexing.html#indexing-basics-indexing-isin)。 + +#### ``tapply`` + +``tapply``类似于``aggregate``,但数据可以是一个参差不齐的数组,因为子类大小可能是不规则的。使用调用的data.frame + ``baseball``,并根据数组检索信息``team``: + +``` r +baseball <- + data.frame(team = gl(5, 5, + labels = paste("Team", LETTERS[1:5])), + player = sample(letters, 25), + batting.average = runif(25, .200, .400)) + +tapply(baseball$batting.average, baseball.example$team, + max) +``` + +在``pandas``我们可以使用[``pivot_table()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.pivot_table.html#pandas.pivot_table)方法来处理这个: + +``` python +In [14]: import random + +In [15]: import string + +In [16]: baseball = pd.DataFrame( + ....: {'team': ["team %d" % (x + 1) for x in range(5)] * 5, + ....: 'player': random.sample(list(string.ascii_lowercase), 25), + ....: 'batting avg': np.random.uniform(.200, .400, 25)}) + ....: + +In [17]: baseball.pivot_table(values='batting avg', columns='team', aggfunc=np.max) +Out[17]: +team team 1 team 2 team 3 team 4 team 5 +batting avg 0.352134 0.295327 0.397191 0.394457 0.396194 +``` + +有关更多详细信息和示例,请参阅[重塑文档](https://pandas.pydata.org/pandas-docs/stable/../user_guide/reshaping.html#reshaping-pivot)。 + +#### ``subset`` + 
+该[``query()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.query.html#pandas.DataFrame.query)方法类似于基本R ``subset`` +函数。在R中,您可能希望获取``data.frame``一列的值小于另一列的值的行: + +``` r +df <- data.frame(a=rnorm(10), b=rnorm(10)) +subset(df, a <= b) +df[df$a <= df$b,] # note the comma +``` + +在``pandas``,有几种方法可以执行子集化。您可以使用 + [``query()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.query.html#pandas.DataFrame.query)或传递表达式,就像它是索引/切片以及标准布尔索引一样: + +``` python +In [18]: df = pd.DataFrame({'a': np.random.randn(10), 'b': np.random.randn(10)}) + +In [19]: df.query('a <= b') +Out[19]: + a b +1 0.174950 0.552887 +2 -0.023167 0.148084 +3 -0.495291 -0.300218 +4 -0.860736 0.197378 +5 -1.134146 1.720780 +7 -0.290098 0.083515 +8 0.238636 0.946550 + +In [20]: df[df.a <= df.b] +Out[20]: + a b +1 0.174950 0.552887 +2 -0.023167 0.148084 +3 -0.495291 -0.300218 +4 -0.860736 0.197378 +5 -1.134146 1.720780 +7 -0.290098 0.083515 +8 0.238636 0.946550 + +In [21]: df.loc[df.a <= df.b] +Out[21]: + a b +1 0.174950 0.552887 +2 -0.023167 0.148084 +3 -0.495291 -0.300218 +4 -0.860736 0.197378 +5 -1.134146 1.720780 +7 -0.290098 0.083515 +8 0.238636 0.946550 +``` + +有关更多详细信息和示例,请参阅[查询文档](https://pandas.pydata.org/pandas-docs/stable/../user_guide/indexing.html#indexing-query)。 + +#### ``with`` + +使用``df``带有列的R中调用的data.frame的表达式``a``, + ``b``将使用``with``如下方式进行求值: + +``` r +df <- data.frame(a=rnorm(10), b=rnorm(10)) +with(df, a + b) +df$a + df$b # same as the previous expression +``` + +在``pandas``等效表达式中,使用该 + [``eval()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.eval.html#pandas.DataFrame.eval)方法将是: + +``` python +In [22]: df = pd.DataFrame({'a': np.random.randn(10), 'b': np.random.randn(10)}) + +In [23]: df.eval('a + b') +Out[23]: +0 -0.091430 +1 -2.483890 +2 -0.252728 +3 -0.626444 +4 -0.261740 +5 2.149503 +6 -0.332214 +7 0.799331 +8 -2.377245 +9 2.104677 +dtype: float64 + +In [24]: df.a + df.b # same as the 
previous expression +Out[24]: +0 -0.091430 +1 -2.483890 +2 -0.252728 +3 -0.626444 +4 -0.261740 +5 2.149503 +6 -0.332214 +7 0.799331 +8 -2.377245 +9 2.104677 +dtype: float64 +``` + +在某些情况下,[``eval()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.eval.html#pandas.DataFrame.eval)将比纯Python中的评估快得多。有关更多详细信息和示例,请参阅[eval文档](https://pandas.pydata.org/pandas-docs/stable/../user_guide/enhancingperf.html#enhancingperf-eval)。 + +### plyr + +``plyr``是用于数据分析的拆分应用组合策略的R库。这些函数围绕R,``a`` +for ``arrays``,``l``for ``lists``和``d``for中的三个数据结构``data.frame``。下表显示了如何在Python中映射这些数据结构。 + +R | Python +---|--- +array | list +lists | 字典(dist)或对象列表(list of objects) +data.frame | dataframe + +#### ``ddply`` + +在R中使用名为``df``的data.frame的表达式,比如您有一个希望按``月``汇总``x``的需求: + +``` r +require(plyr) +df <- data.frame( + x = runif(120, 1, 168), + y = runif(120, 7, 334), + z = runif(120, 1.7, 20.7), + month = rep(c(5,6,7,8),30), + week = sample(1:4, 120, TRUE) +) + +ddply(df, .(month, week), summarize, + mean = round(mean(x), 2), + sd = round(sd(x), 2)) +``` + +在``pandas``等效表达式中,使用该 + [``groupby()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.groupby.html#pandas.DataFrame.groupby)方法将是: + +``` python +In [25]: df = pd.DataFrame({'x': np.random.uniform(1., 168., 120), + ....: 'y': np.random.uniform(7., 334., 120), + ....: 'z': np.random.uniform(1.7, 20.7, 120), + ....: 'month': [5, 6, 7, 8] * 30, + ....: 'week': np.random.randint(1, 4, 120)}) + ....: + +In [26]: grouped = df.groupby(['month', 'week']) + +In [27]: grouped['x'].agg([np.mean, np.std]) +Out[27]: + mean std +month week +5 1 63.653367 40.601965 + 2 78.126605 53.342400 + 3 92.091886 57.630110 +6 1 81.747070 54.339218 + 2 70.971205 54.687287 + 3 100.968344 54.010081 +7 1 61.576332 38.844274 + 2 61.733510 48.209013 + 3 71.688795 37.595638 +8 1 62.741922 34.618153 + 2 91.774627 49.790202 + 3 73.936856 60.773900 +``` + 
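作为补充示意(沿用上文的列名,数据为随机生成,并非原文示例):``ddply`` 返回的是一个扁平的 data.frame,而 ``groupby`` 聚合默认会产生 ``MultiIndex``。如果希望得到与 ``ddply`` 输出更接近的扁平结果,可以在聚合之后调用 ``reset_index()``:

``` python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.random.uniform(1., 168., 120),
                   'month': [5, 6, 7, 8] * 30,
                   'week': np.random.randint(1, 5, 120)})

# aggregate, then flatten the MultiIndex back into ordinary columns,
# mirroring ddply's flat data.frame output
flat = (df.groupby(['month', 'week'])['x']
          .agg(['mean', 'std'])
          .round(2)
          .reset_index())
print(flat.head())
```

这里的 ``round(2)`` 对应 R 示例中的 ``round(..., 2)``。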
+有关更多详细信息和示例,请参阅[groupby文档](https://pandas.pydata.org/pandas-docs/stable/../user_guide/groupby.html#groupby-aggregate)。 + +### 重塑/ reshape2 + +#### ``melt.array`` + +使用``a``在R中调用的3维数组的表达式,您希望将其融合到data.frame中: + +``` r +a <- array(c(1:23, NA), c(2,3,4)) +data.frame(melt(a)) +``` + +在Python中,既然``a``是一个列表,你可以简单地使用列表理解。 + +``` python +In [28]: a = np.array(list(range(1, 24)) + [np.NAN]).reshape(2, 3, 4) + +In [29]: pd.DataFrame([tuple(list(x) + [val]) for x, val in np.ndenumerate(a)]) +Out[29]: + 0 1 2 3 +0 0 0 0 1.0 +1 0 0 1 2.0 +2 0 0 2 3.0 +3 0 0 3 4.0 +4 0 1 0 5.0 +.. .. .. .. ... +19 1 1 3 20.0 +20 1 2 0 21.0 +21 1 2 1 22.0 +22 1 2 2 23.0 +23 1 2 3 NaN + +[24 rows x 4 columns] +``` + +#### ``melt.list`` + +使用``a``R中调用的列表的表达式,您希望将其融合到data.frame中: + +``` r +a <- as.list(c(1:4, NA)) +data.frame(melt(a)) +``` + +在Python中,此列表将是元组列表,因此 + [``DataFrame()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.html#pandas.DataFrame)方法会根据需要将其转换为数据帧。 + +``` python +In [30]: a = list(enumerate(list(range(1, 5)) + [np.NAN])) + +In [31]: pd.DataFrame(a) +Out[31]: + 0 1 +0 0 1.0 +1 1 2.0 +2 2 3.0 +3 3 4.0 +4 4 NaN +``` + +有关更多详细信息和示例,请参阅[“进入数据结构”文档](https://pandas.pydata.org/pandas-docs/stable/dsintro.html#dsintro)。 + +#### ``melt.data.frame`` + +使用``cheese``在R中调用的data.frame的表达式,您要在其中重新整形data.frame: + +``` r +cheese <- data.frame( + first = c('John', 'Mary'), + last = c('Doe', 'Bo'), + height = c(5.5, 6.0), + weight = c(130, 150) +) +melt(cheese, id=c("first", "last")) +``` + +在Python中,该[``melt()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.melt.html#pandas.melt)方法是R等价物: + +``` python +In [32]: cheese = pd.DataFrame({'first': ['John', 'Mary'], + ....: 'last': ['Doe', 'Bo'], + ....: 'height': [5.5, 6.0], + ....: 'weight': [130, 150]}) + ....: + +In [33]: pd.melt(cheese, id_vars=['first', 'last']) +Out[33]: + first last variable value +0 John Doe height 5.5 +1 Mary Bo height 6.0 +2 John Doe weight 130.0 +3 Mary Bo weight 150.0 + 
+In [34]: cheese.set_index(['first', 'last']).stack() # alternative way +Out[34]: +first last +John Doe height 5.5 + weight 130.0 +Mary Bo height 6.0 + weight 150.0 +dtype: float64 +``` + +有关更多详细信息和示例,请参阅[重塑文档](https://pandas.pydata.org/pandas-docs/stable/../user_guide/reshaping.html#reshaping-melt)。 + +#### ``cast`` + +在R中``acast``是一个表达式,它使用``df``在R中调用的data.frame 来转换为更高维的数组: + +``` r +df <- data.frame( + x = runif(12, 1, 168), + y = runif(12, 7, 334), + z = runif(12, 1.7, 20.7), + month = rep(c(5,6,7),4), + week = rep(c(1,2), 6) +) + +mdf <- melt(df, id=c("month", "week")) +acast(mdf, week ~ month ~ variable, mean) +``` + +在Python中,最好的方法是使用[``pivot_table()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.pivot_table.html#pandas.pivot_table): + +``` python +In [35]: df = pd.DataFrame({'x': np.random.uniform(1., 168., 12), + ....: 'y': np.random.uniform(7., 334., 12), + ....: 'z': np.random.uniform(1.7, 20.7, 12), + ....: 'month': [5, 6, 7] * 4, + ....: 'week': [1, 2] * 6}) + ....: + +In [36]: mdf = pd.melt(df, id_vars=['month', 'week']) + +In [37]: pd.pivot_table(mdf, values='value', index=['variable', 'week'], + ....: columns=['month'], aggfunc=np.mean) + ....: +Out[37]: +month 5 6 7 +variable week +x 1 93.888747 98.762034 55.219673 + 2 94.391427 38.112932 83.942781 +y 1 94.306912 279.454811 227.840449 + 2 87.392662 193.028166 173.899260 +z 1 11.016009 10.079307 16.170549 + 2 8.476111 17.638509 19.003494 +``` + +类似地``dcast``,使用``df``R中调用的data.frame 来基于``Animal``和聚合信息``FeedType``: + +``` r +df <- data.frame( + Animal = c('Animal1', 'Animal2', 'Animal3', 'Animal2', 'Animal1', + 'Animal2', 'Animal3'), + FeedType = c('A', 'B', 'A', 'A', 'B', 'B', 'A'), + Amount = c(10, 7, 4, 2, 5, 6, 2) +) + +dcast(df, Animal ~ FeedType, sum, fill=NaN) +# Alternative method using base R +with(df, tapply(Amount, list(Animal, FeedType), sum)) +``` + 
+Python可以通过两种不同的方式处理它。首先,类似于上面使用[``pivot_table()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.pivot_table.html#pandas.pivot_table): + +``` python +In [38]: df = pd.DataFrame({ + ....: 'Animal': ['Animal1', 'Animal2', 'Animal3', 'Animal2', 'Animal1', + ....: 'Animal2', 'Animal3'], + ....: 'FeedType': ['A', 'B', 'A', 'A', 'B', 'B', 'A'], + ....: 'Amount': [10, 7, 4, 2, 5, 6, 2], + ....: }) + ....: + +In [39]: df.pivot_table(values='Amount', index='Animal', columns='FeedType', + ....: aggfunc='sum') + ....: +Out[39]: +FeedType A B +Animal +Animal1 10.0 5.0 +Animal2 2.0 13.0 +Animal3 6.0 NaN +``` + +第二种方法是使用该[``groupby()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.groupby.html#pandas.DataFrame.groupby)方法: + +``` python +In [40]: df.groupby(['Animal', 'FeedType'])['Amount'].sum() +Out[40]: +Animal FeedType +Animal1 A 10 + B 5 +Animal2 A 2 + B 13 +Animal3 A 6 +Name: Amount, dtype: int64 +``` + +有关更多详细信息和示例,请参阅[重新整形文档](https://pandas.pydata.org/pandas-docs/stable/../user_guide/reshaping.html#reshaping-pivot)或[groupby文档](https://pandas.pydata.org/pandas-docs/stable/../user_guide/groupby.html#groupby-split)。 + +#### ``factor`` + +pandas具有分类数据的数据类型。 + +``` r +cut(c(1,2,3,4,5,6), 3) +factor(c(1,2,3,2,2,3)) +``` + +在Pandas,这是完成与``pd.cut``和``astype("category")``: + +``` python +In [41]: pd.cut(pd.Series([1, 2, 3, 4, 5, 6]), 3) +Out[41]: +0 (0.995, 2.667] +1 (0.995, 2.667] +2 (2.667, 4.333] +3 (2.667, 4.333] +4 (4.333, 6.0] +5 (4.333, 6.0] +dtype: category +Categories (3, interval[float64]): [(0.995, 2.667] < (2.667, 4.333] < (4.333, 6.0]] + +In [42]: pd.Series([1, 2, 3, 2, 2, 3]).astype("category") +Out[42]: +0 1 +1 2 +2 3 +3 2 +4 2 +5 3 +dtype: category +Categories (3, int64): [1, 2, 3] +``` + +有关更多详细信息和示例,请参阅[分类介绍](https://pandas.pydata.org/pandas-docs/stable/../user_guide/categorical.html#categorical)和 + 
[API文档](https://pandas.pydata.org/pandas-docs/stable/../reference/arrays.html#api-arrays-categorical)。还有一个关于[R因子差异](https://pandas.pydata.org/pandas-docs/stable/../user_guide/categorical.html#categorical-rfactor)的文档 + 。 + +## 与SQL比较 + +由于许多潜在的 pandas 用户对[SQL](https://en.wikipedia.org/wiki/SQL)有一定的了解 + ,因此本页面旨在提供一些使用pandas如何执行各种SQL操作的示例。 + +如果您是 pandas 的新手,您可能需要先阅读[十分钟入门Pandas](/docs/getting_started/10min.html) 以熟悉本库。 + +按照惯例,我们按如下方式导入 pandas 和 NumPy: + +``` python +In [1]: import pandas as pd + +In [2]: import numpy as np +``` + +大多数示例将使用``tips``pandas测试中找到的数据集。我们将数据读入名为*tips*的DataFrame中,并假设我们有一个具有相同名称和结构的数据库表。 + +``` python +In [3]: url = ('https://raw.github.com/pandas-dev' + ...: '/pandas/master/pandas/tests/data/tips.csv') + ...: + +In [4]: tips = pd.read_csv(url) + +In [5]: tips.head() +Out[5]: + total_bill tip sex smoker day time size +0 16.99 1.01 Female No Sun Dinner 2 +1 10.34 1.66 Male No Sun Dinner 3 +2 21.01 3.50 Male No Sun Dinner 3 +3 23.68 3.31 Male No Sun Dinner 2 +4 24.59 3.61 Female No Sun Dinner 4 +``` + +### SELECT + +在SQL中,使用您要选择的以逗号分隔的列列表(或``*`` +选择所有列)来完成选择: + +``` sql +SELECT total_bill, tip, smoker, time +FROM tips +LIMIT 5; +``` + +使用pandas,通过将列名列表传递给DataFrame来完成列选择: + +``` python +In [6]: tips[['total_bill', 'tip', 'smoker', 'time']].head(5) +Out[6]: + total_bill tip smoker time +0 16.99 1.01 No Dinner +1 10.34 1.66 No Dinner +2 21.01 3.50 No Dinner +3 23.68 3.31 No Dinner +4 24.59 3.61 No Dinner +``` + +在没有列名列表的情况下调用DataFrame将显示所有列(类似于SQL``*``)。 + +### WHERE + +SQL中的过滤是通过WHERE子句完成的。 + +``` sql +SELECT * +FROM tips +WHERE time = 'Dinner' +LIMIT 5; +``` + +DataFrame可以通过多种方式进行过滤; 最直观的是使用 + [布尔索引](https://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing)。 + +``` python +In [7]: tips[tips['time'] == 'Dinner'].head(5) +Out[7]: + total_bill tip sex smoker day time size +0 16.99 1.01 Female No Sun Dinner 2 +1 10.34 1.66 Male No Sun Dinner 3 +2 21.01 3.50 Male No Sun Dinner 3 +3 23.68 3.31 Male No Sun Dinner 2 +4 24.59 3.61 
Female No Sun Dinner 4 +``` + +上面的语句只是将一个 ``Series`` 的 True / False 对象传递给 DataFrame,返回所有带有True的行。 + +``` python +In [8]: is_dinner = tips['time'] == 'Dinner' + +In [9]: is_dinner.value_counts() +Out[9]: +True 176 +False 68 +Name: time, dtype: int64 + +In [10]: tips[is_dinner].head(5) +Out[10]: + total_bill tip sex smoker day time size +0 16.99 1.01 Female No Sun Dinner 2 +1 10.34 1.66 Male No Sun Dinner 3 +2 21.01 3.50 Male No Sun Dinner 3 +3 23.68 3.31 Male No Sun Dinner 2 +4 24.59 3.61 Female No Sun Dinner 4 +``` + +就像SQL的OR和AND一样,可以使用|将多个条件传递给DataFrame (OR)和&(AND)。 + +``` sql +-- tips of more than $5.00 at Dinner meals +SELECT * +FROM tips +WHERE time = 'Dinner' AND tip > 5.00; +``` + +``` python +# tips of more than $5.00 at Dinner meals +In [11]: tips[(tips['time'] == 'Dinner') & (tips['tip'] > 5.00)] +Out[11]: + total_bill tip sex smoker day time size +23 39.42 7.58 Male No Sat Dinner 4 +44 30.40 5.60 Male No Sun Dinner 4 +47 32.40 6.00 Male No Sun Dinner 4 +52 34.81 5.20 Female No Sun Dinner 4 +59 48.27 6.73 Male No Sat Dinner 4 +116 29.93 5.07 Male No Sun Dinner 4 +155 29.85 5.14 Female No Sun Dinner 5 +170 50.81 10.00 Male Yes Sat Dinner 3 +172 7.25 5.15 Male Yes Sun Dinner 2 +181 23.33 5.65 Male Yes Sun Dinner 2 +183 23.17 6.50 Male Yes Sun Dinner 4 +211 25.89 5.16 Male Yes Sat Dinner 4 +212 48.33 9.00 Male No Sat Dinner 4 +214 28.17 6.50 Female Yes Sat Dinner 3 +239 29.03 5.92 Male No Sat Dinner 3 +``` + +``` sql +-- tips by parties of at least 5 diners OR bill total was more than $45 +SELECT * +FROM tips +WHERE size >= 5 OR total_bill > 45; +``` + +``` python +# tips by parties of at least 5 diners OR bill total was more than $45 +In [12]: tips[(tips['size'] >= 5) | (tips['total_bill'] > 45)] +Out[12]: + total_bill tip sex smoker day time size +59 48.27 6.73 Male No Sat Dinner 4 +125 29.80 4.20 Female No Thur Lunch 6 +141 34.30 6.70 Male No Thur Lunch 6 +142 41.19 5.00 Male No Thur Lunch 5 +143 27.05 5.00 Female No Thur Lunch 6 +155 29.85 5.14 Female No 
Sun Dinner 5 +156 48.17 5.00 Male No Sun Dinner 6 +170 50.81 10.00 Male Yes Sat Dinner 3 +182 45.35 3.50 Male Yes Sun Dinner 3 +185 20.69 5.00 Male No Sun Dinner 5 +187 30.46 2.00 Male Yes Sun Dinner 5 +212 48.33 9.00 Male No Sat Dinner 4 +216 28.15 3.00 Male Yes Sat Dinner 5 +``` + +使用[``notna()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.Series.notna.html#pandas.Series.notna)和[``isna()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.Series.isna.html#pandas.Series.isna) +方法完成NULL检查。 + +``` python +In [13]: frame = pd.DataFrame({'col1': ['A', 'B', np.NaN, 'C', 'D'], + ....: 'col2': ['F', np.NaN, 'G', 'H', 'I']}) + ....: + +In [14]: frame +Out[14]: + col1 col2 +0 A F +1 B NaN +2 NaN G +3 C H +4 D I +``` + +假设我们有一个与上面的DataFrame结构相同的表。我们只能``col2``通过以下查询看到IS NULL 的记录: + +``` sql +SELECT * +FROM frame +WHERE col2 IS NULL; +``` + +``` python +In [15]: frame[frame['col2'].isna()] +Out[15]: + col1 col2 +1 B NaN +``` + +获取``col1``IS NOT NULL的项目可以完成[``notna()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.Series.notna.html#pandas.Series.notna)。 + +``` sql +SELECT * +FROM frame +WHERE col1 IS NOT NULL; +``` + +``` python +In [16]: frame[frame['col1'].notna()] +Out[16]: + col1 col2 +0 A F +1 B NaN +3 C H +4 D I +``` + +### GROUP BY + +在pandas中,SQL的GROUP BY操作使用类似命名的 + [``groupby()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.groupby.html#pandas.DataFrame.groupby)方法执行。[``groupby()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.groupby.html#pandas.DataFrame.groupby)通常是指我们想要将数据集拆分成组,应用某些功能(通常是聚合),然后将这些组合在一起的过程。 + +常见的SQL操作是获取整个数据集中每个组中的记录数。例如,有一个需要向我们提供提示中的性别的数量的查询语句: + +``` sql +SELECT sex, count(*) +FROM tips +GROUP BY sex; +/* +Female 87 +Male 157 +*/ +``` + +在 pandas 中可以这样: + +``` python +In [17]: tips.groupby('sex').size() +Out[17]: +sex +Female 87 +Male 157 +dtype: int64 +``` + 
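顺带一提,只对单列计数时,``value_counts()`` 是 ``groupby(...).size()`` 的常用简写(示意代码,数据为虚构的小样本,并非上文的 ``tips`` 数据集):

``` python
import pandas as pd

sex = pd.Series(['Female', 'Male', 'Male', 'Female', 'Male'])

# value_counts() returns the per-value counts, sorted in descending order
print(sex.value_counts())
```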
+请注意,在我们使用的pandas代码中[``size()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.core.groupby.DataFrameGroupBy.size.html#pandas.core.groupby.DataFrameGroupBy.size),没有 + [``count()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.core.groupby.DataFrameGroupBy.count.html#pandas.core.groupby.DataFrameGroupBy.count)。这是因为 + [``count()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.core.groupby.DataFrameGroupBy.count.html#pandas.core.groupby.DataFrameGroupBy.count)将函数应用于每个列,返回每个列中的记录数。``not null`` + +``` python +In [18]: tips.groupby('sex').count() +Out[18]: + total_bill tip smoker day time size +sex +Female 87 87 87 87 87 87 +Male 157 157 157 157 157 157 +``` + +或者,我们可以将该[``count()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.core.groupby.DataFrameGroupBy.count.html#pandas.core.groupby.DataFrameGroupBy.count)方法应用于单个列: + +``` python +In [19]: tips.groupby('sex')['total_bill'].count() +Out[19]: +sex +Female 87 +Male 157 +Name: total_bill, dtype: int64 +``` + +也可以一次应用多个功能。例如,假设我们希望查看提示量与星期几的不同之处 - ``agg()``允许您将字典传递给分组的DataFrame,指示要应用于特定列的函数。 + +``` sql +SELECT day, AVG(tip), COUNT(*) +FROM tips +GROUP BY day; +/* +Fri 2.734737 19 +Sat 2.993103 87 +Sun 3.255132 76 +Thur 2.771452 62 +*/ +``` + +``` python +In [20]: tips.groupby('day').agg({'tip': np.mean, 'day': np.size}) +Out[20]: + tip day +day +Fri 2.734737 19 +Sat 2.993103 87 +Sun 3.255132 76 +Thur 2.771452 62 +``` + +通过将列列表传递给[``groupby()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.groupby.html#pandas.DataFrame.groupby)方法来完成多个列的分组 + 。 + +``` sql +SELECT smoker, day, COUNT(*), AVG(tip) +FROM tips +GROUP BY smoker, day; +/* +smoker day +No Fri 4 2.812500 + Sat 45 3.102889 + Sun 57 3.167895 + Thur 45 2.673778 +Yes Fri 15 2.714000 + Sat 42 2.875476 + Sun 19 3.516842 + Thur 17 3.030000 +*/ +``` + +``` python +In [21]: tips.groupby(['smoker', 'day']).agg({'tip': [np.size, np.mean]}) +Out[21]: + 
tip + size mean +smoker day +No Fri 4.0 2.812500 + Sat 45.0 3.102889 + Sun 57.0 3.167895 + Thur 45.0 2.673778 +Yes Fri 15.0 2.714000 + Sat 42.0 2.875476 + Sun 19.0 3.516842 + Thur 17.0 3.030000 +``` + +### JOIN + +可以使用[``join()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.join.html#pandas.DataFrame.join)或执行JOIN [``merge()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.merge.html#pandas.merge)。默认情况下, + [``join()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.join.html#pandas.DataFrame.join)将在其索引上加入DataFrame。每个方法都有参数,允许您指定要执行的连接类型(LEFT,RIGHT,INNER,FULL)或要连接的列(列名称或索引)。 + +``` python +In [22]: df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], + ....: 'value': np.random.randn(4)}) + ....: + +In [23]: df2 = pd.DataFrame({'key': ['B', 'D', 'D', 'E'], + ....: 'value': np.random.randn(4)}) + ....: +``` + +假设我们有两个与DataFrames名称和结构相同的数据库表。 + +现在让我们来看看各种类型的JOIN。 + +#### INNER JOIN + +``` sql +SELECT * +FROM df1 +INNER JOIN df2 + ON df1.key = df2.key; +``` + +``` python +# merge performs an INNER JOIN by default +In [24]: pd.merge(df1, df2, on='key') +Out[24]: + key value_x value_y +0 B -0.282863 1.212112 +1 D -1.135632 -0.173215 +2 D -1.135632 0.119209 +``` + +[``merge()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.merge.html#pandas.merge) 当您想要将一个DataFrame列与另一个DataFrame索引连接时,还会为这些情况提供参数。 + +``` python +In [25]: indexed_df2 = df2.set_index('key') + +In [26]: pd.merge(df1, indexed_df2, left_on='key', right_index=True) +Out[26]: + key value_x value_y +1 B -0.282863 1.212112 +3 D -1.135632 -0.173215 +3 D -1.135632 0.119209 +``` + +#### LEFT OUTER JOIN + +``` sql +-- show all records from df1 +SELECT * +FROM df1 +LEFT OUTER JOIN df2 + ON df1.key = df2.key; +``` + +``` python +# show all records from df1 +In [27]: pd.merge(df1, df2, on='key', how='left') +Out[27]: + key value_x value_y +0 A 0.469112 NaN +1 B -0.282863 1.212112 +2 C -1.509059 NaN +3 D -1.135632 
-0.173215 +4 D -1.135632 0.119209 +``` + +#### RIGHT JOIN + +``` sql +-- show all records from df2 +SELECT * +FROM df1 +RIGHT OUTER JOIN df2 + ON df1.key = df2.key; +``` + +``` python +# show all records from df2 +In [28]: pd.merge(df1, df2, on='key', how='right') +Out[28]: + key value_x value_y +0 B -0.282863 1.212112 +1 D -1.135632 -0.173215 +2 D -1.135632 0.119209 +3 E NaN -1.044236 +``` + +#### FULL JOIN + +pandas还允许显示数据集两侧的FULL JOIN,无论连接列是否找到匹配项。在编写时,所有RDBMS(MySQL)都不支持FULL JOIN。 + +``` sql +-- show all records from both tables +SELECT * +FROM df1 +FULL OUTER JOIN df2 + ON df1.key = df2.key; +``` + +``` python +# show all records from both frames +In [29]: pd.merge(df1, df2, on='key', how='outer') +Out[29]: + key value_x value_y +0 A 0.469112 NaN +1 B -0.282863 1.212112 +2 C -1.509059 NaN +3 D -1.135632 -0.173215 +4 D -1.135632 0.119209 +5 E NaN -1.044236 +``` + +### UNION + +UNION ALL可以使用[``concat()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.concat.html#pandas.concat)。 + +``` python +In [30]: df1 = pd.DataFrame({'city': ['Chicago', 'San Francisco', 'New York City'], + ....: 'rank': range(1, 4)}) + ....: + +In [31]: df2 = pd.DataFrame({'city': ['Chicago', 'Boston', 'Los Angeles'], + ....: 'rank': [1, 4, 5]}) + ....: +``` + +``` sql +SELECT city, rank +FROM df1 +UNION ALL +SELECT city, rank +FROM df2; +/* + city rank + Chicago 1 +San Francisco 2 +New York City 3 + Chicago 1 + Boston 4 + Los Angeles 5 +*/ +``` + +``` python +In [32]: pd.concat([df1, df2]) +Out[32]: + city rank +0 Chicago 1 +1 San Francisco 2 +2 New York City 3 +0 Chicago 1 +1 Boston 4 +2 Los Angeles 5 +``` + +SQL的UNION类似于UNION ALL,但是UNION将删除重复的行。 + +``` sql +SELECT city, rank +FROM df1 +UNION +SELECT city, rank +FROM df2; +-- notice that there is only one Chicago record this time +/* + city rank + Chicago 1 +San Francisco 2 +New York City 3 + Boston 4 + Los Angeles 5 +*/ +``` + +在 pandas 
中,您可以[``concat()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.concat.html#pandas.concat)结合使用 + [``drop_duplicates()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.drop_duplicates.html#pandas.DataFrame.drop_duplicates)。 + +``` python +In [33]: pd.concat([df1, df2]).drop_duplicates() +Out[33]: + city rank +0 Chicago 1 +1 San Francisco 2 +2 New York City 3 +1 Boston 4 +2 Los Angeles 5 +``` + +### Pandas等同于某些SQL分析和聚合函数 + +#### 带有偏移量的前N行 + +``` sql +-- MySQL +SELECT * FROM tips +ORDER BY tip DESC +LIMIT 10 OFFSET 5; +``` + +``` python +In [34]: tips.nlargest(10 + 5, columns='tip').tail(10) +Out[34]: + total_bill tip sex smoker day time size +183 23.17 6.50 Male Yes Sun Dinner 4 +214 28.17 6.50 Female Yes Sat Dinner 3 +47 32.40 6.00 Male No Sun Dinner 4 +239 29.03 5.92 Male No Sat Dinner 3 +88 24.71 5.85 Male No Thur Lunch 2 +181 23.33 5.65 Male Yes Sun Dinner 2 +44 30.40 5.60 Male No Sun Dinner 4 +52 34.81 5.20 Female No Sun Dinner 4 +85 34.83 5.17 Female No Thur Lunch 4 +211 25.89 5.16 Male Yes Sat Dinner 4 +``` + +#### 每组前N行 + +``` sql +-- Oracle's ROW_NUMBER() analytic function +SELECT * FROM ( + SELECT + t.*, + ROW_NUMBER() OVER(PARTITION BY day ORDER BY total_bill DESC) AS rn + FROM tips t +) +WHERE rn < 3 +ORDER BY day, rn; +``` + +``` python +In [35]: (tips.assign(rn=tips.sort_values(['total_bill'], ascending=False) + ....: .groupby(['day']) + ....: .cumcount() + 1) + ....: .query('rn < 3') + ....: .sort_values(['day', 'rn'])) + ....: +Out[35]: + total_bill tip sex smoker day time size rn +95 40.17 4.73 Male Yes Fri Dinner 4 1 +90 28.97 3.00 Male Yes Fri Dinner 2 2 +170 50.81 10.00 Male Yes Sat Dinner 3 1 +212 48.33 9.00 Male No Sat Dinner 4 2 +156 48.17 5.00 Male No Sun Dinner 6 1 +182 45.35 3.50 Male Yes Sun Dinner 3 2 +197 43.11 5.00 Female Yes Thur Lunch 4 1 +142 41.19 5.00 Male No Thur Lunch 5 2 +``` + +同样使用 *rank (method ='first')* 函数 + +``` python +In [36]: 
(tips.assign(rnk=tips.groupby(['day'])['total_bill']
+   ....:                  .rank(method='first', ascending=False))
+   ....:      .query('rnk < 3')
+   ....:      .sort_values(['day', 'rnk']))
+   ....:
+Out[36]:
+     total_bill    tip     sex smoker   day    time  size  rnk
+95        40.17   4.73    Male    Yes   Fri  Dinner     4  1.0
+90        28.97   3.00    Male    Yes   Fri  Dinner     2  2.0
+170       50.81  10.00    Male    Yes   Sat  Dinner     3  1.0
+212       48.33   9.00    Male     No   Sat  Dinner     4  2.0
+156       48.17   5.00    Male     No   Sun  Dinner     6  1.0
+182       45.35   3.50    Male    Yes   Sun  Dinner     3  2.0
+197       43.11   5.00  Female    Yes  Thur   Lunch     4  1.0
+142       41.19   5.00    Male     No  Thur   Lunch     5  2.0
+```
+
+``` sql
+-- Oracle's RANK() analytic function
+SELECT * FROM (
+  SELECT
+    t.*,
+    RANK() OVER(PARTITION BY sex ORDER BY tip) AS rnk
+  FROM tips t
+  WHERE tip < 2
+)
+WHERE rnk < 3
+ORDER BY sex, rnk;
+```
+
+让我们找出各性别组中小费金额小于 2 美元(tip < 2)、且组内排名前两位(rnk_min < 3)的记录。请注意,使用``rank(method='min')``时,相同的*小费*金额会得到相同的 *rnk_min*(与Oracle的RANK()函数一致):
+
+``` python
+In [37]: (tips[tips['tip'] < 2]
+   ....:     .assign(rnk_min=tips.groupby(['sex'])['tip']
+   ....:             .rank(method='min'))
+   ....:     .query('rnk_min < 3')
+   ....:     .sort_values(['sex', 'rnk_min']))
+   ....:
+Out[37]:
+     total_bill   tip     sex smoker  day    time  size  rnk_min
+67         3.07  1.00  Female    Yes  Sat  Dinner     1      1.0
+92         5.75  1.00  Female    Yes  Fri  Dinner     2      1.0
+111        7.25  1.00  Female     No  Sat  Dinner     1      1.0
+236       12.60  1.00    Male    Yes  Sat  Dinner     2      1.0
+237       32.83  1.17    Male    Yes  Sat  Dinner     2      2.0
+```
+
+### 更新(UPDATE)
+
+``` sql
+UPDATE tips
+SET tip = tip*2
+WHERE tip < 2;
+```
+
+``` python
+In [38]: tips.loc[tips['tip'] < 2, 'tip'] *= 2
+```
+
+### 删除(DELETE)
+
+``` sql
+DELETE FROM tips
+WHERE tip > 9;
+```
+
+在pandas中,我们选择应保留的行,而不是删除它们。
+
+``` python
+In [39]: tips = tips.loc[tips['tip'] <= 9]
+```
+
+## 与SAS的比较
+
+对于来自 [SAS](https://en.wikipedia.org/wiki/SAS_(software)) 的潜在用户,本节旨在演示如何在 pandas 中完成各种类似SAS的操作。
+
+如果您是 pandas 的新手,您可能需要先阅读[十分钟入门Pandas](/docs/getting_started/10min.html) 以熟悉本库。
+
+按照惯例,我们按如下方式导入 pandas 和 
NumPy: + +``` python +In [1]: import pandas as pd + +In [2]: import numpy as np +``` + +::: tip 注意 + +在本教程中,``DataFrame``将通过调用显示 + pandas ``df.head()``,它将显示该行的前N行(默认为5行)``DataFrame``。这通常用于交互式工作(例如[Jupyter笔记本](https://jupyter.org/)或终端) - SAS中的等价物将是: + +``` sas +proc print data=df(obs=5); +run; +``` + +::: + +### 数据结构 + +#### 一般术语对照表 + +Pandas | SAS +---|--- +DataFrame | 数据集(data set) +column | 变量(variable) +row | 观察(observation) +groupby | BY-group +NaN | . + +#### ``DataFrame``/ ``Series`` + +A ``DataFrame``pandas类似于SAS数据集 - 具有标记列的二维数据源,可以是不同类型的。如本文档所示,几乎所有可以使用SAS ``DATA``步骤应用于数据集的操作也可以在pandas中完成。 + +A ``Series``是表示a的一列的数据结构 + ``DataFrame``。SAS没有针对单个列的单独数据结构,但通常,使用a ``Series``类似于在``DATA``步骤中引用列。 + +#### ``Index`` + +每一个``DataFrame``和``Series``有一个``Index``-这是对标签 + *的行*数据。SAS没有完全类似的概念。除了在``DATA``step(``_N_``)期间可以访问的隐式整数索引之外,数据集的行基本上是未标记的。 + +在pandas中,如果未指定索引,则默认情况下也使用整数索引(第一行= 0,第二行= 1,依此类推)。虽然使用标记``Index``或 + ``MultiIndex``可以启用复杂的分析,并且最终是 Pandas 理解的重要部分,但是对于这种比较,我们基本上会忽略它, + ``Index``并且只是将其``DataFrame``视为列的集合。有关如何有效使用的更多信息, + 请参阅[索引文档](https://pandas.pydata.org/pandas-docs/stable/../user_guide/indexing.html#indexing)``Index``。 + +### 数据输入/输出 + +#### 从值构造DataFrame + +通过将数据放在``datalines``语句之后并指定列名,可以从指定值构建SAS数据集。 + +``` sas +data df; + input x y; + datalines; + 1 2 + 3 4 + 5 6 + ; +run; +``` + +``DataFrame``可以用许多不同的方式构造一个pandas ,但是对于少量的值,通常很方便将它指定为Python字典,其中键是列名,值是数据。 + +``` python +In [3]: df = pd.DataFrame({'x': [1, 3, 5], 'y': [2, 4, 6]}) + +In [4]: df +Out[4]: + x y +0 1 2 +1 3 4 +2 5 6 +``` + +#### 读取外部数据 + +与SAS一样,pandas提供了从多种格式读取数据的实用程序。``tips``在pandas测试([csv](https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv))中找到的数据集将用于以下许多示例中。 + +SAS提供将csv数据读入数据集。``PROC IMPORT`` + +``` sas +proc import datafile='tips.csv' dbms=csv out=tips replace; + getnames=yes; +run; +``` + +Pandas 方法是[``read_csv()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.read_csv.html#pandas.read_csv)类似的。 + +``` python +In [5]: url = 
('https://raw.github.com/pandas-dev/' + ...: 'pandas/master/pandas/tests/data/tips.csv') + ...: + +In [6]: tips = pd.read_csv(url) + +In [7]: tips.head() +Out[7]: + total_bill tip sex smoker day time size +0 16.99 1.01 Female No Sun Dinner 2 +1 10.34 1.66 Male No Sun Dinner 3 +2 21.01 3.50 Male No Sun Dinner 3 +3 23.68 3.31 Male No Sun Dinner 2 +4 24.59 3.61 Female No Sun Dinner 4 +``` + +比如,可以使用许多参数来指定数据应该如何解析。例如,如果数据是由制表符分隔的,并且没有列名,那么pandas命令将是:``PROC IMPORT````read_csv`` + +``` python +tips = pd.read_csv('tips.csv', sep='\t', header=None) + +# alternatively, read_table is an alias to read_csv with tab delimiter +tips = pd.read_table('tips.csv', header=None) +``` + +除了text / csv之外,pandas还支持各种其他数据格式,如Excel,HDF5和SQL数据库。这些都是通过``pd.read_*`` +函数读取的。有关更多详细信息,请参阅[IO文档](https://pandas.pydata.org/pandas-docs/stable/../user_guide/io.html#io)。 + +#### 导出数据 + +在SAS中``proc导入``相反就是``proc导出`` + +``` sas +proc export data=tips outfile='tips2.csv' dbms=csv; +run; +``` + +类似地,在 Pandas ,相反``read_csv``是[``to_csv()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.to_csv.html#pandas.DataFrame.to_csv),与其他的数据格式遵循类似的API。 + +``` python +tips.to_csv('tips2.csv') +``` + +### 数据操作 + +#### 列上的操作 + +在该``DATA``步骤中,可以在新列或现有列上使用任意数学表达式。 + +``` sas +data tips; + set tips; + total_bill = total_bill - 2; + new_bill = total_bill / 2; +run; +``` + +pandas 通过指定个体提供了类似的矢量化操作``Series``中``DataFrame``。可以以相同的方式分配新列。 + +``` python +In [8]: tips['total_bill'] = tips['total_bill'] - 2 + +In [9]: tips['new_bill'] = tips['total_bill'] / 2.0 + +In [10]: tips.head() +Out[10]: + total_bill tip sex smoker day time size new_bill +0 14.99 1.01 Female No Sun Dinner 2 7.495 +1 8.34 1.66 Male No Sun Dinner 3 4.170 +2 19.01 3.50 Male No Sun Dinner 3 9.505 +3 21.68 3.31 Male No Sun Dinner 2 10.840 +4 22.59 3.61 Female No Sun Dinner 4 11.295 +``` + +#### 过滤 + +SAS中的过滤是通过一个或多个列上的``if``或``where``语句完成的。 + +``` sas +data tips; + set tips; + if total_bill > 10; +run; + +data tips; + set tips; + 
where total_bill > 10; + /* equivalent in this case - where happens before the + DATA step begins and can also be used in PROC statements */ +run; +``` + +DataFrame可以通过多种方式进行过滤; 最直观的是使用 + [布尔索引](https://pandas.pydata.org/pandas-docs/stable/../user_guide/indexing.html#indexing-boolean) + +``` python +In [11]: tips[tips['total_bill'] > 10].head() +Out[11]: + total_bill tip sex smoker day time size +0 14.99 1.01 Female No Sun Dinner 2 +2 19.01 3.50 Male No Sun Dinner 3 +3 21.68 3.31 Male No Sun Dinner 2 +4 22.59 3.61 Female No Sun Dinner 4 +5 23.29 4.71 Male No Sun Dinner 4 +``` + +#### 如果/那么逻辑 + +在SAS中,if / then逻辑可用于创建新列。 + +``` sas +data tips; + set tips; + format bucket $4.; + + if total_bill < 10 then bucket = 'low'; + else bucket = 'high'; +run; +``` + +Pandas 中的相同操作可以使用``where``来自的方法来完成``numpy``。 + +``` python +In [12]: tips['bucket'] = np.where(tips['total_bill'] < 10, 'low', 'high') + +In [13]: tips.head() +Out[13]: + total_bill tip sex smoker day time size bucket +0 14.99 1.01 Female No Sun Dinner 2 high +1 8.34 1.66 Male No Sun Dinner 3 low +2 19.01 3.50 Male No Sun Dinner 3 high +3 21.68 3.31 Male No Sun Dinner 2 high +4 22.59 3.61 Female No Sun Dinner 4 high +``` + +#### 日期功能 + +SAS提供了各种功能来对日期/日期时间列进行操作。 + +``` sas +data tips; + set tips; + format date1 date2 date1_plusmonth mmddyy10.; + date1 = mdy(1, 15, 2013); + date2 = mdy(2, 15, 2015); + date1_year = year(date1); + date2_month = month(date2); + * shift date to beginning of next interval; + date1_next = intnx('MONTH', date1, 1); + * count intervals between dates; + months_between = intck('MONTH', date1, date2); +run; +``` + +等效的pandas操作如下所示。除了这些功能外,pandas还支持Base SAS中不具备的其他时间序列功能(例如重新采样和自定义偏移) - 有关详细信息,请参阅[时间序列文档](https://pandas.pydata.org/pandas-docs/stable/../user_guide/timeseries.html#timeseries)。 + +``` python +In [14]: tips['date1'] = pd.Timestamp('2013-01-15') + +In [15]: tips['date2'] = pd.Timestamp('2015-02-15') + +In [16]: tips['date1_year'] = tips['date1'].dt.year + +In [17]: 
tips['date2_month'] = tips['date2'].dt.month + +In [18]: tips['date1_next'] = tips['date1'] + pd.offsets.MonthBegin() + +In [19]: tips['months_between'] = ( + ....: tips['date2'].dt.to_period('M') - tips['date1'].dt.to_period('M')) + ....: + +In [20]: tips[['date1', 'date2', 'date1_year', 'date2_month', + ....: 'date1_next', 'months_between']].head() + ....: +Out[20]: + date1 date2 date1_year date2_month date1_next months_between +0 2013-01-15 2015-02-15 2013 2 2013-02-01 <25 * MonthEnds> +1 2013-01-15 2015-02-15 2013 2 2013-02-01 <25 * MonthEnds> +2 2013-01-15 2015-02-15 2013 2 2013-02-01 <25 * MonthEnds> +3 2013-01-15 2015-02-15 2013 2 2013-02-01 <25 * MonthEnds> +4 2013-01-15 2015-02-15 2013 2 2013-02-01 <25 * MonthEnds> +``` + +#### 列的选择 + +SAS在``DATA``步骤中提供关键字以选择,删除和重命名列。 + +``` sas +data tips; + set tips; + keep sex total_bill tip; +run; + +data tips; + set tips; + drop sex; +run; + +data tips; + set tips; + rename total_bill=total_bill_2; +run; +``` + +下面的 Pandas 表示相同的操作。 + +``` python +# keep +In [21]: tips[['sex', 'total_bill', 'tip']].head() +Out[21]: + sex total_bill tip +0 Female 14.99 1.01 +1 Male 8.34 1.66 +2 Male 19.01 3.50 +3 Male 21.68 3.31 +4 Female 22.59 3.61 + +# drop +In [22]: tips.drop('sex', axis=1).head() +Out[22]: + total_bill tip smoker day time size +0 14.99 1.01 No Sun Dinner 2 +1 8.34 1.66 No Sun Dinner 3 +2 19.01 3.50 No Sun Dinner 3 +3 21.68 3.31 No Sun Dinner 2 +4 22.59 3.61 No Sun Dinner 4 + +# rename +In [23]: tips.rename(columns={'total_bill': 'total_bill_2'}).head() +Out[23]: + total_bill_2 tip sex smoker day time size +0 14.99 1.01 Female No Sun Dinner 2 +1 8.34 1.66 Male No Sun Dinner 3 +2 19.01 3.50 Male No Sun Dinner 3 +3 21.68 3.31 Male No Sun Dinner 2 +4 22.59 3.61 Female No Sun Dinner 4 +``` + +#### 按值排序 + +SAS中的排序是通过 ``PROC SORT`` + +``` sas +proc sort data=tips; + by sex total_bill; +run; +``` + 
+pandas对象有一个[``sort_values()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.sort_values.html#pandas.DataFrame.sort_values)方法,它采用列表进行排序。 + +``` python +In [24]: tips = tips.sort_values(['sex', 'total_bill']) + +In [25]: tips.head() +Out[25]: + total_bill tip sex smoker day time size +67 1.07 1.00 Female Yes Sat Dinner 1 +92 3.75 1.00 Female Yes Fri Dinner 2 +111 5.25 1.00 Female No Sat Dinner 1 +145 6.35 1.50 Female No Thur Lunch 2 +135 6.51 1.25 Female No Thur Lunch 2 +``` + +### 字符串处理 + +#### 长度 + +SAS使用[LENGTHN](https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002284668.htm) +和[LENGTHC](https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002283942.htm) +函数确定字符串的长度 + 。``LENGTHN``排除尾随空白并``LENGTHC``包括尾随空白。 + +``` sas +data _null_; +set tips; +put(LENGTHN(time)); +put(LENGTHC(time)); +run; +``` + +Python使用该``len``函数确定字符串的长度。 +``len``包括尾随空白。使用``len``和``rstrip``排除尾随空格。 + +``` python +In [26]: tips['time'].str.len().head() +Out[26]: +67 6 +92 6 +111 6 +145 5 +135 5 +Name: time, dtype: int64 + +In [27]: tips['time'].str.rstrip().str.len().head() +Out[27]: +67 6 +92 6 +111 6 +145 5 +135 5 +Name: time, dtype: int64 +``` + +#### 查找(Find) + +SAS使用[FINDW](https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002978282.htm)函数确定字符串中字符的位置 + 。 +``FINDW``获取第一个参数定义的字符串,并搜索您提供的子字符串的第一个位置作为第二个参数。 + +``` sas +data _null_; +set tips; +put(FINDW(sex,'ale')); +run; +``` + +Python使用``find``函数确定字符串中字符的位置 + 。 ``find``搜索子字符串的第一个位置。如果找到子字符串,则该函数返回其位置。请记住,Python索引是从零开始的,如果找不到子串,函数将返回-1。 + +``` python +In [28]: tips['sex'].str.find("ale").head() +Out[28]: +67 3 +92 3 +111 3 +145 3 +135 3 +Name: sex, dtype: int64 +``` + +#### 字符串提取(Substring) + +SAS使用[SUBSTR](https://www2.sas.com/proceedings/sugi25/25/cc/25p088.pdf)函数根据其位置从字符串中提取子字符串 + 。 + +``` sas +data _null_; +set tips; +put(substr(sex,1,1)); +run; +``` + +使用pandas,您可以使用``[]``符号从位置位置提取字符串中的子字符串。请记住,Python索引是从零开始的。 + 

``` python
In [29]: tips['sex'].str[0:1].head()
Out[29]: 
67     F
92     F
111    F
145    F
135    F
Name: sex, dtype: object
```

#### SCAN

SAS 的 [SCAN](https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000214639.htm)
函数返回字符串中的第 n 个词。第一个参数是要解析的字符串,第二个参数指定要提取第几个词。

``` sas
data firstlast;
input String $60.;
First_Name = scan(string, 1);
Last_Name = scan(string, -1);
datalines2;
John Smith;
Jane Cook;
;;;
run;
```

Python 可以用正则表达式按文本内容从字符串中提取子字符串。下面用字符串拆分演示一种简单的做法,实际中还有更强大的方法。

``` python
In [30]: firstlast = pd.DataFrame({'String': ['John Smith', 'Jane Cook']})

In [31]: firstlast['First_Name'] = firstlast['String'].str.split(" ", expand=True)[0]

In [32]: firstlast['Last_Name'] = firstlast['String'].str.rsplit(" ", expand=True)[1]

In [33]: firstlast
Out[33]: 
       String First_Name Last_Name
0  John Smith       John     Smith
1   Jane Cook       Jane      Cook
```

#### 大写、小写与首字母大写转换

SAS 的 [UPCASE](https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245965.htm)、
[LOWCASE](https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245912.htm) 和
[PROPCASE](https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/a002598106.htm)
函数用于改变参数的大小写。

``` sas
data firstlast;
input String $60.;
string_up = UPCASE(string);
string_low = LOWCASE(string);
string_prop = PROPCASE(string);
datalines2;
John Smith;
Jane Cook;
;;;
run;
```

Python 中等效的函数是 ``upper``、``lower`` 和 ``title``。

``` python
In [34]: firstlast = pd.DataFrame({'String': ['John Smith', 'Jane Cook']})

In [35]: firstlast['string_up'] = firstlast['String'].str.upper()

In [36]: firstlast['string_low'] = firstlast['String'].str.lower()

In [37]: firstlast['string_prop'] = firstlast['String'].str.title()

In [38]: firstlast
Out[38]: 
       String   string_up  string_low string_prop
0  John Smith  JOHN SMITH  john smith  John Smith
1   Jane Cook   JANE COOK   jane cook   Jane Cook
```

### 合并(Merging)

合并示例中将使用以下表格:

``` python
In [39]: df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'],
   ....:                     'value': np.random.randn(4)})
   ....: 

In [40]: df1
Out[40]: 
  key     value
0   A  0.469112
1   B -0.282863
2   C -1.509059
3   D -1.135632

In [41]: df2 = pd.DataFrame({'key': ['B', 'D', 'D', 'E'],
   ....:                     'value': np.random.randn(4)})
   ....: 

In [42]: df2
Out[42]: 
  key     value
0   B  1.212112
1   D -0.173215
2   D  0.119209
3   E -1.044236
```

在 SAS 中,必须在合并之前显式排序数据。不同类型的连接通过 ``in=`` 虚拟变量来实现,这些变量跟踪匹配是出现在一个还是两个输入数据集中。

``` sas
proc sort data=df1;
    by key;
run;

proc sort data=df2;
    by key;
run;

data left_join inner_join right_join outer_join;
    merge df1(in=a) df2(in=b);

    if a and b then output inner_join;
    if a then output left_join;
    if b then output right_join;
    if a or b then output outer_join;
run;
```

pandas 的 DataFrame 有一个 [``merge()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.merge.html#pandas.DataFrame.merge) 方法提供类似功能。请注意,数据不必提前排序,并且不同的连接类型通过 ``how`` 关键字实现。

``` python
In [43]: inner_join = df1.merge(df2, on=['key'], how='inner')

In [44]: inner_join
Out[44]: 
  key   value_x   value_y
0   B -0.282863  1.212112
1   D -1.135632 -0.173215
2   D -1.135632  0.119209

In [45]: left_join = df1.merge(df2, on=['key'], how='left')

In [46]: left_join
Out[46]: 
  key   value_x   value_y
0   A  0.469112       NaN
1   B -0.282863  1.212112
2   C -1.509059       NaN
3   D -1.135632 -0.173215
4   D -1.135632  0.119209

In [47]: right_join = df1.merge(df2, on=['key'], how='right')

In [48]: right_join
Out[48]: 
  key   value_x   value_y
0   B -0.282863  1.212112
1   D -1.135632 -0.173215
2   D -1.135632  0.119209
3   E       NaN -1.044236

In [49]: outer_join = df1.merge(df2, on=['key'], how='outer')

In [50]: outer_join
Out[50]: 
  key   value_x   value_y
0   A  0.469112       NaN
1   B -0.282863  1.212112
2   C -1.509059       NaN
3   D -1.135632 -0.173215
4   D -1.135632  0.119209
5   E       NaN -1.044236
```

### 缺失数据(Missing data)

与 SAS 一样,pandas 具有缺失数据的表示 - 
这是特殊浮点值 ``NaN``(Not a Number,非数字)。许多语义是相同的:例如,缺失数据会在数值运算中传播,并且默认情况下会被聚合忽略。

``` python
In [51]: outer_join
Out[51]: 
  key   value_x   value_y
0   A  0.469112       NaN
1   B -0.282863  1.212112
2   C -1.509059       NaN
3   D -1.135632 -0.173215
4   D -1.135632  0.119209
5   E       NaN -1.044236

In [52]: outer_join['value_x'] + outer_join['value_y']
Out[52]: 
0         NaN
1    0.929249
2         NaN
3   -1.308847
4   -1.016424
5         NaN
dtype: float64

In [53]: outer_join['value_x'].sum()
Out[53]: -3.5940742896293765
```

一个区别是缺失数据无法与它的哨兵值直接比较。例如,在 SAS 中,您可以这样过滤缺失值。

``` sas
data outer_join_nulls;
    set outer_join;
    if value_x = .;
run;

data outer_join_no_nulls;
    set outer_join;
    if value_x ^= .;
run;
```

这在 pandas 中不起作用。相反,应使用 ``pd.isna`` 或 ``pd.notna`` 函数进行比较。

``` python
In [54]: outer_join[pd.isna(outer_join['value_x'])]
Out[54]: 
  key  value_x   value_y
5   E      NaN -1.044236

In [55]: outer_join[pd.notna(outer_join['value_x'])]
Out[55]: 
  key   value_x   value_y
0   A  0.469112       NaN
1   B -0.282863  1.212112
2   C -1.509059       NaN
3   D -1.135632 -0.173215
4   D -1.135632  0.119209
```

pandas 还提供了多种处理缺失数据的方法,其中一些在 SAS 中表达起来很有挑战性。例如,可以删除含有任何缺失值的所有行,用指定值(如平均值)替换缺失值,或用前一行的值向前填充。更多信息,请参阅[缺失数据文档](https://pandas.pydata.org/pandas-docs/stable/../user_guide/missing_data.html#missing-data)。

``` python
In [56]: outer_join.dropna()
Out[56]: 
  key   value_x   value_y
1   B -0.282863  1.212112
3   D -1.135632 -0.173215
4   D -1.135632  0.119209

In [57]: outer_join.fillna(method='ffill')
Out[57]: 
  key   value_x   value_y
0   A  0.469112       NaN
1   B -0.282863  1.212112
2   C -1.509059  1.212112
3   D -1.135632 -0.173215
4   D -1.135632  0.119209
5   E -1.135632 -1.044236

In [58]: outer_join['value_x'].fillna(outer_join['value_x'].mean())
Out[58]: 
0    0.469112
1   -0.282863
2   -1.509059
3   -1.135632
4   -1.135632
5   -0.718815
Name: value_x, dtype: float64
```

### GroupBy

#### 聚合(Aggregation)

SAS 的 PROC SUMMARY 可用于按一个或多个关键变量分组,并计算数值列上的聚合。

``` sas
proc summary data=tips nway;
    class sex smoker;
    var total_bill tip;
    output out=tips_summed sum=;
run;
```

pandas 提供了灵活的 ``groupby`` 机制来实现类似的聚合。更多详细信息和示例,请参阅 [groupby 文档](https://pandas.pydata.org/pandas-docs/stable/../user_guide/groupby.html#groupby)。

``` python
In [59]: tips_summed = tips.groupby(['sex', 'smoker'])['total_bill', 'tip'].sum()

In [60]: tips_summed.head()
Out[60]: 
               total_bill     tip
sex    smoker                    
Female No          869.68  149.77
       Yes         527.27   96.74
Male   No         1725.75  302.00
       Yes        1217.07  183.07
```

#### 转换(Transformation)

在 SAS 中,如果需要把组内聚合结果与原始数据集一起使用,必须把两者合并在一起。例如,按吸烟者分组,从每次观察中减去组平均值。

``` sas
proc summary data=tips missing nway;
    class smoker;
    var total_bill;
    output out=smoker_means mean(total_bill)=group_bill;
run;

proc sort data=tips;
    by smoker;
run;

data tips;
    merge tips(in=a) smoker_means(in=b);
    by smoker;
    adj_total_bill = total_bill - group_bill;
    if a and b;
run;
```

pandas 的 ``groupby`` 提供了 ``transform`` 机制,允许在一个操作中简洁地表达这类计算。

``` python
In [61]: gb = tips.groupby('smoker')['total_bill']

In [62]: tips['adj_total_bill'] = tips['total_bill'] - gb.transform('mean')

In [63]: tips.head()
Out[63]: 
     total_bill   tip     sex smoker   day    time  size  adj_total_bill
67         1.07  1.00  Female    Yes   Sat  Dinner     1      -17.686344
92         3.75  1.00  Female    Yes   Fri  Dinner     2      -15.006344
111        5.25  1.00  Female     No   Sat  Dinner     1      -11.938278
145        6.35  1.50  Female     No  Thur   Lunch     2      -10.838278
135        6.51  1.25  Female     No  Thur   Lunch     2      -10.678278
```

#### 按组处理

除了聚合之外,pandas 的 ``groupby`` 还可以用来复制 SAS 中大多数其他按组(BY-group)处理。例如,下面的 ``DATA`` 步骤按性别/吸烟者分组读取数据,并过滤出每组的第一个条目。

``` sas
proc sort data=tips;
   by sex smoker;
run;

data tips_first;
    set tips;
    by sex smoker;
    if FIRST.sex or FIRST.smoker then output;
run;
```

在 pandas 中,这将写成:

``` python
In [64]: tips.groupby(['sex', 'smoker']).first()
Out[64]: 
               total_bill   tip   day    time  size  adj_total_bill
sex    smoker                                                      
Female No            5.25  1.00   Sat  Dinner     1      -11.938278
       Yes           1.07  1.00   Sat  Dinner     1      -17.686344
Male   No            5.51  2.00  Thur   Lunch     2      -11.678278
       Yes           5.25  5.15   Sun  Dinner     2      -13.506344

```

### 其他注意事项

#### 磁盘与内存

pandas 仅在内存中运行,而 SAS 数据集存在于磁盘上。这意味着能在 pandas 中加载的数据大小受机器内存的限制,但对数据的操作可能更快。

如果需要核外(out-of-core)处理,一种可能性是
[dask.dataframe](https://dask.pydata.org/en/latest/dataframe.html)
库(目前正在开发中),它提供了 pandas ``DataFrame`` 功能的一个子集,用于处理磁盘上的数据。

#### 数据互操作

pandas 提供了 [``read_sas()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.read_sas.html#pandas.read_sas) 方法,可以读取以 XPORT 或 SAS7BDAT 二进制格式保存的 SAS 数据。

``` sas
libname xportout xport 'transport-file.xpt';
data xportout.tips;
    set tips(rename=(total_bill=tbill));
    * xport variable names limited to 6 characters;
run;
```

``` python
df = pd.read_sas('transport-file.xpt')
df = pd.read_sas('binary-file.sas7bdat')
```

您也可以直接指定文件格式。默认情况下,pandas 将尝试根据文件扩展名推断文件格式。

``` python
df = pd.read_sas('transport-file.xpt', format='xport')
df = pd.read_sas('binary-file.sas7bdat', format='sas7bdat')
```

XPORT 是一种相对有限的格式,它的解析不像其他一些 pandas 读取器那样经过优化。在 SAS 和 pandas 之间交换数据的另一种方法是序列化为 csv。

``` python
# version 0.17, 10M rows

In [8]: %time df = pd.read_sas('big.xpt')
Wall time: 14.6 s

In [9]: %time df = pd.read_csv('big.csv')
Wall time: 4.86 s
```

## 与Stata的比较

对于来自 [Stata](https://en.wikipedia.org/wiki/Stata) 的潜在用户,本节旨在演示如何在 pandas 中完成各种类似 Stata 的操作。

如果您是 pandas 的新手,您可能需要先阅读[十分钟入门Pandas](/docs/getting_started/10min.html) 以熟悉本库。

按照惯例,我们按如下方式导入 pandas 和 NumPy:

``` python
In [1]: import pandas as pd

In [2]: import numpy as np
```

::: tip 注意

在本教程中,将通过调用 ``df.head()`` 来显示 ``DataFrame``,它会显示 ``DataFrame`` 的前 N 行(默认为 5 行)。这通常用于交互式工作(例如 [Jupyter笔记本](https://jupyter.org/)或终端)- Stata 中的等价物是:

```
list in 1/5
```

:::

### 数据结构

#### 一般术语对照表

Pandas | Stata
---|---
DataFrame | 数据集(data set)
column | 变量(variable)
row | 观察(observation)
groupby | bysort
NaN | .
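
上表的对应关系可以用一段示意代码体会(示例数据为假设,仅用于说明):

``` python
import numpy as np
import pandas as pd

# DataFrame ≈ Stata 数据集;列 ≈ 变量(variable);行 ≈ 观察(observation)
df = pd.DataFrame({'g': ['a', 'a', 'b'],       # 分组变量
                   'x': [1.0, np.nan, 3.0]})   # NaN ≈ Stata 的缺失值 .

# groupby ≈ bysort:按 g 分组求均值;与 Stata 类似,缺失值默认被聚合忽略
means = df.groupby('g')['x'].mean()
```

这里 ``means`` 是一个以组标签为索引的 ``Series``,其中 ``'a'`` 组的均值只由非缺失值 1.0 计算得出。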

#### ``DataFrame``/ ``Series``

pandas 中的 ``DataFrame`` 类似于 Stata 数据集——具有不同类型的标记列的二维数据源。如本文档所示,几乎任何可以应用于 Stata 中的数据集的操作也可以在 pandas 中完成。

``Series`` 是表示 ``DataFrame`` 的一列的数据结构。Stata 对于单个列没有单独的数据结构,但是通常,使用 ``Series`` 类似于引用 Stata 数据集的一列。

#### ``Index``

每个 ``DataFrame`` 和 ``Series`` 在数据的*行*上都有一个标签,称为 ``Index``。Stata 中没有类似的概念:数据集的行基本上是无标签的,除了可以用 ``_n`` 访问的隐式整数索引。

在 pandas 中,如果未指定索引,则默认使用整数索引(第一行 = 0,第二行 = 1,依此类推)。虽然使用带标签的 ``Index`` 或
``MultiIndex`` 可以进行复杂的分析,并且最终是理解 pandas 的重要部分,但在这一比较中我们基本上会忽略 ``Index``,只把 ``DataFrame`` 视为列的集合。有关如何有效使用 ``Index`` 的更多信息,
请参阅[索引文档](https://pandas.pydata.org/pandas-docs/stable/../user_guide/indexing.html#indexing)。

### 数据输入/输出

#### 用指定值构建 DataFrame

通过将数据放在 ``input`` 语句之后并指定列名,可以从指定值构建 Stata 数据集。

```
input x y
1 2
3 4
5 6
end
```

pandas 的 ``DataFrame`` 可以用许多不同的方式构建,但对于少量的值,通常可以方便地将其指定为 Python 字典,其中键是列名,值是数据。


``` python
In [3]: df = pd.DataFrame({'x': [1, 3, 5], 'y': [2, 4, 6]})

In [4]: df
Out[4]: 
   x  y
0  1  2
1  3  4
2  5  6
```

#### 读取外部数据

与 Stata 一样,pandas 提供了从多种格式读取数据的实用程序。以下许多示例将使用 pandas 测试数据中的 ``tips`` 数据集([csv](https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv))。

Stata 的 ``import delimited`` 命令把 csv 数据读入内存中的数据集。如果 ``tips.csv`` 在当前工作目录中,可以按如下方式导入。

```
import delimited tips.csv
```

与之类似的 pandas 方法是 [``read_csv()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.read_csv.html#pandas.read_csv)。此外,如果提供了 URL,它会自动下载数据集。

``` python
In [5]: url = ('https://raw.github.com/pandas-dev'
   ...:        '/pandas/master/pandas/tests/data/tips.csv')
   ...: 

In [6]: tips = pd.read_csv(url)

In [7]: tips.head()
Out[7]: 
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
```

与 ``import delimited`` 一样,[``read_csv()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.read_csv.html#pandas.read_csv) 可以使用许多参数来指定数据应该如何解析。例如,如果数据改为制表符分隔、没有列名,且位于当前工作目录中,则 pandas 命令为:

``` python
tips = pd.read_csv('tips.csv', sep='\t', header=None)

# alternatively, read_table is an alias to read_csv with tab delimiter
tips = pd.read_table('tips.csv', header=None)
```

pandas 也支持 ``.dta`` 文件格式,使用 [``read_stata()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.read_stata.html#pandas.read_stata) 函数读取这种格式的 Stata 数据集。

``` python
df = pd.read_stata('data.dta')
```

除了 text/csv 和 Stata 文件之外,pandas 还支持各种其他数据格式,如 Excel、SAS、HDF5、Parquet 和 SQL 数据库。这些都是通过 ``pd.read_*``
函数读取的。有关更多详细信息,请参阅 [IO文档](https://pandas.pydata.org/pandas-docs/stable/../user_guide/io.html#io)。

#### 导出数据

Stata 中 ``import delimited`` 的反向操作是 ``export delimited``。

```
export delimited tips2.csv
```

类似地,在 pandas 中,``read_csv`` 的反向操作是 [``DataFrame.to_csv()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.to_csv.html#pandas.DataFrame.to_csv)。

``` python
tips.to_csv('tips2.csv')
```

pandas 还可以使用 [``DataFrame.to_stata()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.to_stata.html#pandas.DataFrame.to_stata) 方法导出为 Stata 文件格式。

``` python
tips.to_stata('tips2.dta')
```

### 数据操作

#### 列上的操作

在 Stata 中,任意数学表达式可以与新列或现有列上的 ``generate`` 和
``replace`` 命令一起使用。``drop`` 命令从数据集中删除列。

```
replace total_bill = total_bill - 2
generate new_bill = total_bill / 2
drop new_bill
```

pandas 通过对 ``DataFrame`` 中单个 ``Series`` 的操作,提供了类似的矢量化运算;新列也可以用同样的方式赋值。[``DataFrame.drop()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.drop.html#pandas.DataFrame.drop) 方法从 ``DataFrame`` 中删除一列。

``` python
In [8]: tips['total_bill'] = tips['total_bill'] - 2

In [9]: tips['new_bill'] = tips['total_bill'] / 2

In [10]: tips.head()
Out[10]: 
   total_bill   tip     sex smoker  day    time  size  new_bill
0       14.99  1.01  Female     No  Sun  Dinner     2     7.495
1        8.34  1.66 
Male No Sun Dinner 3 4.170 +2 19.01 3.50 Male No Sun Dinner 3 9.505 +3 21.68 3.31 Male No Sun Dinner 2 10.840 +4 22.59 3.61 Female No Sun Dinner 4 11.295 + +In [11]: tips = tips.drop('new_bill', axis=1) +``` + +#### 过滤 + +在Stata中过滤是通过 ``if`` 一个或多个列上的子句完成的。 + +``` +list if total_bill > 10 +``` + +DataFrame可以通过多种方式进行过滤; 最直观的是使用 + [布尔索引](https://pandas.pydata.org/pandas-docs/stable/../user_guide/indexing.html#indexing-boolean)。 + +``` python +In [12]: tips[tips['total_bill'] > 10].head() +Out[12]: + total_bill tip sex smoker day time size +0 14.99 1.01 Female No Sun Dinner 2 +2 19.01 3.50 Male No Sun Dinner 3 +3 21.68 3.31 Male No Sun Dinner 2 +4 22.59 3.61 Female No Sun Dinner 4 +5 23.29 4.71 Male No Sun Dinner 4 +``` + +#### 如果/那么逻辑 + +在Stata中,``if``子句也可用于创建新列。 + +``` +generate bucket = "low" if total_bill < 10 +replace bucket = "high" if total_bill >= 10 +``` + +使用 ``numpy`` 的 ``where`` 方法可以在 pandas 中完成相同的操作。 + +``` python +In [13]: tips['bucket'] = np.where(tips['total_bill'] < 10, 'low', 'high') + +In [14]: tips.head() +Out[14]: + total_bill tip sex smoker day time size bucket +0 14.99 1.01 Female No Sun Dinner 2 high +1 8.34 1.66 Male No Sun Dinner 3 low +2 19.01 3.50 Male No Sun Dinner 3 high +3 21.68 3.31 Male No Sun Dinner 2 high +4 22.59 3.61 Female No Sun Dinner 4 high +``` + +#### 日期功能 + +Stata提供了各种函数来对date / datetime列进行操作。 + +``` +generate date1 = mdy(1, 15, 2013) +generate date2 = date("Feb152015", "MDY") + +generate date1_year = year(date1) +generate date2_month = month(date2) + +* shift date to beginning of next month +generate date1_next = mdy(month(date1) + 1, 1, year(date1)) if month(date1) != 12 +replace date1_next = mdy(1, 1, year(date1) + 1) if month(date1) == 12 +generate months_between = mofd(date2) - mofd(date1) + +list date1 date2 date1_year date2_month date1_next months_between +``` + +等效的 pandas 操作如下所示。除了这些功能外,pandas 还支持 Stata 中不具备的其他时间序列功能(例如时区处理和自定义偏移) - 
有关详细信息,请参阅[时间序列文档](https://pandas.pydata.org/pandas-docs/stable/../user_guide/timeseries.html#timeseries)。 + +``` python +In [15]: tips['date1'] = pd.Timestamp('2013-01-15') + +In [16]: tips['date2'] = pd.Timestamp('2015-02-15') + +In [17]: tips['date1_year'] = tips['date1'].dt.year + +In [18]: tips['date2_month'] = tips['date2'].dt.month + +In [19]: tips['date1_next'] = tips['date1'] + pd.offsets.MonthBegin() + +In [20]: tips['months_between'] = (tips['date2'].dt.to_period('M') + ....: - tips['date1'].dt.to_period('M')) + ....: + +In [21]: tips[['date1', 'date2', 'date1_year', 'date2_month', 'date1_next', + ....: 'months_between']].head() + ....: +Out[21]: + date1 date2 date1_year date2_month date1_next months_between +0 2013-01-15 2015-02-15 2013 2 2013-02-01 <25 * MonthEnds> +1 2013-01-15 2015-02-15 2013 2 2013-02-01 <25 * MonthEnds> +2 2013-01-15 2015-02-15 2013 2 2013-02-01 <25 * MonthEnds> +3 2013-01-15 2015-02-15 2013 2 2013-02-01 <25 * MonthEnds> +4 2013-01-15 2015-02-15 2013 2 2013-02-01 <25 * MonthEnds> +``` + +#### 列的选择 + +Stata 提供了选择,删除和重命名列的关键字。 + +``` +keep sex total_bill tip + +drop sex + +rename total_bill total_bill_2 +``` + +下面的 pandas 表示相同的操作。请注意,与 Stata 相比,这些操作不会发生。要使这些更改保持不变,请将操作分配回变量。 + +``` python +# keep +In [22]: tips[['sex', 'total_bill', 'tip']].head() +Out[22]: + sex total_bill tip +0 Female 14.99 1.01 +1 Male 8.34 1.66 +2 Male 19.01 3.50 +3 Male 21.68 3.31 +4 Female 22.59 3.61 + +# drop +In [23]: tips.drop('sex', axis=1).head() +Out[23]: + total_bill tip smoker day time size +0 14.99 1.01 No Sun Dinner 2 +1 8.34 1.66 No Sun Dinner 3 +2 19.01 3.50 No Sun Dinner 3 +3 21.68 3.31 No Sun Dinner 2 +4 22.59 3.61 No Sun Dinner 4 + +# rename +In [24]: tips.rename(columns={'total_bill': 'total_bill_2'}).head() +Out[24]: + total_bill_2 tip sex smoker day time size +0 14.99 1.01 Female No Sun Dinner 2 +1 8.34 1.66 Male No Sun Dinner 3 +2 19.01 3.50 Male No Sun Dinner 3 +3 21.68 3.31 Male No Sun Dinner 2 +4 22.59 3.61 Female No Sun Dinner 4 +``` + 
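
上面"列的选择"一节提到,这些操作默认不会修改原对象;下面的小示例(数据为假设)演示如何把结果赋回变量,使更改持久生效:

``` python
import pandas as pd

tips = pd.DataFrame({'total_bill': [16.99, 10.34],
                     'tip': [1.01, 1.66],
                     'sex': ['Female', 'Male']})

# 相当于 Stata 的 keep:选择列后把结果赋回变量,更改才会保留
tips = tips[['total_bill', 'tip']]

# 相当于 Stata 的 rename:重命名同样需要赋回
tips = tips.rename(columns={'total_bill': 'total_bill_2'})
```

若不赋回(只写 ``tips.rename(...)``),原 ``tips`` 保持不变;部分方法也支持 ``inplace=True`` 参数直接就地修改。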
+#### 按值排序 + +Stata中的排序是通过 ``sort`` + +``` +sort sex total_bill +``` + +pandas 对象有一个[``DataFrame.sort_values()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.sort_values.html#pandas.DataFrame.sort_values)方法,它采用列表进行排序。 + +``` python +In [25]: tips = tips.sort_values(['sex', 'total_bill']) + +In [26]: tips.head() +Out[26]: + total_bill tip sex smoker day time size +67 1.07 1.00 Female Yes Sat Dinner 1 +92 3.75 1.00 Female Yes Fri Dinner 2 +111 5.25 1.00 Female No Sat Dinner 1 +145 6.35 1.50 Female No Thur Lunch 2 +135 6.51 1.25 Female No Thur Lunch 2 +``` + +### 字符串处理 + +#### 查找字符串的长度 + +Stata 分别使用ASCII和Unicode字符串 ``strlen()`` 和 ``ustrlen()`` 函数确定字符串的长度。 + +``` +generate strlen_time = strlen(time) +generate ustrlen_time = ustrlen(time) +``` + +Python 使用该 ``len`` 函数确定字符串的长度。在Python 3中,所有字符串都是Unicode字符串。``len``包括尾随空白。使用``len``和``rstrip``排除尾随空格。 + +``` python +In [27]: tips['time'].str.len().head() +Out[27]: +67 6 +92 6 +111 6 +145 5 +135 5 +Name: time, dtype: int64 + +In [28]: tips['time'].str.rstrip().str.len().head() +Out[28]: +67 6 +92 6 +111 6 +145 5 +135 5 +Name: time, dtype: int64 +``` + +#### 找到字符串的位置 + +Stata使用该``strpos()``函数确定字符串中字符的位置。这将获取第一个参数定义的字符串,并搜索您提供的子字符串的第一个位置作为第二个参数。 + +``` +generate str_position = strpos(sex, "ale") +``` + +Python使用``find()``函数确定字符串中字符的位置。``find``搜索子字符串的第一个位置。如果找到子字符串,则该函数返回其位置。请记住,Python索引是从零开始的,如果找不到子串,函数将返回-1。 + +``` python +In [29]: tips['sex'].str.find("ale").head() +Out[29]: +67 3 +92 3 +111 3 +145 3 +135 3 +Name: sex, dtype: int64 +``` + +#### 按位置提取字符串 + +Stata根据``substr()``函数的位置从字符串中提取字符串。 + +``` +generate short_sex = substr(sex, 1, 1) +``` + +使用pandas,您可以使用``[]``符号从位置位置提取字符串中的子字符串。请记住,Python索引是从零开始的。 + +``` python +In [30]: tips['sex'].str[0:1].head() +Out[30]: +67 F +92 F +111 F +145 F +135 F +Name: sex, dtype: object +``` + +#### 提取第n个字符 + +Stata ``word()``函数返回字符串中的第n个单词。第一个参数是要解析的字符串,第二个参数指定要提取的字。 + +``` +clear +input str20 string +"John Smith" +"Jane Cook" +end + +generate first_name = 
word(string, 1)
generate last_name = word(string, -1)
```

Python 可以用正则表达式按文本内容从字符串中提取子字符串。下面用字符串拆分演示一种简单的做法,实际中还有更强大的方法。

``` python
In [31]: firstlast = pd.DataFrame({'string': ['John Smith', 'Jane Cook']})

In [32]: firstlast['First_Name'] = firstlast['string'].str.split(" ", expand=True)[0]

In [33]: firstlast['Last_Name'] = firstlast['string'].str.rsplit(" ", expand=True)[1]

In [34]: firstlast
Out[34]: 
       string First_Name Last_Name
0  John Smith       John     Smith
1   Jane Cook       Jane      Cook
```

#### 改变大小写

Stata 的 ``strupper()``、``strlower()``、``strproper()`` 以及
``ustrupper()``、``ustrlower()``、``ustrtitle()`` 函数分别用于改变 ASCII 和 Unicode 字符串的大小写。

```
clear
input str20 string
"John Smith"
"Jane Cook"
end

generate upper = strupper(string)
generate lower = strlower(string)
generate title = strproper(string)
list
```

Python 中等效的函数是 ``upper``、``lower`` 和 ``title``。

``` python
In [35]: firstlast = pd.DataFrame({'string': ['John Smith', 'Jane Cook']})

In [36]: firstlast['upper'] = firstlast['string'].str.upper()

In [37]: firstlast['lower'] = firstlast['string'].str.lower()

In [38]: firstlast['title'] = firstlast['string'].str.title()

In [39]: firstlast
Out[39]: 
       string       upper       lower       title
0  John Smith  JOHN SMITH  john smith  John Smith
1   Jane Cook   JANE COOK   jane cook   Jane Cook
```

### 合并

合并示例中将使用以下表格:

``` python
In [40]: df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'],
   ....:                     'value': np.random.randn(4)})
   ....: 

In [41]: df1
Out[41]: 
  key     value
0   A  0.469112
1   B -0.282863
2   C -1.509059
3   D -1.135632

In [42]: df2 = pd.DataFrame({'key': ['B', 'D', 'D', 'E'],
   ....:                     'value': np.random.randn(4)})
   ....: 

In [43]: df2
Out[43]: 
  key     value
0   B  1.212112
1   D -0.173215
2   D  0.119209
3   E -1.044236
```

在 Stata 中,要执行合并,一个数据集必须在内存中,另一个必须作为磁盘上的文件名引用。相比之下,Python 要求两个 ``DataFrame`` 都已经在内存中。

默认情况下,Stata 执行外连接,合并后两个数据集的所有观察都保留在内存中。通过使用 ``_merge`` 变量中创建的值,可以只保留来自初始数据集、合并数据集或两者交集的观察。

```
* First create df2 and save to disk
clear
input 
str1 key +B +D +D +E +end +generate value = rnormal() +save df2.dta + +* Now create df1 in memory +clear +input str1 key +A +B +C +D +end +generate value = rnormal() + +preserve + +* Left join +merge 1:n key using df2.dta +keep if _merge == 1 + +* Right join +restore, preserve +merge 1:n key using df2.dta +keep if _merge == 2 + +* Inner join +restore, preserve +merge 1:n key using df2.dta +keep if _merge == 3 + +* Outer join +restore +merge 1:n key using df2.dta +``` + +pandas 的 DataFrames 有一个[``DataFrame.merge()``](https://pandas.pydata.org/pandas-docs/stable/../reference/api/pandas.DataFrame.merge.html#pandas.DataFrame.merge)提供类似功能的方法。请注意,通过``how``关键字可以实现不同的连接类型。 + +``` python +In [44]: inner_join = df1.merge(df2, on=['key'], how='inner') + +In [45]: inner_join +Out[45]: + key value_x value_y +0 B -0.282863 1.212112 +1 D -1.135632 -0.173215 +2 D -1.135632 0.119209 + +In [46]: left_join = df1.merge(df2, on=['key'], how='left') + +In [47]: left_join +Out[47]: + key value_x value_y +0 A 0.469112 NaN +1 B -0.282863 1.212112 +2 C -1.509059 NaN +3 D -1.135632 -0.173215 +4 D -1.135632 0.119209 + +In [48]: right_join = df1.merge(df2, on=['key'], how='right') + +In [49]: right_join +Out[49]: + key value_x value_y +0 B -0.282863 1.212112 +1 D -1.135632 -0.173215 +2 D -1.135632 0.119209 +3 E NaN -1.044236 + +In [50]: outer_join = df1.merge(df2, on=['key'], how='outer') + +In [51]: outer_join +Out[51]: + key value_x value_y +0 A 0.469112 NaN +1 B -0.282863 1.212112 +2 C -1.509059 NaN +3 D -1.135632 -0.173215 +4 D -1.135632 0.119209 +5 E NaN -1.044236 +``` + +### 缺少数据 + +像Stata一样,pandas 有缺失数据的表示 - 特殊浮点值``NaN``(不是数字)。许多语义都是一样的; 例如,丢失的数据通过数字操作传播,默认情况下会被聚合忽略。 + +``` python +In [52]: outer_join +Out[52]: + key value_x value_y +0 A 0.469112 NaN +1 B -0.282863 1.212112 +2 C -1.509059 NaN +3 D -1.135632 -0.173215 +4 D -1.135632 0.119209 +5 E NaN -1.044236 + +In [53]: outer_join['value_x'] + outer_join['value_y'] +Out[53]: +0 NaN +1 0.929249 +2 NaN +3 -1.308847 +4 -1.016424 +5 NaN 
+dtype: float64 + +In [54]: outer_join['value_x'].sum() +Out[54]: -3.5940742896293765 +``` + +一个区别是丢失的数据无法与其哨兵值进行比较。例如,在 Stata 中,您可以执行此操作以过滤缺失值。 + +``` +* Keep missing values +list if value_x == . +* Keep non-missing values +list if value_x != . +``` + +这在 pandas 中不起作用。相反,应使用``pd.isna()``或``pd.notna()``函数进行比较。 + +``` python +In [55]: outer_join[pd.isna(outer_join['value_x'])] +Out[55]: + key value_x value_y +5 E NaN -1.044236 + +In [56]: outer_join[pd.notna(outer_join['value_x'])] +Out[56]: + key value_x value_y +0 A 0.469112 NaN +1 B -0.282863 1.212112 +2 C -1.509059 NaN +3 D -1.135632 -0.173215 +4 D -1.135632 0.119209 +``` + +pandas 还提供了多种处理丢失数据的方法,其中一些方法在Stata中表达起来很有挑战性。例如,有一些方法可以删除具有任何缺失值的所有行,用指定值(如平均值)替换缺失值,或从前一行向前填充。有关详细信息,请参阅[缺失数据文档](https://pandas.pydata.org/pandas-docs/stable/../user_guide/missing_data.html#missing-data)。 + +``` python +# Drop rows with any missing value +In [57]: outer_join.dropna() +Out[57]: + key value_x value_y +1 B -0.282863 1.212112 +3 D -1.135632 -0.173215 +4 D -1.135632 0.119209 + +# Fill forwards +In [58]: outer_join.fillna(method='ffill') +Out[58]: + key value_x value_y +0 A 0.469112 NaN +1 B -0.282863 1.212112 +2 C -1.509059 1.212112 +3 D -1.135632 -0.173215 +4 D -1.135632 0.119209 +5 E -1.135632 -1.044236 + +# Impute missing values with the mean +In [59]: outer_join['value_x'].fillna(outer_join['value_x'].mean()) +Out[59]: +0 0.469112 +1 -0.282863 +2 -1.509059 +3 -1.135632 +4 -1.135632 +5 -0.718815 +Name: value_x, dtype: float64 +``` + +### 的GroupBy + +#### 聚合 + +Stata ``collapse``可用于按一个或多个关键变量进行分组,并计算数字列上的聚合。 + +``` +collapse (sum) total_bill tip, by(sex smoker) +``` + +pandas提供了一种``groupby``允许类似聚合的灵活机制。有关 +更多详细信息和示例,请参阅[groupby文档](https://pandas.pydata.org/pandas-docs/stable/../user_guide/groupby.html#groupby)。 + +``` python +In [60]: tips_summed = tips.groupby(['sex', 'smoker'])['total_bill', 'tip'].sum() + +In [61]: tips_summed.head() +Out[61]: + total_bill tip +sex smoker +Female No 869.68 149.77 + Yes 527.27 96.74 +Male 
No 1725.75 302.00 + Yes 1217.07 183.07 +``` + +#### 转换 + +在Stata中,如果组聚合需要与原始数据集一起使用``bysort``,通常会使用``egen()``。例如,减去吸烟者组每次观察的平均值。 + +``` +bysort sex smoker: egen group_bill = mean(total_bill) +generate adj_total_bill = total_bill - group_bill +``` + +pandas ``groupby``提供了一种``transform``机制,允许在一个操作中简洁地表达这些类型的操作。 + +``` python +In [62]: gb = tips.groupby('smoker')['total_bill'] + +In [63]: tips['adj_total_bill'] = tips['total_bill'] - gb.transform('mean') + +In [64]: tips.head() +Out[64]: + total_bill tip sex smoker day time size adj_total_bill +67 1.07 1.00 Female Yes Sat Dinner 1 -17.686344 +92 3.75 1.00 Female Yes Fri Dinner 2 -15.006344 +111 5.25 1.00 Female No Sat Dinner 1 -11.938278 +145 6.35 1.50 Female No Thur Lunch 2 -10.838278 +135 6.51 1.25 Female No Thur Lunch 2 -10.678278 +``` + +#### 按组处理 + +除聚合外,pandas ``groupby``还可用于复制``bysort``Stata中的大多数其他处理。例如,以下示例按性别/吸烟者组列出当前排序顺序中的第一个观察结果。 + +``` +bysort sex smoker: list if _n == 1 +``` + +在 pandas 中,这将写成: + +``` python +In [65]: tips.groupby(['sex', 'smoker']).first() +Out[65]: + total_bill tip day time size adj_total_bill +sex smoker +Female No 5.25 1.00 Sat Dinner 1 -11.938278 + Yes 1.07 1.00 Sat Dinner 1 -17.686344 +Male No 5.51 2.00 Thur Lunch 2 -11.678278 + Yes 5.25 5.15 Sun Dinner 2 -13.506344 +``` + +### 其他注意事项 + +#### 磁盘与内存 + +pandas 和 Stata 都只在内存中运行。这意味着能够在 pandas 中加载的数据大小受机器内存的限制。如果需要进行核心处理,则有一种可能性是[dask.dataframe](http://dask.pydata.org/en/latest/dataframe.html) 库,它为磁盘上的pandas功能提供了一个子集``DataFrame``。 diff --git a/Python/pandas/getting_started/dsintro.md b/Python/pandas/getting_started/dsintro.md new file mode 100644 index 00000000..efa7031e --- /dev/null +++ b/Python/pandas/getting_started/dsintro.md @@ -0,0 +1,1286 @@ +# 数据结构简介 + +本节介绍 Pandas 基础数据结构,包括各类对象的数据类型、索引、轴标记、对齐等基础操作。首先,导入 NumPy 和 Pandas: + +```python +In [1]: import numpy as np + +In [2]: import pandas as pd +``` + +“**数据对齐是内在的**”,这一原则是根本。除非显式指定,Pandas 不会断开标签和数据之间的连接。 + +下文先简单介绍数据结构,然后再分门别类介绍每种功能与方法。 + +## Series + 
+[`Series`](https://Pandas.pydata.org/Pandas-docs/stable/reference/api/Pandas.Series.html#Pandas.Series) 是带标签的一维数组,可存储整数、浮点数、字符串、Python 对象等类型的数据。轴标签统称为**索引**。调用 `pd.Series` 函数即可创建 Series: + +```python +>>> s = pd.Series(data, index=index) +``` + +上述代码中,`data` 支持以下数据类型: + +* Python 字典 +* 多维数组 +* 标量值(如,5) + +`index` 是轴标签列表。不同**数据**可分为以下几种情况: + +**多维数组** + +`data` 是多维数组时,**index** 长度必须与 **data** 长度一致。没有指定 `index` 参数时,创建数值型索引,即 `[0, ..., len(data) - 1]`。 + +```python +In [3]: s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e']) + +In [4]: s +Out[4]: +a 0.469112 +b -0.282863 +c -1.509059 +d -1.135632 +e 1.212112 +dtype: float64 + +In [5]: s.index +Out[5]: Index(['a', 'b', 'c', 'd', 'e'], dtype='object') + +In [6]: pd.Series(np.random.randn(5)) +Out[6]: +0 -0.173215 +1 0.119209 +2 -1.044236 +3 -0.861849 +4 -2.104569 +dtype: float64 +``` + +::: tip 注意 + +Pandas 的索引值可以重复。不支持重复索引值的操作会触发异常。其原因主要与性能有关,有很多计算实例,比如 GroupBy 操作就不用索引。 + +::: + +**字典** + +Series 可以用字典实例化: + +```python +In [7]: d = {'b': 1, 'a': 0, 'c': 2} + +In [8]: pd.Series(d) +Out[8]: +b 1 +a 0 +c 2 +dtype: int64 +``` + +::: tip 注意 + +`data` 为字典,且未设置 `index` 参数时,如果 Python 版本 >= 3.6 且 Pandas 版本 >= 0.23,`Series` 按字典的插入顺序排序索引。 + +Python < 3.6 或 Pandas < 0.23,且未设置 `index` 参数时,`Series` 按字母顺序排序字典的键(key)列表。 + +::: + +上例中,如果 Python < 3.6 或 Pandas < 0.23,`Series` 按字母排序字典的键。输出结果不是 ` ['b', 'a', 'c']`,而是 `['a', 'b', 'c']`。 + +如果设置了 `index` 参数,则按索引标签提取 `data` 里对应的值。 + +```python +In [9]: d = {'a': 0., 'b': 1., 'c': 2.} + +In [10]: pd.Series(d) +Out[10]: +a 0.0 +b 1.0 +c 2.0 +dtype: float64 + +In [11]: pd.Series(d, index=['b', 'c', 'd', 'a']) +Out[11]: +b 1.0 +c 2.0 +d NaN +a 0.0 +dtype: float64 +``` + +::: tip 注意 + +Pandas 用 `NaN`(Not a Number)表示**缺失数据**。 + +::: + +**标量值** + +`data` 是标量值时,必须提供索引。`Series` 按**索引**长度重复该标量值。 + +```python +In [12]: pd.Series(5., index=['a', 'b', 'c', 'd', 'e']) +Out[12]: +a 5.0 +b 5.0 +c 5.0 +d 5.0 +e 5.0 +dtype: float64 +``` + +### Series 类似多维数组 + +`Series` 操作与 `ndarray` 类似,支持大多数 
NumPy 函数,还支持索引切片。 + +```python +In [13]: s[0] +Out[13]: 0.4691122999071863 + +In [14]: s[:3] +Out[14]: +a 0.469112 +b -0.282863 +c -1.509059 +dtype: float64 + +In [15]: s[s > s.median()] +Out[15]: +a 0.469112 +e 1.212112 +dtype: float64 + +In [16]: s[[4, 3, 1]] +Out[16]: +e 1.212112 +d -1.135632 +b -0.282863 +dtype: float64 + +In [17]: np.exp(s) +Out[17]: +a 1.598575 +b 0.753623 +c 0.221118 +d 0.321219 +e 3.360575 +dtype: float64 +``` + +::: tip 注意 + +[索引与选择数据](https://Pandas.pydata.org/Pandas-docs/stable/user_guide/indexing.html#indexing)一节介绍了 `s[[4, 3, 1]]` 等数组索引操作。 + +::: + +和 NumPy 数组一样,Series 也支持 [`dtype`](https://Pandas.pydata.org/Pandas-docs/stable/reference/api/Pandas.Series.dtype.html#Pandas.Series.dtype)。 + +```python +In [18]: s.dtype +Out[18]: dtype('float64') +``` + +`Series` 的数据类型一般是 NumPy 数据类型。不过,Pandas 和第三方库在一些方面扩展了 NumPy 类型系统,即[`扩展数据类型`](https://Pandas.pydata.org/Pandas-docs/stable/reference/api/Pandas.api.extensions.ExtensionDtype.html#Pandas.api.extensions.ExtensionDtype)。比如,Pandas 的[类别型数据](https://Pandas.pydata.org/Pandas-docs/stable/user_guide/categorical.html#categorical)与[可空整数数据类型](https://Pandas.pydata.org/Pandas-docs/stable/user_guide/integer_na.html#integer-na)。更多信息,请参阅[数据类型](basics.html#basics-dtypes) 。 + +[`Series.array`](https://Pandas.pydata.org/Pandas-docs/stable/reference/api/Pandas.Series.array.html#Pandas.Series.array) 用于提取 `Series` 数组。 + +```python +In [19]: s.array +Out[19]: + +[ 0.4691122999071863, -0.2828633443286633, -1.5090585031735124, + -1.1356323710171934, 1.2121120250208506] +Length: 5, dtype: float64 +``` + +执行不用索引的操作时,如禁用[自动对齐](#dsintro-alignment),访问数组非常有用。 + +[`Series.array`](https://Pandas.pydata.org/Pandas-docs/stable/reference/api/Pandas.Series.array.html#Pandas.Series.array) 一般是[`扩展数组`](https://Pandas.pydata.org/Pandas-docs/stable/reference/api/Pandas.api.extensions.ExtensionArray.html#Pandas.api.extensions.ExtensionArray)。简单说,扩展数组是把 N 个 
[`numpy.ndarray`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray) 包在一起的打包器。Pandas 知道怎么把`扩展数组`存储到 `Series` 或 `DataFrame` 的列里。更多信息,请参阅[数据类型](basics.html#basics-dtypes)。 + +Series 只是类似于多维数组,提取**真正**的多维数组,要用 + [`Series.to_numpy()`](https://Pandas.pydata.org/Pandas-docs/stable/reference/api/Pandas.Series.to_numpy.html#Pandas.Series.to_numpy)。 + +```python +In [20]: s.to_numpy() +Out[20]: array([ 0.4691, -0.2829, -1.5091, -1.1356, 1.2121]) +``` + +Series 是[`扩展数组`](https://Pandas.pydata.org/Pandas-docs/stable/reference/api/Pandas.api.extensions.ExtensionArray.html#Pandas.api.extensions.ExtensionArray) ,[`Series.to_numpy()`](https://Pandas.pydata.org/Pandas-docs/stable/reference/api/Pandas.Series.to_numpy.html#Pandas.Series.to_numpy) 返回的是 NumPy 多维数组。 + +### Series 类似字典 + +Series 类似固定大小的字典,可以用索引标签提取值或设置值: + +```python +In [21]: s['a'] +Out[21]: 0.4691122999071863 + +In [22]: s['e'] = 12. + +In [23]: s +Out[23]: +a 0.469112 +b -0.282863 +c -1.509059 +d -1.135632 +e 12.000000 +dtype: float64 + +In [24]: 'e' in s +Out[24]: True + +In [25]: 'f' in s +Out[25]: False +``` + +引用 `Series` 里没有的标签会触发异常: + +```python +>>> s['f'] +KeyError: 'f' +``` + +`get` 方法可以提取 `Series` 里没有的标签,返回 `None` 或指定默认值: + +```python +In [26]: s.get('f') + +In [27]: s.get('f', np.nan) +Out[27]: nan +``` + +更多信息,请参阅[属性访问](https://Pandas.pydata.org/Pandas-docs/stable/user_guide/indexing.html#indexing-attribute-access)。 + +### 矢量操作与对齐 Series 标签 + +Series 和 NumPy 数组一样,都不用循环每个值,而且 Series 支持大多数 NumPy 多维数组的方法。 + +```python +In [28]: s + s +Out[28]: +a 0.938225 +b -0.565727 +c -3.018117 +d -2.271265 +e 24.000000 +dtype: float64 + +In [29]: s * 2 +Out[29]: +a 0.938225 +b -0.565727 +c -3.018117 +d -2.271265 +e 24.000000 +dtype: float64 + +In [30]: np.exp(s) +Out[30]: +a 1.598575 +b 0.753623 +c 0.221118 +d 0.321219 +e 162754.791419 +dtype: float64 +``` + +Series 和多维数组的主要区别在于, Series 之间的操作会自动基于标签对齐数据。因此,不用顾及执行计算操作的 Series 是否有相同的标签。 + +```python +In [31]: s[1:] + s[:-1] +Out[31]: +a 
NaN +b -0.565727 +c -3.018117 +d -2.271265 +e NaN +dtype: float64 +``` + +操作未对齐索引的 Series, 其计算结果是所有涉及索引的**并集**。如果在 Series 里找不到标签,运算结果标记为 `NaN`,即缺失值。编写无需显式对齐数据的代码,给交互数据分析和研究提供了巨大的自由度和灵活性。Pandas 数据结构集成的数据对齐功能,是 Pandas 区别于大多数标签型数据处理工具的重要特性。 + +::: tip 注意 + +总之,让不同索引对象操作的默认结果生成索引**并集**,是为了避免信息丢失。就算缺失了数据,索引标签依然包含计算的重要信息。当然,也可以用**`dropna`** 函数清除含有缺失值的标签。 + +::: + +### 名称属性 + +Series 支持 `name` 属性: + +```python +In [32]: s = pd.Series(np.random.randn(5), name='something') + +In [33]: s +Out[33]: +0 -0.494929 +1 1.071804 +2 0.721555 +3 -0.706771 +4 -1.039575 +Name: something, dtype: float64 + +In [34]: s.name +Out[34]: 'something' +``` + +一般情况下,Series 自动分配 `name`,特别是提取一维 DataFrame 切片时,详见下文。 + +*0.18.0 版新增。* + +[`pandas.Series.rename()`](https://Pandas.pydata.org/Pandas-docs/stable/reference/api/Pandas.Series.rename.html#Pandas.Series.rename) 方法用于重命名 Series 。 + +```python +In [35]: s2 = s.rename("different") + +In [36]: s2.name +Out[36]: 'different' +``` + +注意,`s` 与 `s2` 指向不同的对象。 + +## DataFrame + +**DataFrame** 是由多种类型的列构成的二维标签数据结构,类似于 Excel 、SQL 表,或 Series 对象构成的字典。DataFrame 是最常用的 Pandas 对象,与 Series 一样,DataFrame 支持多种类型的输入数据: + +- 一维 ndarray、列表、字典、Series 字典 +- 二维 numpy.ndarray +- [结构多维数组或记录多维数组](https://docs.scipy.org/doc/numpy/user/basics.rec.html) +- `Series` +- `DataFrame` + +除了数据,还可以有选择地传递 **index**(行标签)和 **columns**(列标签)参数。传递了索引或列,就可以确保生成的 DataFrame 里包含索引或列。Series 字典加上指定索引时,会丢弃与传递的索引不匹配的所有数据。 + +没有传递轴标签时,按常规依据输入数据进行构建。 + +::: tip 注意 + +Python > = 3.6,且 Pandas > = 0.23,数据是字典,且未指定 `columns` 参数时,`DataFrame` 的列按字典的插入顺序排序。 + +Python < 3.6 或 Pandas < 0.23,且未指定 `columns` 参数时,`DataFrame` 的列按字典键的字母排序。 + +::: + +### 用 Series 字典或字典生成 DataFrame + +生成的**索引**是每个 **Series** 索引的并集。先把嵌套字典转换为 Series。如果没有指定列,DataFrame 的列就是字典键的有序列表。 + +```python +In [37]: d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']), + ....: 'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])} + ....: + +In [38]: df = pd.DataFrame(d) + +In [39]: df +Out[39]: + one two +a 1.0 1.0 +b 2.0 2.0 +c 3.0 
3.0 +d NaN 4.0 + +In [40]: pd.DataFrame(d, index=['d', 'b', 'a']) +Out[40]: + one two +d NaN 4.0 +b 2.0 2.0 +a 1.0 1.0 + +In [41]: pd.DataFrame(d, index=['d', 'b', 'a'], columns=['two', 'three']) +Out[41]: + two three +d 4.0 NaN +b 2.0 NaN +a 1.0 NaN +``` + +**index** 和 **columns** 属性分别用于访问行、列标签: + +::: tip 注意 + +指定列与数据字典一起传递时,传递的列会覆盖字典的键。 + +::: + +```python +In [42]: df.index +Out[42]: Index(['a', 'b', 'c', 'd'], dtype='object') + +In [43]: df.columns +Out[43]: Index(['one', 'two'], dtype='object') +``` + +### 用多维数组字典、列表字典生成 DataFrame + +多维数组的长度必须相同。如果传递了索引参数,`index` 的长度必须与数组一致。如果没有传递索引参数,生成的结果是 `range(n)`,`n` 为数组长度。 + +```python +In [44]: d = {'one': [1., 2., 3., 4.], + ....: 'two': [4., 3., 2., 1.]} + ....: + +In [45]: pd.DataFrame(d) +Out[45]: + one two +0 1.0 4.0 +1 2.0 3.0 +2 3.0 2.0 +3 4.0 1.0 + +In [46]: pd.DataFrame(d, index=['a', 'b', 'c', 'd']) +Out[46]: + one two +a 1.0 4.0 +b 2.0 3.0 +c 3.0 2.0 +d 4.0 1.0 +``` + +### 用结构多维数组或记录多维数组生成 DataFrame + +本例与数组字典的操作方式相同。 + +```python +In [47]: data = np.zeros((2, ), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'a10')]) + +In [48]: data[:] = [(1, 2., 'Hello'), (2, 3., "World")] + +In [49]: pd.DataFrame(data) +Out[49]: + A B C +0 1 2.0 b'Hello' +1 2 3.0 b'World' + +In [50]: pd.DataFrame(data, index=['first', 'second']) +Out[50]: + A B C +first 1 2.0 b'Hello' +second 2 3.0 b'World' + +In [51]: pd.DataFrame(data, columns=['C', 'A', 'B']) +Out[51]: + C A B +0 b'Hello' 1 2.0 +1 b'World' 2 3.0 +``` + +::: tip 注意 + +DataFrame 的运作方式与 NumPy 二维数组不同。 + +::: + +### 用列表字典生成 DataFrame + +```python +In [52]: data2 = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}] + +In [53]: pd.DataFrame(data2) +Out[53]: + a b c +0 1 2 NaN +1 5 10 20.0 + +In [54]: pd.DataFrame(data2, index=['first', 'second']) +Out[54]: + a b c +first 1 2 NaN +second 5 10 20.0 + +In [55]: pd.DataFrame(data2, columns=['a', 'b']) +Out[55]: + a b +0 1 2 +1 5 10 +``` + +### 用元组字典生成 DataFrame + +元组字典可以自动创建多层索引 DataFrame。 + +```python +In [56]: pd.DataFrame({('a', 'b'): 
{('A', 'B'): 1, ('A', 'C'): 2},
+ ....: ('a', 'a'): {('A', 'C'): 3, ('A', 'B'): 4},
+ ....: ('a', 'c'): {('A', 'B'): 5, ('A', 'C'): 6},
+ ....: ('b', 'a'): {('A', 'C'): 7, ('A', 'B'): 8},
+ ....: ('b', 'b'): {('A', 'D'): 9, ('A', 'B'): 10}})
+ ....:
+Out[56]:
+ a b
+ b a c a b
+A B 1.0 4.0 5.0 8.0 10.0
+ C 2.0 3.0 6.0 7.0 NaN
+ D NaN NaN NaN NaN 9.0
+```
+
+### 用 Series 创建 DataFrame
+
+生成的 DataFrame 继承了输入的 Series 的索引,如果没有指定列名,默认列名是输入 Series 的名称。
+
+**缺失数据**
+
+更多内容,详见[缺失数据](https://Pandas.pydata.org/Pandas-docs/stable/user_guide/missing_data.html#missing-data)。DataFrame 里的缺失值用 `np.nan` 表示。DataFrame 构建器以 `numpy.MaskedArray` 为参数时,被屏蔽的条目为缺失数据。
+
+### 备选构建器
+
+**DataFrame.from_dict**
+
+`DataFrame.from_dict` 接收字典组成的字典或数组序列字典,并生成 DataFrame。除了 `orient` 参数默认为 `columns`,本构建器的操作与 `DataFrame` 构建器类似。把 `orient` 参数设置为 `'index'`,即可把字典的键作为行标签。
+
+```python
+In [57]: pd.DataFrame.from_dict(dict([('A', [1, 2, 3]), ('B', [4, 5, 6])]))
+Out[57]:
+ A B
+0 1 4
+1 2 5
+2 3 6
+```
+
+`orient='index'` 时,键是行标签。本例还传递了列名:
+
+```python
+In [58]: pd.DataFrame.from_dict(dict([('A', [1, 2, 3]), ('B', [4, 5, 6])]),
+ ....: orient='index', columns=['one', 'two', 'three'])
+ ....:
+Out[58]:
+ one two three
+A 1 2 3
+B 4 5 6
+```
+
+**DataFrame.from_records**
+
+`DataFrame.from_records` 构建器支持元组列表或结构数据类型(`dtype`)的多维数组。本构建器与 `DataFrame` 构建器类似,只不过生成的 DataFrame 索引是结构数据类型指定的字段。例如:
+
+```python
+In [59]: data
+Out[59]:
+array([(1, 2., b'Hello'), (2, 3., b'World')],
+ dtype=[('A', '<i4'), ('B', '<f4'), ('C', 'S10')])
+
+In [60]: pd.DataFrame.from_records(data, index='C')
+Out[60]:
+ A B
+C
+b'Hello' 1 2.0
+b'World' 2 3.0
+```
+
+### 列的选择、添加、删除
+
+DataFrame 就像带索引的 Series 字典,提取、设置、删除列的操作与字典类似:
+
+```python
+In [61]: df['one']
+Out[61]:
+a 1.0
+b 2.0
+c 3.0
+d NaN
+Name: one, dtype: float64
+
+In [62]: df['three'] = df['one'] * df['two']
+
+In [63]: df['flag'] = df['one'] > 2
+
+In [64]: df
+Out[64]:
+ one two three flag
+a 1.0 1.0 1.0 False
+b 2.0 2.0 4.0 False
+c 3.0 3.0 9.0 True
+d NaN 4.0 NaN False
+```
+
+删除(del、pop)列的方式也与字典类似:
+
+```python
+In [65]: del df['two']
+
+In [66]: three = df.pop('three')
+
+In [67]: df
+Out[67]:
+ one flag
+a 1.0 False
+b 2.0 False
+c 3.0 True
+d NaN False
+```
+
+标量值以广播的方式填充列:
+
+```python
+In [68]: df['foo'] = 'bar'
+
+In [69]: df
+Out[69]:
+ one flag foo
+a 1.0 False bar
+b 2.0 False bar
+c 3.0 True bar
+d NaN False bar
+```
+
+插入与 DataFrame 索引不同的 Series 时,以 
DataFrame 的索引为准: + +```python +In [70]: df['one_trunc'] = df['one'][:2] + +In [71]: df +Out[71]: + one flag foo one_trunc +a 1.0 False bar 1.0 +b 2.0 False bar 2.0 +c 3.0 True bar NaN +d NaN False bar NaN +``` + +可以插入原生多维数组,但长度必须与 DataFrame 索引长度一致。 + +默认在 DataFrame 尾部插入列。`insert` 函数可以指定插入列的位置: + +```python +In [72]: df.insert(1, 'bar', df['one']) + +In [73]: df +Out[73]: + one bar flag foo one_trunc +a 1.0 1.0 False bar 1.0 +b 2.0 2.0 False bar 2.0 +c 3.0 3.0 True bar NaN +d NaN NaN False bar NaN +``` + +### 用方法链分配新列 + +受 [dplyr](https://dplyr.tidyverse.org/reference/mutate.html) 的 `mutate` 启发,DataFrame 提供了 [`assign()`](https://Pandas.pydata.org/Pandas-docs/stable/reference/api/Pandas.DataFrame.assign.html#Pandas.DataFrame.assign) 方法,可以利用现有的列创建新列。 + +```python +In [74]: iris = pd.read_csv('data/iris.data') + +In [75]: iris.head() +Out[75]: + SepalLength SepalWidth PetalLength PetalWidth Name +0 5.1 3.5 1.4 0.2 Iris-setosa +1 4.9 3.0 1.4 0.2 Iris-setosa +2 4.7 3.2 1.3 0.2 Iris-setosa +3 4.6 3.1 1.5 0.2 Iris-setosa +4 5.0 3.6 1.4 0.2 Iris-setosa + +In [76]: (iris.assign(sepal_ratio=iris['SepalWidth'] / iris['SepalLength']) + ....: .head()) + ....: +Out[76]: + SepalLength SepalWidth PetalLength PetalWidth Name sepal_ratio +0 5.1 3.5 1.4 0.2 Iris-setosa 0.686275 +1 4.9 3.0 1.4 0.2 Iris-setosa 0.612245 +2 4.7 3.2 1.3 0.2 Iris-setosa 0.680851 +3 4.6 3.1 1.5 0.2 Iris-setosa 0.673913 +4 5.0 3.6 1.4 0.2 Iris-setosa 0.720000 +``` + +上例中,插入了一个预计算的值。还可以传递带参数的函数,在 `assign` 的 DataFrame 上求值。 + +```python +In [77]: iris.assign(sepal_ratio=lambda x: (x['SepalWidth'] / x['SepalLength'])).head() +Out[77]: + SepalLength SepalWidth PetalLength PetalWidth Name sepal_ratio +0 5.1 3.5 1.4 0.2 Iris-setosa 0.686275 +1 4.9 3.0 1.4 0.2 Iris-setosa 0.612245 +2 4.7 3.2 1.3 0.2 Iris-setosa 0.680851 +3 4.6 3.1 1.5 0.2 Iris-setosa 0.673913 +4 5.0 3.6 1.4 0.2 Iris-setosa 0.720000 +``` + +`assign` 返回的**都是**数据副本,原 DataFrame 不变。 + +未引用 DataFrame 时,传递可调用的,不是实际要插入的值。这种方式常见于在操作链中调用 `assign` 的操作。例如,将 
DataFrame 限制为花萼长度大于 5 的观察值,计算比例,再制图:
+
+```python
+In [78]: (iris.query('SepalLength > 5')
+ ....: .assign(SepalRatio=lambda x: x.SepalWidth / x.SepalLength,
+ ....: PetalRatio=lambda x: x.PetalWidth / x.PetalLength)
+ ....: .plot(kind='scatter', x='SepalRatio', y='PetalRatio'))
+ ....:
+Out[78]:
+```
+
+![函数运算](https://static.pypandas.cn/public/static/images/basics_assign.png)
+
+上例用 `assign` 把函数传递给 DataFrame,并执行函数运算。这里要注意的是,该 DataFrame 是筛选了花萼长度大于 5 以后的数据。首先执行的是筛选操作,再计算比例。这个例子里,手头并没有*筛选后* DataFrame 的引用,也完成了操作。
+
+`assign` 函数签名就是 `**kwargs`。键是新字段的列名,值为要插入的值(例如,`Series` 或 NumPy 数组),或把 `DataFrame` 当做调用参数的函数。返回结果是插入新值的 DataFrame 副本。
+
+*0.23.0 版新增。*
+
+从 3.6 版开始,Python 可以保存 `**kwargs` 顺序。这种操作允许*依赖赋值*,`**kwargs` 后面的表达式,可以引用同一个 [`assign()`](https://Pandas.pydata.org/Pandas-docs/stable/reference/api/Pandas.DataFrame.assign.html#Pandas.DataFrame.assign) 函数里之前创建的列。
+
+```python
+In [79]: dfa = pd.DataFrame({"A": [1, 2, 3],
+ ....: "B": [4, 5, 6]})
+ ....:
+
+In [80]: dfa.assign(C=lambda x: x['A'] + x['B'],
+ ....: D=lambda x: x['A'] + x['C'])
+ ....:
+Out[80]:
+ A B C D
+0 1 4 5 6
+1 2 5 7 9
+2 3 6 9 12
+```
+
+第二个表达式里,`x['C']` 引用刚创建的列,与 `dfa['A'] + dfa['B']` 等效。
+
+要兼容所有 Python 版本,可以把 `assign` 操作分为两部分。
+
+```python
+In [81]: dependent = pd.DataFrame({"A": [1, 1, 1]})
+
+In [82]: (dependent.assign(A=lambda x: x['A'] + 1)
+ ....: .assign(B=lambda x: x['A'] + 2))
+ ....:
+Out[82]:
+ A B
+0 2 4
+1 2 4
+2 2 4
+```
+
+::: danger 警告
+
+依赖赋值改变了 Python 3.6 及之后版本与 Python 3.6 之前版本的代码操作方式。
+
+要想编写支持 3.6 之前或之后版本的 Python 代码,传递 `assign` 表达式时,要注意以下两点:
+
+* 更新现有的列
+* 在同一个 `assign` 里引用刚更新的列
+
+示例如下,更新列 “A”,然后,在创建 “B” 列时引用该列。
+
+```python
+>>> dependent = pd.DataFrame({"A": [1, 1, 1]})
+>>> dependent.assign(A=lambda x: x["A"] + 1, B=lambda x: x["A"] + 2)
+```
+
+Python 3.5 或更早版本的表达式在创建 `B` 列时引用的是 `A` 列的“旧”值 `[1, 1, 1]`。输出是:
+
+```
+A B
+0 2 3
+1 2 3
+2 2 3
+```
+
+Python >= 3.6 的表达式在创建 `B` 列时,引用的是 `A` 列的“新”值 `[2, 2, 2]`,输出是:
+
+```
+A B
+0 2 4
+1 2 4
+2 2 4
+```
+
+:::
+
+### 索引 / 选择
+
+索引基础用法如下: + +操作 | 句法 | 结果 +---|---|--- +选择列 | `df[col]` | Series +用标签选择行 | `df.loc[label]` | Series +用整数位置选择行 | `df.iloc[loc]` | Series +行切片 | `df[5:10]` | DataFrame +用布尔向量选择行 | `df[bool_vec]` | DataFrame + +选择行返回 Series,索引是 DataFrame 的列: + +```python +In [83]: df.loc['b'] +Out[83]: +one 2 +bar 2 +flag False +foo bar +one_trunc 2 +Name: b, dtype: object + +In [84]: df.iloc[2] +Out[84]: +one 3 +bar 3 +flag True +foo bar +one_trunc NaN +Name: c, dtype: object +``` + +高级索引、切片技巧,请参阅[索引](https://Pandas.pydata.org/Pandas-docs/stable/user_guide/indexing.html#indexing)。[重建索引](basics.html#basics-reindexing)介绍重建索引 / 遵循新标签集的基础知识。 + +### 数据对齐和运算 + +DataFrame 对象可以自动对齐**列与索引(行标签)**的数据。与上文一样,生成的结果是列和行标签的并集。 + +```python +In [85]: df = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D']) + +In [86]: df2 = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C']) + +In [87]: df + df2 +Out[87]: + A B C D +0 0.045691 -0.014138 1.380871 NaN +1 -0.955398 -1.501007 0.037181 NaN +2 -0.662690 1.534833 -0.859691 NaN +3 -2.452949 1.237274 -0.133712 NaN +4 1.414490 1.951676 -2.320422 NaN +5 -0.494922 -1.649727 -1.084601 NaN +6 -1.047551 -0.748572 -0.805479 NaN +7 NaN NaN NaN NaN +8 NaN NaN NaN NaN +9 NaN NaN NaN NaN +``` + +DataFrame 和 Series 之间执行操作时,默认操作是在 DataFrame 的**列**上对齐 Series 的**索引**,按行执行[广播]((http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html))操作。例如: + +```python +In [88]: df - df.iloc[0] +Out[88]: + A B C D +0 0.000000 0.000000 0.000000 0.000000 +1 -1.359261 -0.248717 -0.453372 -1.754659 +2 0.253128 0.829678 0.010026 -1.991234 +3 -1.311128 0.054325 -1.724913 -1.620544 +4 0.573025 1.500742 -0.676070 1.367331 +5 -1.741248 0.781993 -1.241620 -2.053136 +6 -1.240774 -0.869551 -0.153282 0.000430 +7 -0.743894 0.411013 -0.929563 -0.282386 +8 -1.194921 1.320690 0.238224 -1.482644 +9 2.293786 1.856228 0.773289 -1.446531 +``` + +时间序列是特例,DataFrame 索引包含日期时,按列广播: + +```python +In [89]: index = pd.date_range('1/1/2000', periods=8) + +In [90]: df = 
pd.DataFrame(np.random.randn(8, 3), index=index, columns=list('ABC'))
+
+In [91]: df
+Out[91]:
+ A B C
+2000-01-01 -1.226825 0.769804 -1.281247
+2000-01-02 -0.727707 -0.121306 -0.097883
+2000-01-03 0.695775 0.341734 0.959726
+2000-01-04 -1.110336 -0.619976 0.149748
+2000-01-05 -0.732339 0.687738 0.176444
+2000-01-06 0.403310 -0.154951 0.301624
+2000-01-07 -2.179861 -1.369849 -0.954208
+2000-01-08 1.462696 -1.743161 -0.826591
+
+In [92]: type(df['A'])
+Out[92]: pandas.core.series.Series
+
+In [93]: df - df['A']
+Out[93]:
+ 2000-01-01 00:00:00 2000-01-02 00:00:00 2000-01-03 00:00:00 2000-01-04 00:00:00 ... 2000-01-08 00:00:00 A B C
+2000-01-01 NaN NaN NaN NaN ... NaN NaN NaN NaN
+2000-01-02 NaN NaN NaN NaN ... NaN NaN NaN NaN
+2000-01-03 NaN NaN NaN NaN ... NaN NaN NaN NaN
+2000-01-04 NaN NaN NaN NaN ... NaN NaN NaN NaN
+2000-01-05 NaN NaN NaN NaN ... NaN NaN NaN NaN
+2000-01-06 NaN NaN NaN NaN ... NaN NaN NaN NaN
+2000-01-07 NaN NaN NaN NaN ... NaN NaN NaN NaN
+2000-01-08 NaN NaN NaN NaN ... 
NaN NaN NaN NaN + +[8 rows x 11 columns] +``` + +::: danger 警告 + +```python +df - df['A'] +``` + +已弃用,后期版本中会删除。实现此操作的首选方法是: + +```python +df.sub(df['A'], axis=0) +``` + +::: + +有关匹配和广播操作的显式控制,请参阅[二进制操作](basics.html#basics-binop)。 + +标量操作与其它数据结构一样: + +```python +In [94]: df * 5 + 2 +Out[94]: + A B C +2000-01-01 -4.134126 5.849018 -4.406237 +2000-01-02 -1.638535 1.393469 1.510587 +2000-01-03 5.478873 3.708672 6.798628 +2000-01-04 -3.551681 -1.099880 2.748742 +2000-01-05 -1.661697 5.438692 2.882222 +2000-01-06 4.016548 1.225246 3.508122 +2000-01-07 -8.899303 -4.849247 -2.771039 +2000-01-08 9.313480 -6.715805 -2.132955 + +In [95]: 1 / df +Out[95]: + A B C +2000-01-01 -0.815112 1.299033 -0.780489 +2000-01-02 -1.374179 -8.243600 -10.216313 +2000-01-03 1.437247 2.926250 1.041965 +2000-01-04 -0.900628 -1.612966 6.677871 +2000-01-05 -1.365487 1.454041 5.667510 +2000-01-06 2.479485 -6.453662 3.315381 +2000-01-07 -0.458745 -0.730007 -1.047990 +2000-01-08 0.683669 -0.573671 -1.209788 + +In [96]: df ** 4 +Out[96]: + A B C +2000-01-01 2.265327 0.351172 2.694833 +2000-01-02 0.280431 0.000217 0.000092 +2000-01-03 0.234355 0.013638 0.848376 +2000-01-04 1.519910 0.147740 0.000503 +2000-01-05 0.287640 0.223714 0.000969 +2000-01-06 0.026458 0.000576 0.008277 +2000-01-07 22.579530 3.521204 0.829033 +2000-01-08 4.577374 9.233151 0.466834 +``` + +支持布尔运算符: + +```python +In [97]: df1 = pd.DataFrame({'a': [1, 0, 1], 'b': [0, 1, 1]}, dtype=bool) + +In [98]: df2 = pd.DataFrame({'a': [0, 1, 1], 'b': [1, 1, 0]}, dtype=bool) + +In [99]: df1 & df2 +Out[99]: + a b +0 False False +1 False True +2 True False + +In [100]: df1 | df2 +Out[100]: + a b +0 True True +1 True True +2 True True + +In [101]: df1 ^ df2 +Out[101]: + a b +0 True True +1 True False +2 False True + +In [102]: -df1 +Out[102]: + a b +0 False True +1 True False +2 False False +``` + +### 转置 + +类似于多维数组,`T` 属性(即 `transpose` 函数)可以转置 DataFrame: + +```python +# only show the first 5 rows +In [103]: df[:5].T +Out[103]: + 2000-01-01 
2000-01-02 2000-01-03 2000-01-04 2000-01-05 +A -1.226825 -0.727707 0.695775 -1.110336 -0.732339 +B 0.769804 -0.121306 0.341734 -0.619976 0.687738 +C -1.281247 -0.097883 0.959726 0.149748 0.176444 +``` + +### DataFrame 应用 NumPy 函数 + +Series 与 DataFrame 可使用 log、exp、sqrt 等多种元素级 NumPy 通用函数(ufunc) ,假设 DataFrame 的数据都是数字: + +```python +In [104]: np.exp(df) +Out[104]: + A B C +2000-01-01 0.293222 2.159342 0.277691 +2000-01-02 0.483015 0.885763 0.906755 +2000-01-03 2.005262 1.407386 2.610980 +2000-01-04 0.329448 0.537957 1.161542 +2000-01-05 0.480783 1.989212 1.192968 +2000-01-06 1.496770 0.856457 1.352053 +2000-01-07 0.113057 0.254145 0.385117 +2000-01-08 4.317584 0.174966 0.437538 + +In [105]: np.asarray(df) +Out[105]: +array([[-1.2268, 0.7698, -1.2812], + [-0.7277, -0.1213, -0.0979], + [ 0.6958, 0.3417, 0.9597], + [-1.1103, -0.62 , 0.1497], + [-0.7323, 0.6877, 0.1764], + [ 0.4033, -0.155 , 0.3016], + [-2.1799, -1.3698, -0.9542], + [ 1.4627, -1.7432, -0.8266]]) +``` + +DataFrame 不是多维数组的替代品,它的索引语义和数据模型与多维数组都不同。 + +[`Series`](https://Pandas.pydata.org/Pandas-docs/stable/reference/api/Pandas.Series.html#Pandas.Series) 应用 `__array_ufunc__`,支持 NumPy [通用函数](https://docs.scipy.org/doc/numpy/reference/ufuncs.html)。 + +通用函数应用于 Series 的底层数组。 + +```python +In [106]: ser = pd.Series([1, 2, 3, 4]) + +In [107]: np.exp(ser) +Out[107]: +0 2.718282 +1 7.389056 +2 20.085537 +3 54.598150 +dtype: float64 +``` + +*0.25.0 版更改:* 多个 `Series` 传递给 *ufunc* 时,会先进行对齐。 + +Pandas 可以自动对齐 ufunc 里的多个带标签输入数据。例如,两个标签排序不同的 [`Series`](https://Pandas.pydata.org/Pandas-docs/stable/reference/api/Pandas.Series.html#Pandas.Series) 运算前,会先对齐标签。 + +```python +In [108]: ser1 = pd.Series([1, 2, 3], index=['a', 'b', 'c']) + +In [109]: ser2 = pd.Series([1, 3, 5], index=['b', 'a', 'c']) + +In [110]: ser1 +Out[110]: +a 1 +b 2 +c 3 +dtype: int64 + +In [111]: ser2 +Out[111]: +b 1 +a 3 +c 5 +dtype: int64 + +In [112]: np.remainder(ser1, ser2) +Out[112]: +a 1 +b 0 +c 3 +dtype: int64 +``` + +一般来说,Pandas 提取两个索引的并集,不重叠的值用缺失值填充。 + 
+```python +In [113]: ser3 = pd.Series([2, 4, 6], index=['b', 'c', 'd']) + +In [114]: ser3 +Out[114]: +b 2 +c 4 +d 6 +dtype: int64 + +In [115]: np.remainder(ser1, ser3) +Out[115]: +a NaN +b 0.0 +c 3.0 +d NaN +dtype: float64 +``` + +对 [`Series`](https://Pandas.pydata.org/Pandas-docs/stable/reference/api/Pandas.Series.html#Pandas.Series) 和 [`Index`](https://Pandas.pydata.org/Pandas-docs/stable/reference/api/Pandas.Index.html#Pandas.Index) 应用二进制 ufunc 时,优先执行 Series,并返回的结果也是 Series 。 + +```python +In [116]: ser = pd.Series([1, 2, 3]) + +In [117]: idx = pd.Index([4, 5, 6]) + +In [118]: np.maximum(ser, idx) +Out[118]: +0 4 +1 5 +2 6 +dtype: int64 +``` + +NumPy 通用函数可以安全地应用于非多维数组支持的 [`Series`](https://Pandas.pydata.org/Pandas-docs/stable/reference/api/Pandas.Series.html#Pandas.Series),例如,[`SparseArray`](https://Pandas.pydata.org/Pandas-docs/stable/reference/api/Pandas.SparseArray.html#Pandas.SparseArray)(参见[稀疏计算](https://Pandas.pydata.org/Pandas-docs/stable/user_guide/sparse.html#sparse-calculation))。如有可能,应用 ufunc 而不把基础数据转换为多维数组。 + +### 控制台显示 + +控制台显示大型 DataFrame 时,会根据空间调整显示大小。[`info()`](https://Pandas.pydata.org/Pandas-docs/stable/reference/api/Pandas.DataFrame.info.html#Pandas.DataFrame.info)函数可以查看 DataFrame 的信息摘要。下列代码读取 R 语言 **plyr** 包里的**棒球**数据集 CSV 文件): + +```python +In [119]: baseball = pd.read_csv('data/baseball.csv') + +In [120]: print(baseball) + id player year stint team lg g ab r h X2b X3b hr rbi sb cs bb so ibb hbp sh sf gidp +0 88641 womacto01 2006 2 CHN NL 19 50 6 14 1 0 1 2.0 1.0 1.0 4 4.0 0.0 0.0 3.0 0.0 0.0 +1 88643 schilcu01 2006 1 BOS AL 31 2 0 1 0 0 0 0.0 0.0 0.0 0 1.0 0.0 0.0 0.0 0.0 0.0 +.. ... ... ... ... ... .. .. ... .. ... ... ... .. ... ... ... .. ... ... ... ... ... ... 
+98 89533 aloumo01 2007 1 NYN NL 87 328 51 112 19 1 13 49.0 3.0 0.0 27 30.0 5.0 2.0 0.0 3.0 13.0 +99 89534 alomasa02 2007 1 NYN NL 8 22 1 3 1 0 0 0.0 0.0 0.0 0 3.0 0.0 0.0 0.0 0.0 0.0 + +[100 rows x 23 columns] + +In [121]: baseball.info() + +RangeIndex: 100 entries, 0 to 99 +Data columns (total 23 columns): +id 100 non-null int64 +player 100 non-null object +year 100 non-null int64 +stint 100 non-null int64 +team 100 non-null object +lg 100 non-null object +g 100 non-null int64 +ab 100 non-null int64 +r 100 non-null int64 +h 100 non-null int64 +X2b 100 non-null int64 +X3b 100 non-null int64 +hr 100 non-null int64 +rbi 100 non-null float64 +sb 100 non-null float64 +cs 100 non-null float64 +bb 100 non-null int64 +so 100 non-null float64 +ibb 100 non-null float64 +hbp 100 non-null float64 +sh 100 non-null float64 +sf 100 non-null float64 +gidp 100 non-null float64 +dtypes: float64(9), int64(11), object(3) +memory usage: 18.1+ KB +``` + +尽管 `to_string` 有时不匹配控制台的宽度,但还是可以用 `to_string` 以表格形式返回 DataFrame 的字符串表示形式: + +```python +In [122]: print(baseball.iloc[-20:, :12].to_string()) + id player year stint team lg g ab r h X2b X3b +80 89474 finlest01 2007 1 COL NL 43 94 9 17 3 0 +81 89480 embreal01 2007 1 OAK AL 4 0 0 0 0 0 +82 89481 edmonji01 2007 1 SLN NL 117 365 39 92 15 2 +83 89482 easleda01 2007 1 NYN NL 76 193 24 54 6 0 +84 89489 delgaca01 2007 1 NYN NL 139 538 71 139 30 0 +85 89493 cormirh01 2007 1 CIN NL 6 0 0 0 0 0 +86 89494 coninje01 2007 2 NYN NL 21 41 2 8 2 0 +87 89495 coninje01 2007 1 CIN NL 80 215 23 57 11 1 +88 89497 clemero02 2007 1 NYA AL 2 2 0 1 0 0 +89 89498 claytro01 2007 2 BOS AL 8 6 1 0 0 0 +90 89499 claytro01 2007 1 TOR AL 69 189 23 48 14 0 +91 89501 cirilje01 2007 2 ARI NL 28 40 6 8 4 0 +92 89502 cirilje01 2007 1 MIN AL 50 153 18 40 9 2 +93 89521 bondsba01 2007 1 SFN NL 126 340 75 94 14 0 +94 89523 biggicr01 2007 1 HOU NL 141 517 68 130 31 3 +95 89525 benitar01 2007 2 FLO NL 34 0 0 0 0 0 +96 89526 benitar01 2007 1 SFN NL 19 0 0 0 0 0 +97 89530 
ausmubr01 2007 1 HOU NL 117 349 38 82 16 3 +98 89533 aloumo01 2007 1 NYN NL 87 328 51 112 19 1 +99 89534 alomasa02 2007 1 NYN NL 8 22 1 3 1 0 +``` + +默认情况下,过宽的 DataFrame 会跨多行输出: + +```python +In [123]: pd.DataFrame(np.random.randn(3, 12)) +Out[123]: + 0 1 2 3 4 5 6 7 8 9 10 11 +0 -0.345352 1.314232 0.690579 0.995761 2.396780 0.014871 3.357427 -0.317441 -1.236269 0.896171 -0.487602 -0.082240 +1 -2.182937 0.380396 0.084844 0.432390 1.519970 -0.493662 0.600178 0.274230 0.132885 -0.023688 2.410179 1.450520 +2 0.206053 -0.251905 -2.213588 1.063327 1.266143 0.299368 -0.863838 0.408204 -1.048089 -0.025747 -0.988387 0.094055 +``` + +`display.width` 选项可以更改单行输出的宽度: + +```python +In [124]: pd.set_option('display.width', 40) # 默认值为 80 + +In [125]: pd.DataFrame(np.random.randn(3, 12)) +Out[125]: + 0 1 2 3 4 5 6 7 8 9 10 11 +0 1.262731 1.289997 0.082423 -0.055758 0.536580 -0.489682 0.369374 -0.034571 -2.484478 -0.281461 0.030711 0.109121 +1 1.126203 -0.977349 1.474071 -0.064034 -1.282782 0.781836 -1.071357 0.441153 2.353925 0.583787 0.221471 -0.744471 +2 0.758527 1.729689 -0.964980 -0.845696 -1.340896 1.846883 -1.328865 1.682706 -1.717693 0.888782 0.228440 0.901805 +``` + +还可以用 `display.max_colwidth` 调整最大列宽。 + +```python +In [126]: datafile = {'filename': ['filename_01', 'filename_02'], + .....: 'path': ["media/user_name/storage/folder_01/filename_01", + .....: "media/user_name/storage/folder_02/filename_02"]} + .....: + +In [127]: pd.set_option('display.max_colwidth', 30) + +In [128]: pd.DataFrame(datafile) +Out[128]: + filename path +0 filename_01 media/user_name/storage/fo... +1 filename_02 media/user_name/storage/fo... 
+ +In [129]: pd.set_option('display.max_colwidth', 100) + +In [130]: pd.DataFrame(datafile) +Out[130]: + filename path +0 filename_01 media/user_name/storage/folder_01/filename_01 +1 filename_02 media/user_name/storage/folder_02/filename_02 +``` + +`expand_frame_repr` 选项可以禁用此功能,在一个区块里输出整个表格。 + +### DataFrame 列属性访问和 IPython 代码补全 + +DataFrame 列标签是有效的 Python 变量名时,可以像属性一样访问该列: + +```python +In [131]: df = pd.DataFrame({'foo1': np.random.randn(5), + .....: 'foo2': np.random.randn(5)}) + .....: + +In [132]: df +Out[132]: + foo1 foo2 +0 1.171216 -0.858447 +1 0.520260 0.306996 +2 -1.197071 -0.028665 +3 -1.066969 0.384316 +4 -0.303421 1.574159 + +In [133]: df.foo1 +Out[133]: +0 1.171216 +1 0.520260 +2 -1.197071 +3 -1.066969 +4 -0.303421 +Name: foo1, dtype: float64 +``` + +[IPython](https://ipython.org) 支持补全功能,按 **tab** 键可以实现代码补全: + +```python +In [134]: df.fo # 此时按 tab 键 会显示下列内容 +df.foo1 df.foo2 +``` diff --git a/Python/pandas/getting_started/overview.md b/Python/pandas/getting_started/overview.md new file mode 100644 index 00000000..6005f0b5 --- /dev/null +++ b/Python/pandas/getting_started/overview.md @@ -0,0 +1,119 @@ +# Pandas 概览 +**Pandas** 是 [Python](https://www.python.org/) 的核心数据分析支持库,提供了快速、灵活、明确的数据结构,旨在简单、直观地处理关系型、标记型数据。Pandas 的目标是成为 Python 数据分析实践与实战的必备高级工具,其长远目标是成为**最强大、最灵活、可以支持任何语言的开源数据分析工具**。经过多年不懈的努力,Pandas 离这个目标已经越来越近了。 + +Pandas 适用于处理以下类型的数据: + +* 与 SQL 或 Excel 表类似的,含异构列的表格数据; +* 有序和无序(非固定频率)的时间序列数据; +* 带行列标签的矩阵数据,包括同构或异构型数据; +* 任意其它形式的观测、统计数据集, 数据转入 Pandas 数据结构时不必事先标记。 + +Pandas 的主要数据结构是 [Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series)(一维数据)与 [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame)(二维数据),这两种数据结构足以处理金融、统计、社会科学、工程等领域里的大多数典型用例。对于 R 用户,DataFrame 提供了比 R 语言 data.frame 更丰富的功能。Pandas 基于 [NumPy](https://www.numpy.org/) 开发,可以与其它第三方科学计算支持库完美集成。 + +Pandas 就像一把万能瑞士军刀,下面仅列出了它的部分优势 : + +* 处理浮点与非浮点数据里的**缺失数据**,表示为 `NaN`; +* 大小可变:**插入或删除** DataFrame 
等多维对象的列; +* 自动、显式**数据对齐**:显式地将对象与一组标签对齐,也可以忽略标签,在 Series、DataFrame 计算时自动与数据对齐; +* 强大、灵活的**分组**(group by)功能:**拆分-应用-组合**数据集,聚合、转换数据; +* 把 Python 和 NumPy 数据结构里不规则、不同索引的数据**轻松**地转换为 DataFrame 对象; +* 基于智能标签,对大型数据集进行**切片**、**花式索引**、**子集分解**等操作; +* 直观地**合并(merge)**、**连接(join)**数据集; +* 灵活地**重塑(reshape)**、**透视(pivot)**数据集; +* **轴**支持结构化标签:一个刻度支持多个标签; +* 成熟的 IO 工具:读取**文本文件**(CSV 等支持分隔符的文件)、Excel 文件、数据库等来源的数据,利用超快的 **HDF5** 格式保存 / 加载数据; +* **时间序列**:支持日期范围生成、频率转换、移动窗口统计、移动窗口线性回归、日期位移等时间序列功能。 + +这些功能主要是为了解决其它编程语言、科研环境的痛点。处理数据一般分为几个阶段:数据整理与清洗、数据分析与建模、数据可视化与制表,Pandas 是处理数据的理想工具。 + +其它说明: + +* Pandas 速度**很快**。Pandas 的很多底层算法都用 [Cython](https://cython.org/) 优化过。然而,为了保持通用性,必然要牺牲一些性能,如果专注某一功能,完全可以开发出比 Pandas 更快的专用工具。 +* Pandas 是 [statsmodels](https://www.statsmodels.org/stable/index.html) 的依赖项,因此,Pandas 也是 Python 中统计计算生态系统的重要组成部分。 +* Pandas 已广泛应用于金融领域。 + +## 数据结构 + +维数 | 名称 | 描述 +---|---|--- +1 | Series | 带标签的一维同构数组 +2 | DataFrame | 带标签的,大小可变的,二维异构表格 + +### 为什么有多个数据结构? + +Pandas 数据结构就像是低维数据的容器。比如,DataFrame 是 Series 的容器,Series 则是标量的容器。使用这种方式,可以在容器中以字典的形式插入或删除对象。 + +此外,通用 API 函数的默认操作要顾及时间序列与截面数据集的方向。多维数组存储二维或三维数据时,编写函数要注意数据集的方向,这对用户来说是一种负担;如果不考虑 C 或 Fortran 中连续性对性能的影响,一般情况下,不同的轴在程序里其实没有什么区别。Pandas 里,轴的概念主要是为了给数据赋予更直观的语义,即用“更恰当”的方式表示数据集的方向。这样做可以让用户编写数据转换函数时,少费点脑子。 + +处理 DataFrame 等表格数据时,**index**(行)或 **columns**(列)比 **axis 0** 和 **axis 1** 更直观。用这种方式迭代 DataFrame 的列,代码更易读易懂: + +``` python +for col in df.columns: + series = df[col] + # do something with series +``` + +## 大小可变与数据复制 + +Pandas 所有数据结构的值都是可变的,但数据结构的大小并非都是可变的,比如,Series 的长度不可改变,但 DataFrame 里就可以插入列。 + +Pandas 里,绝大多数方法都不改变原始的输入数据,而是复制数据,生成新的对象。 一般来说,原始输入数据**不变**更稳妥。 + +## 获得支持 + +发现 Pandas 的问题或有任何建议,请反馈到 [Github 问题跟踪器](https://github.com/Pandas-dev/Pandas/issues)。日常应用问题请在 [Stack Overflow](https://stackoverflow.com/questions/tagged/Pandas) 上咨询 Pandas 社区专家。 + +## 社区 + +Pandas 如今由来自全球的同道中人组成的社区提供支持,社区里的每个人都贡献了宝贵的时间和精力,正因如此,才成就了开源 Pandas,在此,我们要感谢[所有贡献者](https://github.com/Pandas-dev/Pandas/graphs/contributors)。 + +若您有意为 Pandas 
贡献自己的力量,请先阅读[贡献指南](https://Pandas.pydata.org/Pandas-docs/stable/development/contributing.html#contributing)。 + +Pandas 是 [NumFOCUS](https://www.numfocus.org/open-source-projects/) 赞助的项目。有了稳定的资金来源,就确保了 Pandas,这一世界级开源项目的成功,为本项目[捐款](https://Pandas.pydata.org/donate.html)也更有保障。 + +## 项目监管 + +自 2008 年以来,Pandas 沿用的监管流程已正式编纂为[项目监管文档](https://github.com/Pandas-dev/Pandas-governance)。这些文件阐明了如何决策,如何处理营利组织与非营利实体进行开源协作开发的关系等内容。 + +Wes McKinney 是仁慈的终身独裁者。 + +## 开发团队 +核心团队成员列表及详细信息可在 Github 仓库的[人员页面](https://github.com/Pandas-dev/Pandas-governance/blob/master/people.md)上查询。 + +## 机构合作伙伴 + +现有机构合作伙伴信息可在 [Pandas 网站页面](/about/)上查询。 + +## 许可协议 + +``` +BSD 3-Clause License + +Copyright (c) 2008-2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +* Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +* Neither the name of the copyright holder nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +``` diff --git a/Python/pandas/getting_started/tutorials.md b/Python/pandas/getting_started/tutorials.md new file mode 100644 index 00000000..76e9a371 --- /dev/null +++ b/Python/pandas/getting_started/tutorials.md @@ -0,0 +1,73 @@ +--- +meta: + - name: keywords + content: pandas教程,pandas资料 + - name: description + content: 为方便新用户上手 Pandas,本节收录了众多 Pandas 教程。Pandas 团队出品。 +--- + +# 教程资料 + +为方便新用户上手 Pandas,本节收录了众多 Pandas 教程。 + +## 官方指南 + +[十分钟入门 Pandas](/docs/getting_started/10min.html),Pandas 团队出品。 + +[Cookbook](/docs/user_guide/cookbook.html) ,Pandas 实用案例。 + +[Pandas 速查表](http://Pandas.pydata.org/Pandas_Cheat_Sheet.pdf),案头必备。 + +## 社区指南 + +### 《Pandas Cookbook》Julia Evans 著 + +[Julia Evans](http://jvns.ca/) 2015 年编著的《Pandas Cookbook》包含了很多 Pandas 实战例子,这些例子大多基于实战数据,涵盖了绝大多数新手小白遇到的实际问题。代码请参阅 Pandas-cookbook 的 [GitHub 仓库](http://github.com/jvns/Pandas-cookbook)。 + +### 《Learn Pandas》 Hernan Rojas 著 + +面对新用户介绍 Pandas 学习经验: +[https://bitbucket.org/hrojas/learn-Pandas](https://bitbucket.org/hrojas/learn-Pandas) + +### Python 数据分析实战 + +该[指南](http://wavedatalab.github.io/datawithpython)介绍了用 Python 数据生态系统对开源数据集进行数据分析的过程。涵盖了[数据整理](http://wavedatalab.github.io/datawithpython/munge.html)、[数据聚合](http://wavedatalab.github.io/datawithpython/aggregate.html)、[数据可视化](http://wavedatalab.github.io/datawithpython/visualize.html)和[时间序列](http://wavedatalab.github.io/datawithpython/timeseries.html)。 + +### 新手习题 + +用真实数据集与习题,锻炼 Pandas 
运用能力。更多资源,请参阅[这个仓库](https://github.com/guipsamora/Pandas_exercises)。 + +### 现代 Pandas + +[Tom Augspurger](https://github.com/TomAugspurger) 2016 年编写的系列教程。源文件在 GitHub 存储库 [TomAugspurger/effective-Pandas](https://github.com/TomAugspurger/effective-Pandas)。 + +- [现代 Pandas](http://tomaugspurger.github.io/modern-1-intro.html) +- [方法链](http://tomaugspurger.github.io/method-chaining.html) +- [索引](http://tomaugspurger.github.io/modern-3-indexes.html) +- [性能](http://tomaugspurger.github.io/modern-4-performance.html) +- [清洗数据](http://tomaugspurger.github.io/modern-5-tidy.html) +- [可视化](http://tomaugspurger.github.io/modern-6-visualization.html) +- [时间序列](http://tomaugspurger.github.io/modern-7-timeseries.html) + +### 用 Pandas、vincent 和 xlsxwriter 绘制 Excel 图 + +[用 Pandas、vincent 和 xlsxwriter 绘制 Excel 图](https://Pandas-xlsxwriter-charts.readthedocs.io/) + +### 视频教程 + +- [Pandas 零基础](https://www.youtube.com/watch?v=5JnMutdy6Fw) (2015) (2:24) [GitHub repo](https://github.com/brandon-rhodes/pycon-Pandas-tutorial) +- [Pandas 简介](https://www.youtube.com/watch?v=-NR-ynQg0YM) (2016) (1:28) [GitHub repo](https://github.com/chendaniely/2016-pydata-carolinas-Pandas) +- [Pandas:从 head() 到 tail()](https://www.youtube.com/watch?v=7vuO9QXDN50) (2016) (1:26) [GitHub repo](https://github.com/TomAugspurger/pydata-chi-h2t) +- [Pandas 数据分析](https://www.youtube.com/playlist?list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y) (2016-2018) [GitHub repo](https://github.com/justmarkham/Pandas-videos) 与 [Jupyter Notebook](http://nbviewer.jupyter.org/github/justmarkham/Pandas-videos/blob/master/Pandas.ipynb) +- [Pandas 最佳实践](https://www.youtube.com/playlist?list=PL5-da3qGB5IBITZj_dYSFqnd_15JgqwA6) (2018) [GitHub repo](https://github.com/justmarkham/pycon-2018-tutorial) 与 [Jupyter Notebook](http://nbviewer.jupyter.org/github/justmarkham/pycon-2018-tutorial/blob/master/tutorial.ipynb) + +### 其它教程 + +- [Wes McKinney(Pandas 仁慈的终身独裁者)博客](http://blog.wesmckinney.com/) +- [轻松上手 Python 统计分析 - SciPy 与 Pandas,Randal 
Olson](http://www.randalolson.com/2012/08/06/statistical-analysis-made-easy-in-python/) +- [Python 统计数据分析,Christopher Fonnesbeck,SciPy 2013](http://conference.scipy.org/scipy2013/tutorial_detail.php?id=109) +- [Python 金融分析,Thomas Wiecki](http://nbviewer.ipython.org/github/twiecki/financial-analysis-python-tutorial/blob/master/1.%20Pandas%20Basics.ipynb) +- [Pandas 数据结构,Greg Reda](http://www.gregreda.com/2013/10/26/intro-to-Pandas-data-structures/) +- [Pandas 与 Python:Top 10,Manish Amde](http://manishamde.github.io/blog/2013/03/07/Pandas-and-python-top-10/) +- [Pandas DataFrames 教程,Karlijn Willems](http://www.datacamp.com/community/tutorials/Pandas-tutorial-dataframe-python) +- [实战案例简明教程](https://tutswiki.com/Pandas-cookbook/chapter1) \ No newline at end of file diff --git a/Python/pandas/user_guide/README.md b/Python/pandas/user_guide/README.md new file mode 100644 index 00000000..f13c6460 --- /dev/null +++ b/Python/pandas/user_guide/README.md @@ -0,0 +1,213 @@ +--- +meta: + - name: keywords + content: Pandas指南 + - name: description + content: “用户指南” 按主题划分区域涵盖了几乎所有Pandas的功能。每个小节都介绍了一个主题(例如“处理缺失的数据”),并讨论了Pandas如何解决问题,其中包含许多示例。 +--- + +# Pandas 用户指南目录 + +“用户指南” 按主题划分区域涵盖了几乎所有Pandas的功能。每个小节都介绍了一个主题(例如“处理缺失的数据”),并讨论了Pandas如何解决问题,其中包含许多示例。 + +刚开始接触Pandas的同学应该从[十分钟入门Pandas](/docs/getting_started/10min.html)开始看起。 + +有关任何特定方法的更多信息,请[参阅API参考](/docs/reference.html)。 + +- [IO工具(文本,CSV,HDF5,…)](io.html) + - [CSV & text files](io.html#csv-text-files) + - [JSON](io.html#json) + - [HTML](io.html#html) + - [Excel files](io.html#excel-files) + - [OpenDocument Spreadsheets](io.html#opendocument-spreadsheets) + - [Clipboard](io.html#clipboard) + - [Pickling](io.html#pickling) + - [msgpack](io.html#msgpack) + - [HDF5 (PyTables)](io.html#hdf5-pytables) + - [Feather](io.html#feather) + - [Parquet](io.html#parquet) + - [SQL queries](io.html#sql-queries) + - [Google BigQuery](io.html#google-bigquery) + - [Stata format](io.html#stata-format) + - [SAS formats](io.html#sas-formats) + - 
[Other file formats](io.html#other-file-formats) + - [Performance considerations](io.html#performance-considerations) +- [索引和数据选择器](indexing.html) + - [Different choices for indexing](indexing.html#different-choices-for-indexing) + - [Basics](indexing.html#basics) + - [Attribute access](indexing.html#attribute-access) + - [Slicing ranges](indexing.html#slicing-ranges) + - [Selection by label](indexing.html#selection-by-label) + - [Selection by position](indexing.html#selection-by-position) + - [Selection by callable](indexing.html#selection-by-callable) + - [IX indexer is deprecated](indexing.html#ix-indexer-is-deprecated) + - [Indexing with list with missing labels is deprecated](indexing.html#indexing-with-list-with-missing-labels-is-deprecated) + - [Selecting random samples](indexing.html#selecting-random-samples) + - [Setting with enlargement](indexing.html#setting-with-enlargement) + - [Fast scalar value getting and setting](indexing.html#fast-scalar-value-getting-and-setting) + - [Boolean indexing](indexing.html#boolean-indexing) + - [Indexing with isin](indexing.html#indexing-with-isin) + - [The ``where()`` Method and Masking](indexing.html#the-where-method-and-masking) + - [The ``query()`` Method](indexing.html#the-query-method) + - [Duplicate data](indexing.html#duplicate-data) + - [Dictionary-like ``get()`` method](indexing.html#dictionary-like-get-method) + - [The ``lookup()`` method](indexing.html#the-lookup-method) + - [Index objects](indexing.html#index-objects) + - [Set / reset index](indexing.html#set-reset-index) + - [Returning a view versus a copy](indexing.html#returning-a-view-versus-a-copy) +- [多索引/高级索引](advanced.html) + - [Hierarchical indexing (MultiIndex)](advanced.html#hierarchical-indexing-multiindex) + - [Advanced indexing with hierarchical index](advanced.html#advanced-indexing-with-hierarchical-index) + - [Sorting a ``MultiIndex``](advanced.html#sorting-a-multiindex) + - [Take methods](advanced.html#take-methods) + - [Index 
types](advanced.html#index-types) + - [Miscellaneous indexing FAQ](advanced.html#miscellaneous-indexing-faq) +- [合并、联接和连接](merging.html) + - [Concatenating objects](merging.html#concatenating-objects) + - [Database-style DataFrame or named Series joining/merging](merging.html#database-style-dataframe-or-named-series-joining-merging) + - [Timeseries friendly merging](merging.html#timeseries-friendly-merging) +- [重塑和数据透视表](reshaping.html) + - [Reshaping by pivoting DataFrame objects](reshaping.html#reshaping-by-pivoting-dataframe-objects) + - [Reshaping by stacking and unstacking](reshaping.html#reshaping-by-stacking-and-unstacking) + - [Reshaping by Melt](reshaping.html#reshaping-by-melt) + - [Combining with stats and GroupBy](reshaping.html#combining-with-stats-and-groupby) + - [Pivot tables](reshaping.html#pivot-tables) + - [Cross tabulations](reshaping.html#cross-tabulations) + - [Tiling](reshaping.html#tiling) + - [Computing indicator / dummy variables](reshaping.html#computing-indicator-dummy-variables) + - [Factorizing values](reshaping.html#factorizing-values) + - [Examples](reshaping.html#examples) + - [Exploding a list-like column](reshaping.html#exploding-a-list-like-column) +- [处理文本字符串](text.html) + - [Splitting and replacing strings](text.html#splitting-and-replacing-strings) + - [Concatenation](text.html#concatenation) + - [Indexing with ``.str``](text.html#indexing-with-str) + - [Extracting substrings](text.html#extracting-substrings) + - [Testing for Strings that match or contain a pattern](text.html#testing-for-strings-that-match-or-contain-a-pattern) + - [Creating indicator variables](text.html#creating-indicator-variables) + - [Method summary](text.html#method-summary) +- [处理丢失的数据](missing_data.html) + - [Values considered “missing”](missing_data.html#values-considered-missing) + - [Sum/prod of empties/nans](missing_data.html#sum-prod-of-empties-nans) + - [NA values in GroupBy](missing_data.html#na-values-in-groupby) + - [Filling missing values: 
fillna](missing_data.html#filling-missing-values-fillna) + - [Filling with a PandasObject](missing_data.html#filling-with-a-pandasobject) + - [Dropping axis labels with missing data: dropna](missing_data.html#dropping-axis-labels-with-missing-data-dropna) + - [Interpolation](missing_data.html#interpolation) + - [Replacing generic values](missing_data.html#replacing-generic-values) + - [String/regular expression replacement](missing_data.html#string-regular-expression-replacement) + - [Numeric replacement](missing_data.html#numeric-replacement) +- [分类数据](categorical.html) + - [Object creation](categorical.html#object-creation) + - [CategoricalDtype](categorical.html#categoricaldtype) + - [Description](categorical.html#description) + - [Working with categories](categorical.html#working-with-categories) + - [Sorting and order](categorical.html#sorting-and-order) + - [Comparisons](categorical.html#comparisons) + - [Operations](categorical.html#operations) + - [Data munging](categorical.html#data-munging) + - [Getting data in/out](categorical.html#getting-data-in-out) + - [Missing data](categorical.html#missing-data) + - [Differences to R’s factor](categorical.html#differences-to-r-s-factor) + - [Gotchas](categorical.html#gotchas) +- [Nullable整型数据类型](integer_na.html) +- [可视化](visualization.html) + - [Basic plotting: ``plot``](visualization.html#basic-plotting-plot) + - [Other plots](visualization.html#other-plots) + - [Plotting with missing data](visualization.html#plotting-with-missing-data) + - [Plotting Tools](visualization.html#plotting-tools) + - [Plot Formatting](visualization.html#plot-formatting) + - [Plotting directly with matplotlib](visualization.html#plotting-directly-with-matplotlib) + - [Trellis plotting interface](visualization.html#trellis-plotting-interface) +- [计算工具](computation.html) + - [Statistical functions](computation.html#statistical-functions) + - [Window Functions](computation.html#window-functions) + - 
[Aggregation](computation.html#aggregation) + - [Expanding windows](computation.html#expanding-windows) + - [Exponentially weighted windows](computation.html#exponentially-weighted-windows) +- [组操作: 拆分-应用-组合](groupby.html) + - [Splitting an object into groups](groupby.html#splitting-an-object-into-groups) + - [Iterating through groups](groupby.html#iterating-through-groups) + - [Selecting a group](groupby.html#selecting-a-group) + - [Aggregation](groupby.html#aggregation) + - [Transformation](groupby.html#transformation) + - [Filtration](groupby.html#filtration) + - [Dispatching to instance methods](groupby.html#dispatching-to-instance-methods) + - [Flexible ``apply``](groupby.html#flexible-apply) + - [Other useful features](groupby.html#other-useful-features) + - [Examples](groupby.html#examples) +- [时间序列/日期方法](timeseries.html) + - [Overview](timeseries.html#overview) + - [Timestamps vs. Time Spans](timeseries.html#timestamps-vs-time-spans) + - [Converting to timestamps](timeseries.html#converting-to-timestamps) + - [Generating ranges of timestamps](timeseries.html#generating-ranges-of-timestamps) + - [Timestamp limitations](timeseries.html#timestamp-limitations) + - [Indexing](timeseries.html#indexing) + - [Time/date components](timeseries.html#time-date-components) + - [DateOffset objects](timeseries.html#dateoffset-objects) + - [Time Series-Related Instance Methods](timeseries.html#time-series-related-instance-methods) + - [Resampling](timeseries.html#resampling) + - [Time span representation](timeseries.html#time-span-representation) + - [Converting between representations](timeseries.html#converting-between-representations) + - [Representing out-of-bounds spans](timeseries.html#representing-out-of-bounds-spans) + - [Time zone handling](timeseries.html#time-zone-handling) +- [时间增量](timedeltas.html) + - [Parsing](timedeltas.html#parsing) + - [Operations](timedeltas.html#operations) + - [Reductions](timedeltas.html#reductions) + - [Frequency 
conversion](timedeltas.html#frequency-conversion) + - [Attributes](timedeltas.html#attributes) + - [TimedeltaIndex](timedeltas.html#timedeltaindex) + - [Resampling](timedeltas.html#resampling) +- [样式](style.html) + - [Building styles](style.html#Building-styles) + - [Finer control: slicing](style.html#Finer-control:-slicing) + - [Finer Control: Display Values](style.html#Finer-Control:-Display-Values) + - [Builtin styles](style.html#Builtin-styles) + - [Sharing styles](style.html#Sharing-styles) + - [Other Options](style.html#Other-Options) + - [Fun stuff](style.html#Fun-stuff) + - [Export to Excel](style.html#Export-to-Excel) + - [Extensibility](style.html#Extensibility) +- [选项和设置](options.html) + - [Overview](options.html#overview) + - [Getting and setting options](options.html#getting-and-setting-options) + - [Setting startup options in Python/IPython environment](options.html#setting-startup-options-in-python-ipython-environment) + - [Frequently Used Options](options.html#frequently-used-options) + - [Available options](options.html#available-options) + - [Number formatting](options.html#number-formatting) + - [Unicode formatting](options.html#unicode-formatting) + - [Table schema display](options.html#table-schema-display) +- [提高性能](enhancingperf.html) + - [Cython (writing C extensions for pandas)](enhancingperf.html#cython-writing-c-extensions-for-pandas) + - [Using Numba](enhancingperf.html#using-numba) + - [Expression evaluation via ``eval()``](enhancingperf.html#expression-evaluation-via-eval) +- [稀疏数据结构](sparse.html) + - [SparseArray](sparse.html#sparsearray) + - [SparseDtype](sparse.html#sparsedtype) + - [Sparse accessor](sparse.html#sparse-accessor) + - [Sparse calculation](sparse.html#sparse-calculation) + - [Migrating](sparse.html#migrating) + - [Interaction with scipy.sparse](sparse.html#interaction-with-scipy-sparse) + - [Sparse subclasses](sparse.html#sparse-subclasses) +- [常见问题(FAQ)](gotchas.html) + - [DataFrame memory 
usage](gotchas.html#dataframe-memory-usage) + - [Using if/truth statements with pandas](gotchas.html#using-if-truth-statements-with-pandas) + - [``NaN``, Integer ``NA`` values and ``NA`` type promotions](gotchas.html#nan-integer-na-values-and-na-type-promotions) + - [Differences with NumPy](gotchas.html#differences-with-numpy) + - [Thread-safety](gotchas.html#thread-safety) + - [Byte-Ordering issues](gotchas.html#byte-ordering-issues) +- [烹饪指南](cookbook.html) + - [Idioms](cookbook.html#idioms) + - [Selection](cookbook.html#selection) + - [MultiIndexing](cookbook.html#multiindexing) + - [Missing data](cookbook.html#missing-data) + - [Grouping](cookbook.html#grouping) + - [Timeseries](cookbook.html#timeseries) + - [Merge](cookbook.html#merge) + - [Plotting](cookbook.html#plotting) + - [Data In/Out](cookbook.html#data-in-out) + - [Computation](cookbook.html#computation) + - [Timedeltas](cookbook.html#timedeltas) + - [Aliasing axis names](cookbook.html#aliasing-axis-names) + - [Creating example data](cookbook.html#creating-example-data) \ No newline at end of file diff --git a/Python/pandas/user_guide/advanced.md b/Python/pandas/user_guide/advanced.md new file mode 100644 index 00000000..3f42b51c --- /dev/null +++ b/Python/pandas/user_guide/advanced.md @@ -0,0 +1,2024 @@ +# 多层级索引和高级索引 + +本章节涵盖[使用多层级索引](#advanced-hierarchical)以及[其他高级索引特性](#indexing-index-types)。 + +请参阅[索引与选择数据](indexing.html#indexing),获取更多通用索引方面的帮助文档。 + +::: danger 警告 + +根据实际使用场景的不同,设置操作返回的内容也会不尽相同(可能返回数据的副本,也可能返回数据的引用)。有时,这种情况被称作链式赋值(chained assignment),应当尽力避免。参见[返回视图还是返回副本](indexing.html#indexing-view-versus-copy)。 + +::: + +参见 [cookbook](/document/cookbook/index.html),获取更多高级的使用技巧。 + + + +## 分层索引(多层级索引) + +分层/多级索引为复杂的数据分析和数据操作奠定了基础,尤其是在处理高维度数据时。本质上,它使您能够在较低维度的数据结构(如 ``Series``(1d)和 ``DataFrame``(2d))中存储和操作任意维数的数据。 + +在本节中,我们将展示“层次”索引的确切含义,以及它如何与前面各节描述的所有 pandas 索引功能集成。稍后,在讨论[分组](groupby.html#groupby)和[数据透视与重塑](reshaping.html#reshaping)时,我们将展示一些重要的应用,以说明它如何帮助构建用于分析数据的结构。 + 
+请参阅[cookbook](cookbook.html#cookbook-multi-index),查看一些高级策略。 + +*在0.24.0版本中的改变:*``MultiIndex.labels``被更名为[``MultiIndex.codes``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.codes.html#pandas.MultiIndex.codes) +,同时 ``MultiIndex.set_labels`` 更名为 [``MultiIndex.set_codes``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.set_codes.html#pandas.MultiIndex.set_codes)。 + +### 创建多级索引(分层索引)对象 + + [``MultiIndex``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.html#pandas.MultiIndex) 对象是标准 [``Index``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.html#pandas.Index) 对象的分层模拟,后者通常在 pandas 对象中存储轴(axis)标签。您可以将``MultiIndex``看作一个元组数组,其中每个元组都是唯一的。可以从数组列表(使用 +[``MultiIndex.from_arrays()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.from_arrays.html#pandas.MultiIndex.from_arrays))、元组数组(使用[``MultiIndex.from_tuples()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.from_tuples.html#pandas.MultiIndex.from_tuples))、一组可迭代对象的笛卡尔积(使用[``MultiIndex.from_product()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.from_product.html#pandas.MultiIndex.from_product))或者一个 [``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame)(使用 +[``MultiIndex.from_frame()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.from_frame.html#pandas.MultiIndex.from_frame))创建多索引。当传递一个元组列表时,``Index`` 构造函数将尝试返回一个``MultiIndex``。下面的示例演示了初始化多索引的不同方法。 + +``` python +In [1]: arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], + ...: ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']] + ...: + +In [2]: tuples = list(zip(*arrays)) + +In [3]: tuples +Out[3]: +[('bar', 'one'), + ('bar', 'two'), + ('baz', 'one'), + ('baz', 'two'), + ('foo', 'one'), + ('foo', 'two'), + ('qux', 'one'), + ('qux', 'two')] + +In [4]: index = 
pd.MultiIndex.from_tuples(tuples, names=['first', 'second']) + +In [5]: index +Out[5]: +MultiIndex([('bar', 'one'), + ('bar', 'two'), + ('baz', 'one'), + ('baz', 'two'), + ('foo', 'one'), + ('foo', 'two'), + ('qux', 'one'), + ('qux', 'two')], + names=['first', 'second']) + +In [6]: s = pd.Series(np.random.randn(8), index=index) + +In [7]: s +Out[7]: +first second +bar one 0.469112 + two -0.282863 +baz one -1.509059 + two -1.135632 +foo one 1.212112 + two -0.173215 +qux one 0.119209 + two -1.044236 +dtype: float64 +``` + +当您想要对两个可迭代对象中的元素做所有两两配对时,使用 [``MultiIndex.from_product()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.from_product.html#pandas.MultiIndex.from_product)方法会更容易: + +``` python +In [8]: iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']] + +In [9]: pd.MultiIndex.from_product(iterables, names=['first', 'second']) +Out[9]: +MultiIndex([('bar', 'one'), + ('bar', 'two'), + ('baz', 'one'), + ('baz', 'two'), + ('foo', 'one'), + ('foo', 'two'), + ('qux', 'one'), + ('qux', 'two')], + names=['first', 'second']) +``` + +还可以使用 [``MultiIndex.from_frame()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.from_frame.html#pandas.MultiIndex.from_frame)方法直接从一个``DataFrame``构造``多索引``。它是[``MultiIndex.to_frame()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.to_frame.html#pandas.MultiIndex.to_frame)的互补方法。 + +*0.24.0版本新增。* + +``` python +In [10]: df = pd.DataFrame([['bar', 'one'], ['bar', 'two'], + ....: ['foo', 'one'], ['foo', 'two']], + ....: columns=['first', 'second']) + ....: + +In [11]: pd.MultiIndex.from_frame(df) +Out[11]: +MultiIndex([('bar', 'one'), + ('bar', 'two'), + ('foo', 'one'), + ('foo', 'two')], + names=['first', 'second']) +``` + +为了方便,您可以将数组列表直接传递到`Series`或`DataFrame`中,从而自动构造一个`MultiIndex`: + +``` python +In [12]: arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']), + ....: np.array(['one', 'two', 'one', 'two', 'one', 
'two', 'one', 'two'])] + ....: + +In [13]: s = pd.Series(np.random.randn(8), index=arrays) + +In [14]: s +Out[14]: +bar one -0.861849 + two -2.104569 +baz one -0.494929 + two 1.071804 +foo one 0.721555 + two -0.706771 +qux one -1.039575 + two 0.271860 +dtype: float64 + +In [15]: df = pd.DataFrame(np.random.randn(8, 4), index=arrays) + +In [16]: df +Out[16]: + 0 1 2 3 +bar one -0.424972 0.567020 0.276232 -1.087401 + two -0.673690 0.113648 -1.478427 0.524988 +baz one 0.404705 0.577046 -1.715002 -1.039268 + two -0.370647 -1.157892 -1.344312 0.844885 +foo one 1.075770 -0.109050 1.643563 -1.469388 + two 0.357021 -0.674600 -1.776904 -0.968914 +qux one -1.294524 0.413738 0.276662 -0.472035 + two -0.013960 -0.362543 -0.006154 -0.923061 +``` + +所有`MultiIndex`构造函数都接受`names`参数,该参数存储各层级自身的字符串名称。如果没有提供`names`属性,将分配`None`: + +``` python +In [17]: df.index.names +Out[17]: FrozenList([None, None]) +``` + +此索引可以支持 pandas 对象的任何轴,索引的**层级**数由开发者决定: + +``` python +In [18]: df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index) + +In [19]: df +Out[19]: +first bar baz foo qux +second one two one two one two one two +A 0.895717 0.805244 -1.206412 2.565646 1.431256 1.340309 -1.170299 -0.226169 +B 0.410835 0.813850 0.132003 -0.827317 -0.076467 -1.187678 1.130127 -1.436737 +C -1.413681 1.607920 1.024180 0.569605 0.875906 -2.211372 0.974466 -2.006747 + +In [20]: pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6]) +Out[20]: +first bar baz foo +second one two one two one two +first second +bar one -0.410001 -0.078638 0.545952 -1.219217 -1.226825 0.769804 + two -1.281247 -0.727707 -0.121306 -0.097883 0.695775 0.341734 +baz one 0.959726 -1.110336 -0.619976 0.149748 -0.732339 0.687738 + two 0.176444 0.403310 -0.154951 0.301624 -2.179861 -1.369849 +foo one -0.954208 1.462696 -1.743161 -0.826591 -0.345352 1.314232 + two 0.690579 0.995761 2.396780 0.014871 3.357427 -0.317441 +``` + 
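作为上文 `names` 参数的一个补充小示例(示例代码为编者补充,非原文档内容):未提供名称的层级默认是 `None`,也可以事后用 `MultiIndex.set_names()` 为层级命名。

``` python
import pandas as pd

# 构造时未提供 names,各层级名称默认为 None
arrays = [['bar', 'bar', 'baz', 'baz'],
          ['one', 'two', 'one', 'two']]
mi = pd.MultiIndex.from_arrays(arrays)
print(list(mi.names))        # [None, None]

# set_names 返回一个带有新名称的新索引对象(原索引不变)
mi_named = mi.set_names(['first', 'second'])
print(list(mi_named.names))  # ['first', 'second']
```

当然,也可以在 `from_arrays`、`from_tuples` 等构造方法中直接传入 `names`,效果相同。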
+我们已经“稀疏化”了更高级别的索引,使控制台的输出更容易显示。注意,可以通过`pandas.set_option()`设置`display.multi_sparse`选项来控制索引的显示方式: + +``` python +In [21]: with pd.option_context('display.multi_sparse', False): + ....: df + ....: +``` + +值得记住的是,没有什么可以阻止您使用元组作为轴上的原子标签: + +``` python +In [22]: pd.Series(np.random.randn(8), index=tuples) +Out[22]: +(bar, one) -1.236269 +(bar, two) 0.896171 +(baz, one) -0.487602 +(baz, two) -0.082240 +(foo, one) -2.182937 +(foo, two) 0.380396 +(qux, one) 0.084844 +(qux, two) 0.432390 +dtype: float64 +``` + +`MultiIndex`之所以重要,是因为它允许您进行分组、选择和重塑操作,我们将在下面的文档和后续部分中进行描述。正如您将在后面的部分中看到的,您可能会在并未显式创建`MultiIndex`的情况下用到分层索引的数据。然而,当从文件中加载数据时,您可能希望在准备数据集时生成自己的`MultiIndex`。 + +### 重构层级标签 + +[``get_level_values()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.get_level_values.html#pandas.MultiIndex.get_level_values)方法将返回特定层级上每个位置的标签向量: + +``` python +In [23]: index.get_level_values(0) +Out[23]: Index(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], dtype='object', name='first') + +In [24]: index.get_level_values('second') +Out[24]: Index(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'], dtype='object', name='second') +``` + +### 含多层索引的轴上的基本索引 + +层次索引的一个重要特性是,您可以通过**部分**(partial)标签来选择数据,该标签标识数据中的子组。**局部**选择会“降低”层次索引的级别,其结果完全类似于在常规 DataFrame 中选择列: + +``` python +In [25]: df['bar'] +Out[25]: +second one two +A 0.895717 0.805244 +B 0.410835 0.813850 +C -1.413681 1.607920 + +In [26]: df['bar', 'one'] +Out[26]: +A 0.895717 +B 0.410835 +C -1.413681 +Name: (bar, one), dtype: float64 + +In [27]: df['bar']['one'] +Out[27]: +A 0.895717 +B 0.410835 +C -1.413681 +Name: one, dtype: float64 + +In [28]: s['qux'] +Out[28]: +one -1.039575 +two 0.271860 +dtype: float64 +``` + +有关如何在更深层次上进行选择,请参见[具有层次索引的横切](#advanced-xs)。 + +### 已定义的层级 + + [``MultiIndex``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.html#pandas.MultiIndex) 保存了所有已定义的索引层级,即使它们实际上没有被使用。在对索引进行切片时,您可能会注意到这一点。例如: + +``` python +In [29]: df.columns.levels # original 
MultiIndex +Out[29]: FrozenList([['bar', 'baz', 'foo', 'qux'], ['one', 'two']]) + +In [30]: df[['foo','qux']].columns.levels # sliced +Out[30]: FrozenList([['bar', 'baz', 'foo', 'qux'], ['one', 'two']]) +``` + +这样做是为了避免重新计算层级,从而使切片具有很高的性能。如果只想查看实际使用的层级,可以使用[``get_level_values()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.get_level_values.html#pandas.MultiIndex.get_level_values)方法。 + + +``` python +In [31]: df[['foo', 'qux']].columns.to_numpy() +Out[31]: +array([('foo', 'one'), ('foo', 'two'), ('qux', 'one'), ('qux', 'two')], + dtype=object) + +# for a specific level +In [32]: df[['foo', 'qux']].columns.get_level_values(0) +Out[32]: Index(['foo', 'foo', 'qux', 'qux'], dtype='object', name='first') +``` + +若要仅用实际使用的层级来重构`MultiIndex`,可以使用[``remove_unused_levels()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.remove_unused_levels.html#pandas.MultiIndex.remove_unused_levels)方法。 + +*0.20.0 版本新增。* + +``` python +In [33]: new_mi = df[['foo', 'qux']].columns.remove_unused_levels() + +In [34]: new_mi.levels +Out[34]: FrozenList([['foo', 'qux'], ['one', 'two']]) +``` + +### 数据对齐和使用 ``reindex`` + +在轴上具有`MultiIndex`的不同索引对象之间的操作将如您所期望的那样工作;数据对齐的工作原理与元组索引相同: + +``` python +In [35]: s + s[:-2] +Out[35]: +bar one -1.723698 + two -4.209138 +baz one -0.989859 + two 2.143608 +foo one 1.443110 + two -1.413542 +qux one NaN + two NaN +dtype: float64 + +In [36]: s + s[::2] +Out[36]: +bar one -1.723698 + two NaN +baz one -0.989859 + two NaN +foo one 1.443110 + two NaN +qux one -2.079150 + two NaN +dtype: float64 +``` + +``Series``/``DataFrame``对象的 [``reindex()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html#pandas.DataFrame.reindex) 方法可以接收另一个``MultiIndex``,甚至一个元组的列表或数组: + +``` python +In [37]: s.reindex(index[:3]) +Out[37]: +first second +bar one -0.861849 + two -2.104569 +baz one -0.494929 +dtype: float64 + +In [38]: s.reindex([('foo', 'two'), ('bar', 'one'), ('qux', 
'one'), ('baz', 'one')]) +Out[38]: +foo two -0.706771 +bar one -0.861849 +qux one -1.039575 +baz one -0.494929 +dtype: float64 +``` + + + +## 具有层次索引的高级索引方法 + +语法上,要将 ``MultiIndex``(多层索引)整合进使用``.loc``的高级索引是有一些挑战的,但是我们一直在尽己所能地去实现这个功能。简单来说,多层索引的索引键(keys)采用元组的形式。例如,下列代码将会按照你的期望工作: + +``` python +In [39]: df = df.T + +In [40]: df +Out[40]: + A B C +first second +bar one 0.895717 0.410835 -1.413681 + two 0.805244 0.813850 1.607920 +baz one -1.206412 0.132003 1.024180 + two 2.565646 -0.827317 0.569605 +foo one 1.431256 -0.076467 0.875906 + two 1.340309 -1.187678 -2.211372 +qux one -1.170299 1.130127 0.974466 + two -0.226169 -1.436737 -2.006747 + +In [41]: df.loc[('bar', 'two')] +Out[41]: +A 0.805244 +B 0.813850 +C 1.607920 +Name: (bar, two), dtype: float64 +``` + +注意 ``df.loc['bar', 'two']``在这个用例中也会正常工作,但是这种便捷的简写方法总的来说容易产生歧义。 + +如果你也希望使用 ``.loc``对某个特定的列进行索引,你需要使用如下的元组样式: + +``` python +In [42]: df.loc[('bar', 'two'), 'A'] +Out[42]: 0.8052440253863785 +``` + +你可以只输入元组的第一个元素,而不需要写出多级索引的每一个层级。例如,你可以使用“局部”索引,来获得第一层为``bar``的所有元素,参见下例: +```python +df.loc['bar'] +``` +这种方式是对更为冗长写法``df.loc[('bar',),]``的一个简写(在本例中,等同于``df.loc['bar',]``)。 + +您也可以类似地使用“局部”切片。 + +``` python +In [43]: df.loc['baz':'foo'] +Out[43]: + A B C +first second +baz one -1.206412 0.132003 1.024180 + two 2.565646 -0.827317 0.569605 +foo one 1.431256 -0.076467 0.875906 + two 1.340309 -1.187678 -2.211372 +``` + +您可以通过对元组进行切片,提供一个值的范围(a 'range' of values),来进行切片。 + +``` python +In [44]: df.loc[('baz', 'two'):('qux', 'one')] +Out[44]: + A B C +first second +baz two 2.565646 -0.827317 0.569605 +foo one 1.431256 -0.076467 0.875906 + two 1.340309 -1.187678 -2.211372 +qux one -1.170299 1.130127 0.974466 + +In [45]: df.loc[('baz', 'two'):'foo'] +Out[45]: + A B C +first second +baz two 2.565646 -0.827317 0.569605 +foo one 1.431256 -0.076467 0.875906 + two 1.340309 -1.187678 -2.211372 +``` + +类似于重建索引(reindexing),传入一个标签元组的列表也同样可行: + +``` python +In [46]: df.loc[[('bar', 'two'), ('qux', 'one')]] +Out[46]: + A B C 
+first second +bar two 0.805244 0.813850 1.607920 +qux one -1.170299 1.130127 0.974466 +``` + +::: tip 小技巧 + +在pandas中,元组和列表在索引时是有区别的。一个元组会被识别为一个多层级的索引值(key),而列表被用于表明多个不同的索引值(several keys)。换句话说,元组是横向的,即遍历各个层级(traversing levels),而列表是纵向的,即在单个层级内扫描(scanning levels)。 + +::: + +注意,一个元组构成的列表提供的是完整的多级索引,而一个列表构成的元组提供的是同一个级别中的多个值: + +``` python +In [47]: s = pd.Series([1, 2, 3, 4, 5, 6], + ....: index=pd.MultiIndex.from_product([["A", "B"], ["c", "d", "e"]])) + ....: + +In [48]: s.loc[[("A", "c"), ("B", "d")]] # list of tuples +Out[48]: +A c 1 +B d 5 +dtype: int64 + +In [49]: s.loc[(["A", "B"], ["c", "d"])] # tuple of lists +Out[49]: +A c 1 + d 2 +B c 4 + d 5 +dtype: int64 +``` + +### 使用切片器 + + +你可以通过提供多个索引器来对一个``多级索引``进行切片。 + +你可以提供任意的选择器,就仿佛你按照标签索引一样,参见[按照标签索引](indexing.html#indexing-label),包括切片、标签构成的列表、单个标签,以及布尔值索引器。 + +你可以使用``slice(None)``来选择该级别的所有内容。你不需要指明所有的*更深层级*,它们将被默认推断为``slice(None)``。 + +一如既往,切片器的**两侧**都会被包含进来,因为这是按照标签索引的方式进行的。 + +::: danger 警告 +你需要在``.loc``中声明所有的维度,这意味着同时包含**行索引**以及**列索引**。在一些情况下,索引器中的数据有可能会被错误地识别为在两个维度*同时*进行索引,而不是只对行进行多层级索引。 + +建议使用下列的方式: + +``` python +df.loc[(slice('A1', 'A3'), ...), :] # noqa: E999 +``` + +**不建议**使用下列的方式: + +``` python +df.loc[(slice('A1', 'A3'), ...)] # noqa: E999 +``` + +::: + +``` python +In [50]: def mklbl(prefix, n): + ....: return ["%s%s" % (prefix, i) for i in range(n)] + ....: + +In [51]: miindex = pd.MultiIndex.from_product([mklbl('A', 4), + ....: mklbl('B', 2), + ....: mklbl('C', 4), + ....: mklbl('D', 2)]) + ....: + +In [52]: micolumns = pd.MultiIndex.from_tuples([('a', 'foo'), ('a', 'bar'), + ....: ('b', 'foo'), ('b', 'bah')], + ....: names=['lvl0', 'lvl1']) + ....: + +In [53]: dfmi = pd.DataFrame(np.arange(len(miindex) * len(micolumns)) + ....: .reshape((len(miindex), len(micolumns))), + ....: index=miindex, + ....: columns=micolumns).sort_index().sort_index(axis=1) + ....: + +In [54]: dfmi +Out[54]: +lvl0 a b +lvl1 bar foo bah foo +A0 B0 C0 D0 1 0 3 2 + D1 5 4 7 6 + C1 D0 9 8 11 10 + D1 13 12 15 14 + C2 D0 17 16 19 
18 +... ... ... ... ... +A3 B1 C1 D1 237 236 239 238 + C2 D0 241 240 243 242 + D1 245 244 247 246 + C3 D0 249 248 251 250 + D1 253 252 255 254 + +[64 rows x 4 columns] +``` + +使用切片,列表和标签来进行简单的多层级切片 + +``` python +In [55]: dfmi.loc[(slice('A1', 'A3'), slice(None), ['C1', 'C3']), :] +Out[55]: +lvl0 a b +lvl1 bar foo bah foo +A1 B0 C1 D0 73 72 75 74 + D1 77 76 79 78 + C3 D0 89 88 91 90 + D1 93 92 95 94 + B1 C1 D0 105 104 107 106 +... ... ... ... ... +A3 B0 C3 D1 221 220 223 222 + B1 C1 D0 233 232 235 234 + D1 237 236 239 238 + C3 D0 249 248 251 250 + D1 253 252 255 254 + +[24 rows x 4 columns] +``` + +你可以使用 [``pandas.IndexSlice``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.IndexSlice.html#pandas.IndexSlice),即使用``:``,一个更为符合习惯的语法,而不是使用slice(None)。 + +``` python +In [56]: idx = pd.IndexSlice + +In [57]: dfmi.loc[idx[:, :, ['C1', 'C3']], idx[:, 'foo']] +Out[57]: +lvl0 a b +lvl1 foo foo +A0 B0 C1 D0 8 10 + D1 12 14 + C3 D0 24 26 + D1 28 30 + B1 C1 D0 40 42 +... ... ... +A3 B0 C3 D1 220 222 + B1 C1 D0 232 234 + D1 236 238 + C3 D0 248 250 + D1 252 254 + +[32 rows x 2 columns] +``` + +您可以使用这种方法在两个维度上同时实现非常复杂的选择。 + +``` python +In [58]: dfmi.loc['A1', (slice(None), 'foo')] +Out[58]: +lvl0 a b +lvl1 foo foo +B0 C0 D0 64 66 + D1 68 70 + C1 D0 72 74 + D1 76 78 + C2 D0 80 82 +... ... ... +B1 C1 D1 108 110 + C2 D0 112 114 + D1 116 118 + C3 D0 120 122 + D1 124 126 + +[16 rows x 2 columns] + +In [59]: dfmi.loc[idx[:, :, ['C1', 'C3']], idx[:, 'foo']] +Out[59]: +lvl0 a b +lvl1 foo foo +A0 B0 C1 D0 8 10 + D1 12 14 + C3 D0 24 26 + D1 28 30 + B1 C1 D0 40 42 +... ... ... 
+A3 B0 C3 D1 220 222 + B1 C1 D0 232 234 + D1 236 238 + C3 D0 248 250 + D1 252 254 + +[32 rows x 2 columns] +``` + +使用布尔索引器,您可以对*数值*进行选择。 + +``` python +In [60]: mask = dfmi[('a', 'foo')] > 200 + +In [61]: dfmi.loc[idx[mask, :, ['C1', 'C3']], idx[:, 'foo']] +Out[61]: +lvl0 a b +lvl1 foo foo +A3 B0 C1 D1 204 206 + C3 D0 216 218 + D1 220 222 + B1 C1 D0 232 234 + D1 236 238 + C3 D0 248 250 + D1 252 254 +``` + +您也可以向``.loc``指明``axis``参数,从而只在单个维度上解释传入的切片器。 + +``` python +In [62]: dfmi.loc(axis=0)[:, :, ['C1', 'C3']] +Out[62]: +lvl0 a b +lvl1 bar foo bah foo +A0 B0 C1 D0 9 8 11 10 + D1 13 12 15 14 + C3 D0 25 24 27 26 + D1 29 28 31 30 + B1 C1 D0 41 40 43 42 +... ... ... ... ... +A3 B0 C3 D1 221 220 223 222 + B1 C1 D0 233 232 235 234 + D1 237 236 239 238 + C3 D0 249 248 251 250 + D1 253 252 255 254 + +[32 rows x 4 columns] +``` + +进一步,您可以使用下列的方式来*赋值*: +``` python +In [63]: df2 = dfmi.copy() + +In [64]: df2.loc(axis=0)[:, :, ['C1', 'C3']] = -10 + +In [65]: df2 +Out[65]: +lvl0 a b +lvl1 bar foo bah foo +A0 B0 C0 D0 1 0 3 2 + D1 5 4 7 6 + C1 D0 -10 -10 -10 -10 + D1 -10 -10 -10 -10 + C2 D0 17 16 19 18 +... ... ... ... ... +A3 B1 C1 D1 -10 -10 -10 -10 + C2 D0 241 240 243 242 + D1 245 244 247 246 + C3 D0 -10 -10 -10 -10 + D1 -10 -10 -10 -10 + +[64 rows x 4 columns] +``` + +您也可以在等号右侧使用一个可对齐(alignable)的对象来赋值: +``` python +In [66]: df2 = dfmi.copy() + +In [67]: df2.loc[idx[:, :, ['C1', 'C3']], :] = df2 * 1000 + +In [68]: df2 +Out[68]: +lvl0 a b +lvl1 bar foo bah foo +A0 B0 C0 D0 1 0 3 2 + D1 5 4 7 6 + C1 D0 9000 8000 11000 10000 + D1 13000 12000 15000 14000 + C2 D0 17 16 19 18 +... ... ... ... ... 
+A3 B1 C1 D1 237000 236000 239000 238000 + C2 D0 241 240 243 242 + D1 245 244 247 246 + C3 D0 249000 248000 251000 250000 + D1 253000 252000 255000 254000 + +[64 rows x 4 columns] +``` + +### 交叉选择 + +``DataFrame`` 的 [``xs()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.xs.html#pandas.DataFrame.xs)方法额外接受一个``level``参数,从而可以简便地选取``多级索引``中某一特定层级上的数据。 + +``` python +In [69]: df +Out[69]: + A B C +first second +bar one 0.895717 0.410835 -1.413681 + two 0.805244 0.813850 1.607920 +baz one -1.206412 0.132003 1.024180 + two 2.565646 -0.827317 0.569605 +foo one 1.431256 -0.076467 0.875906 + two 1.340309 -1.187678 -2.211372 +qux one -1.170299 1.130127 0.974466 + two -0.226169 -1.436737 -2.006747 + +In [70]: df.xs('one', level='second') +Out[70]: + A B C +first +bar 0.895717 0.410835 -1.413681 +baz -1.206412 0.132003 1.024180 +foo 1.431256 -0.076467 0.875906 +qux -1.170299 1.130127 0.974466 +``` + +``` python +# using the slicers +In [71]: df.loc[(slice(None), 'one'), :] +Out[71]: + A B C +first second +bar one 0.895717 0.410835 -1.413681 +baz one -1.206412 0.132003 1.024180 +foo one 1.431256 -0.076467 0.875906 +qux one -1.170299 1.130127 0.974466 +``` + +您也可以通过向``xs()``提供``axis``参数来选择列。 + +``` python +In [72]: df = df.T + +In [73]: df.xs('one', level='second', axis=1) +Out[73]: +first bar baz foo qux +A 0.895717 -1.206412 1.431256 -1.170299 +B 0.410835 0.132003 -0.076467 1.130127 +C -1.413681 1.024180 0.875906 0.974466 +``` + +``` python +# using the slicers +In [74]: df.loc[:, (slice(None), 'one')] +Out[74]: +first bar baz foo qux +second one one one one +A 0.895717 -1.206412 1.431256 -1.170299 +B 0.410835 0.132003 -0.076467 1.130127 +C -1.413681 1.024180 0.875906 0.974466 +``` + +``xs``也接受多个键(keys)来进行选取。 + +``` python +In [75]: df.xs(('one', 'bar'), level=('second', 'first'), axis=1) +Out[75]: +first bar +second one +A 0.895717 +B 0.410835 +C -1.413681 +``` + +``` python +# using the slicers +In [76]: df.loc[:, ('bar', 'one')] +Out[76]: +A 0.895717 +B 
0.410835 +C -1.413681 +Name: (bar, one), dtype: float64 +``` + +您可以向``xs``传入 ``drop_level=False`` 来保留所选取的层级。 + +``` python +In [77]: df.xs('one', level='second', axis=1, drop_level=False) +Out[77]: +first bar baz foo qux +second one one one one +A 0.895717 -1.206412 1.431256 -1.170299 +B 0.410835 0.132003 -0.076467 1.130127 +C -1.413681 1.024180 0.875906 0.974466 +``` + +请将上面的结果与使用 ``drop_level=True``(默认值)的结果进行比较。 + +``` python +In [78]: df.xs('one', level='second', axis=1, drop_level=True) +Out[78]: +first bar baz foo qux +A 0.895717 -1.206412 1.431256 -1.170299 +B 0.410835 0.132003 -0.076467 1.130127 +C -1.413681 1.024180 0.875906 0.974466 +``` + +### 高级重建索引(reindex)及对齐 + +``level``参数已经被加入到pandas对象的 [``reindex()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html#pandas.DataFrame.reindex) 和 [``align()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.align.html#pandas.DataFrame.align) 方法中。这将有助于沿着某个层级来广播值(broadcast values)。例如: + +``` python +In [79]: midx = pd.MultiIndex(levels=[['zero', 'one'], ['x', 'y']], + ....: codes=[[1, 1, 0, 0], [1, 0, 1, 0]]) + ....: + +In [80]: df = pd.DataFrame(np.random.randn(4, 2), index=midx) + +In [81]: df +Out[81]: + 0 1 +one y 1.519970 -0.493662 + x 0.600178 0.274230 +zero y 0.132885 -0.023688 + x 2.410179 1.450520 + +In [82]: df2 = df.mean(level=0) + +In [83]: df2 +Out[83]: + 0 1 +one 1.060074 -0.109716 +zero 1.271532 0.713416 + +In [84]: df2.reindex(df.index, level=0) +Out[84]: + 0 1 +one y 1.060074 -0.109716 + x 1.060074 -0.109716 +zero y 1.271532 0.713416 + x 1.271532 0.713416 + +# aligning +In [85]: df_aligned, df2_aligned = df.align(df2, level=0) + +In [86]: df_aligned +Out[86]: + 0 1 +one y 1.519970 -0.493662 + x 0.600178 0.274230 +zero y 0.132885 -0.023688 + x 2.410179 1.450520 + +In [87]: df2_aligned +Out[87]: + 0 1 +one y 1.060074 -0.109716 + x 1.060074 -0.109716 +zero y 1.271532 0.713416 + x 1.271532 0.713416 +``` + +### 使用``swaplevel``来交换层级 + 
+[``swaplevel()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.swaplevel.html#pandas.MultiIndex.swaplevel)函数可以用来交换两个层级 + +``` python +In [88]: df[:5] +Out[88]: + 0 1 +one y 1.519970 -0.493662 + x 0.600178 0.274230 +zero y 0.132885 -0.023688 + x 2.410179 1.450520 + +In [89]: df[:5].swaplevel(0, 1, axis=0) +Out[89]: + 0 1 +y one 1.519970 -0.493662 +x one 0.600178 0.274230 +y zero 0.132885 -0.023688 +x zero 2.410179 1.450520 +``` + +### 使用``reorder_levels``来进行层级重排序 + +[``reorder_levels()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.reorder_levels.html#pandas.MultiIndex.reorder_levels)是一个更一般化的 ``swaplevel``方法,允许您用简单的一步来重排列索引的层级: + +``` python +In [90]: df[:5].reorder_levels([1, 0], axis=0) +Out[90]: + 0 1 +y one 1.519970 -0.493662 +x one 0.600178 0.274230 +y zero 0.132885 -0.023688 +x zero 2.410179 1.450520 +``` + +### 对``索引``和``多层索``引进行重命名 + + [``rename()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html#pandas.DataFrame.rename)方法可以用来重命名``多层索引``,并且他经常被用于``DataFrame``的列名重命名。``renames``的``columns``参数可以接受一个字典,从而仅仅重命名你希望更改名字的列。 + +``` python +In [91]: df.rename(columns={0: "col0", 1: "col1"}) +Out[91]: + col0 col1 +one y 1.519970 -0.493662 + x 0.600178 0.274230 +zero y 0.132885 -0.023688 + x 2.410179 1.450520 +``` + +该方法也可以被用于重命名一些``DataFrame``的特定主索引的名称。 + +``` python +In [92]: df.rename(index={"one": "two", "y": "z"}) +Out[92]: + 0 1 +two z 1.519970 -0.493662 + x 0.600178 0.274230 +zero z 0.132885 -0.023688 + x 2.410179 1.450520 +``` + +[``rename_axis()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename_axis.html#pandas.DataFrame.rename_axis) 方法可以用于对``Index`` 或者 ``MultiIndex``进行重命名。尤其的,你可以明确``MultiIndex``中的不同层级的名称,这可以被用于在之后使用 ``reset_index()`` ,把``多层级索引``的值转换为一个列 + +``` python +In [93]: df.rename_axis(index=['abc', 'def']) +Out[93]: + 0 1 +abc def +one y 1.519970 -0.493662 + x 0.600178 0.274230 +zero y 0.132885 -0.023688 + x 
2.410179 1.450520 +``` + +注意,``DataFrame``的列也是一个索引,因此在``rename_axis``中使用 ``columns`` 参数,将会改变那个索引的名称 + +``` python +In [94]: df.rename_axis(columns="Cols").columns +Out[94]: RangeIndex(start=0, stop=2, step=1, name='Cols') +``` + +``rename`` 和``rename_axis``都支持一个明确的字典,``Series`` 或者一个映射函数,将标签,名称映射为新的值 + +## 对``多索引``进行排序 + +对于拥有多层级索引的对象来说,你可以通过排序来是的索引或切片更为高效。就如同其他任何的索引操作一样,你可以使用 ``sort_index``方法来实现。 + +``` python +In [95]: import random + +In [96]: random.shuffle(tuples) + +In [97]: s = pd.Series(np.random.randn(8), index=pd.MultiIndex.from_tuples(tuples)) + +In [98]: s +Out[98]: +baz one 0.206053 +foo two -0.251905 +qux one -2.213588 +foo one 1.063327 +bar two 1.266143 +baz two 0.299368 +bar one -0.863838 +qux two 0.408204 +dtype: float64 + +In [99]: s.sort_index() +Out[99]: +bar one -0.863838 + two 1.266143 +baz one 0.206053 + two 0.299368 +foo one 1.063327 + two -0.251905 +qux one -2.213588 + two 0.408204 +dtype: float64 + +In [100]: s.sort_index(level=0) +Out[100]: +bar one -0.863838 + two 1.266143 +baz one 0.206053 + two 0.299368 +foo one 1.063327 + two -0.251905 +qux one -2.213588 + two 0.408204 +dtype: float64 + +In [101]: s.sort_index(level=1) +Out[101]: +bar one -0.863838 +baz one 0.206053 +foo one 1.063327 +qux one -2.213588 +bar two 1.266143 +baz two 0.299368 +foo two -0.251905 +qux two 0.408204 +dtype: float64 +``` + +如果你的``多层级索引``都被命名了的话,你也可以向 ``sort_index`` 传入一个层级名称。 + +``` python +In [102]: s.index.set_names(['L1', 'L2'], inplace=True) + +In [103]: s.sort_index(level='L1') +Out[103]: +L1 L2 +bar one -0.863838 + two 1.266143 +baz one 0.206053 + two 0.299368 +foo one 1.063327 + two -0.251905 +qux one -2.213588 + two 0.408204 +dtype: float64 + +In [104]: s.sort_index(level='L2') +Out[104]: +L1 L2 +bar one -0.863838 +baz one 0.206053 +foo one 1.063327 +qux one -2.213588 +bar two 1.266143 +baz two 0.299368 +foo two -0.251905 +qux two 0.408204 +dtype: float64 +``` + +对于多维度的对象来说,你也可以对任意的的维度来进行索引,只要他们是具有``多层级索引``的: + +``` python +In [105]: 
df.T.sort_index(level=1, axis=1) +Out[105]: + one zero one zero + x x y y +0 0.600178 2.410179 1.519970 0.132885 +1 0.274230 1.450520 -0.493662 -0.023688 +``` + +即便数据没有排序,你仍然可以对他们进行索引,但是索引的效率会极大降低,并且也会抛出``PerformanceWarning``警告。而且,这将返回一个数据的副本而非一个数据的视图: + +``` python +In [106]: dfm = pd.DataFrame({'jim': [0, 0, 1, 1], + .....: 'joe': ['x', 'x', 'z', 'y'], + .....: 'jolie': np.random.rand(4)}) + .....: + +In [107]: dfm = dfm.set_index(['jim', 'joe']) + +In [108]: dfm +Out[108]: + jolie +jim joe +0 x 0.490671 + x 0.120248 +1 z 0.537020 + y 0.110968 +``` + +``` python +In [4]: dfm.loc[(1, 'z')] +PerformanceWarning: indexing past lexsort depth may impact performance. + +Out[4]: + jolie +jim joe +1 z 0.64094 +``` + +另外,如果你试图索引一个没有完全lexsorted的对象,你将会碰到如下的错误: + +``` python +In [5]: dfm.loc[(0, 'y'):(1, 'z')] +UnsortedIndexError: 'Key length (2) was greater than MultiIndex lexsort depth (1)' +``` + +在``MultiIndex``上使用 [``is_lexsorted()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.is_lexsorted.html#pandas.MultiIndex.is_lexsorted) 方法,你可以查看这个索引是否已经被排序。而使用``lexsort_depth`` 属性则可以返回排序的深度 + +``` python +In [109]: dfm.index.is_lexsorted() +Out[109]: False + +In [110]: dfm.index.lexsort_depth +Out[110]: 1 +``` + +``` python +In [111]: dfm = dfm.sort_index() + +In [112]: dfm +Out[112]: + jolie +jim joe +0 x 0.490671 + x 0.120248 +1 y 0.110968 + z 0.537020 + +In [113]: dfm.index.is_lexsorted() +Out[113]: True + +In [114]: dfm.index.lexsort_depth +Out[114]: 2 +``` + +现在,你的选择就可以正常工作了。 + +``` python +In [115]: dfm.loc[(0, 'y'):(1, 'z')] +Out[115]: + jolie +jim joe +1 y 0.110968 + z 0.537020 +``` + +## Take方法 + +与``NumPy``的``ndarrays``相似,pandas的 ``Index``, ``Series``,和``DataFrame`` 也提供 [``take()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.take.html#pandas.DataFrame.take) 方法。他可以沿着某个维度,按照给定的索引取回所有的元素。这个给定的索引必须要是一个由整数组成的列表或者ndarray,用以指明在索引中的位置。``take`` 也可以接受负整数,作为相对于结尾的相对位置。 + +``` python +In [116]: index = 
pd.Index(np.random.randint(0, 1000, 10))
+
+In [117]: index
+Out[117]: Int64Index([214, 502, 712, 567, 786, 175, 993, 133, 758, 329], dtype='int64')
+
+In [118]: positions = [0, 9, 3]
+
+In [119]: index[positions]
+Out[119]: Int64Index([214, 329, 567], dtype='int64')
+
+In [120]: index.take(positions)
+Out[120]: Int64Index([214, 329, 567], dtype='int64')
+
+In [121]: ser = pd.Series(np.random.randn(10))
+
+In [122]: ser.iloc[positions]
+Out[122]:
+0   -0.179666
+9    1.824375
+3    0.392149
+dtype: float64
+
+In [123]: ser.take(positions)
+Out[123]:
+0   -0.179666
+9    1.824375
+3    0.392149
+dtype: float64
+```
+
+对于``DataFrames``来说,这个给定的索引应当是一个一维列表或者ndarray,用于指明行或者列的位置。
+
+``` python
+In [124]: frm = pd.DataFrame(np.random.randn(5, 3))
+
+In [125]: frm.take([1, 4, 3])
+Out[125]:
+          0         1         2
+1 -1.237881  0.106854 -1.276829
+4  0.629675 -1.425966  1.857704
+3  0.979542 -1.633678  0.615855
+
+In [126]: frm.take([0, 2], axis=1)
+Out[126]:
+          0         2
+0  0.595974  0.601544
+1 -1.237881 -1.276829
+2 -0.767101  1.499591
+3  0.979542  0.615855
+4  0.629675  1.857704
+```
+
+需要注意的是,pandas对象的``take``方法并不能正常地处理布尔索引,并且有可能会返回一些意外的结果。
+
+``` python
+In [127]: arr = np.random.randn(10)
+
+In [128]: arr.take([False, False, True, True])
+Out[128]: array([-1.1935, -1.1935,  0.6775,  0.6775])
+
+In [129]: arr[[0, 1]]
+Out[129]: array([-1.1935,  0.6775])
+
+In [130]: ser = pd.Series(np.random.randn(10))
+
+In [131]: ser.take([False, False, True, True])
+Out[131]:
+0    0.233141
+0    0.233141
+1   -0.223540
+1   -0.223540
+dtype: float64
+
+In [132]: ser.iloc[[0, 1]]
+Out[132]:
+0    0.233141
+1   -0.223540
+dtype: float64
+```
+
+最后,一个关于性能方面的小建议:因为``take``方法处理的是范围更窄的输入,所以它会比花式索引(fancy indexing)快很多。
+
+``` python
+In [133]: arr = np.random.randn(10000, 5)
+
+In [134]: indexer = np.arange(10000)
+
+In [135]: random.shuffle(indexer)
+
+In [136]: %timeit arr[indexer]
+   .....: %timeit arr.take(indexer, axis=0)
+   .....:
+152 us +- 988 ns per loop (mean +- std. dev. 
of 7 runs, 10000 loops each)
+41.7 us +- 204 ns per loop (mean +- std. dev. of 7 runs, 10000 loops each)
+```
+
+``` python
+In [137]: ser = pd.Series(arr[:, 0])
+
+In [138]: %timeit ser.iloc[indexer]
+   .....: %timeit ser.take(indexer)
+   .....:
+120 us +- 1.05 us per loop (mean +- std. dev. of 7 runs, 10000 loops each)
+110 us +- 795 ns per loop (mean +- std. dev. of 7 runs, 10000 loops each)
+```
+
+
+
+## 索引类型
+
+我们在前面已经较为深入地探讨过了多层索引。你可以在 [这里](timeseries.html#timeseries-overview)找到关于 ``DatetimeIndex`` 和 ``PeriodIndex`` 的说明文件,在 [这里](timedeltas.html#timedeltas-index)找到关于``TimedeltaIndex``的说明。
+
+在下面的几个子章节中,我们将会着重探讨另外的一些索引类型。
+
+### 分类索引
+
+[``CategoricalIndex``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.CategoricalIndex.html#pandas.CategoricalIndex)(分类索引)非常适合含有重复元素的索引。它是一个围绕 [``Categorical``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Categorical.html#pandas.Categorical) 而创建的容器,可以非常高效地存储和索引具有大量重复元素的索引。
+
+``` python
+In [139]: from pandas.api.types import CategoricalDtype
+
+In [140]: df = pd.DataFrame({'A': np.arange(6),
+   .....:                    'B': list('aabbca')})
+   .....:
+
+In [141]: df['B'] = df['B'].astype(CategoricalDtype(list('cab')))
+
+In [142]: df
+Out[142]:
+   A  B
+0  0  a
+1  1  a
+2  2  b
+3  3  b
+4  4  c
+5  5  a
+
+In [143]: df.dtypes
+Out[143]:
+A       int64
+B    category
+dtype: object
+
+In [144]: df.B.cat.categories
+Out[144]: Index(['c', 'a', 'b'], dtype='object')
+```
+
+通过设置索引将会建立一个 ``CategoricalIndex``(分类索引)。
+ +``` python +In [145]: df2 = df.set_index('B') + +In [146]: df2.index +Out[146]: CategoricalIndex(['a', 'a', 'b', 'b', 'c', 'a'], categories=['c', 'a', 'b'], ordered=False, name='B', dtype='category') +``` + +使用 ``__getitem__/.iloc/.loc`` 进行索引,在含有重复值的``索引``上的工作原理相似。索引值**必须**在一个分类中,否者将会引发``KeyError``错误。 + +``` python +In [147]: df2.loc['a'] +Out[147]: + A +B +a 0 +a 1 +a 5 +``` + +``CategoricalIndex`` 在索引之后也会被**保留**: + +``` python +In [148]: df2.loc['a'].index +Out[148]: CategoricalIndex(['a', 'a', 'a'], categories=['c', 'a', 'b'], ordered=False, name='B', dtype='category') +``` + +索引排序将会按照类别清单中的顺序进行(我们已经基于 ``CategoricalDtype(list('cab'))``建立了一个索引,因此排序的顺序是``cab``) + +``` python +In [149]: df2.sort_index() +Out[149]: + A +B +c 4 +a 0 +a 1 +a 5 +b 2 +b 3 +``` + +分组操作(Groupby)也会保留索引的全部信息。 + +``` python +In [150]: df2.groupby(level=0).sum() +Out[150]: + A +B +c 4 +a 6 +b 5 + +In [151]: df2.groupby(level=0).sum().index +Out[151]: CategoricalIndex(['c', 'a', 'b'], categories=['c', 'a', 'b'], ordered=False, name='B', dtype='category') +``` + +重设索引的操作将会根据输入的索引值返回一个索引。传入一个列表,将会返回一个最普通的``Index``;如果使用类别对象``Categorical``,则会返回一个分类索引``CategoricalIndex``,按照其中**传入的**的类别值``Categorical`` dtype来进行索引。正如同你可以对**任意**pandas的索引进行重新索引一样,这将允许你随意索引任意的索引值,即便它们并**不存在**在你的类别对象中。 +``` python +In [152]: df2.reindex(['a', 'e']) +Out[152]: + A +B +a 0.0 +a 1.0 +a 5.0 +e NaN + +In [153]: df2.reindex(['a', 'e']).index +Out[153]: Index(['a', 'a', 'a', 'e'], dtype='object', name='B') + +In [154]: df2.reindex(pd.Categorical(['a', 'e'], categories=list('abcde'))) +Out[154]: + A +B +a 0.0 +a 1.0 +a 5.0 +e NaN + +In [155]: df2.reindex(pd.Categorical(['a', 'e'], categories=list('abcde'))).index +Out[155]: CategoricalIndex(['a', 'a', 'a', 'e'], categories=['a', 'b', 'c', 'd', 'e'], ordered=False, name='B', dtype='category') +``` + +::: danger 警告 + +对于一个``分类索引``的对象进行变形或者比较操作,一定要确保他们的索引包含相同的列别,否则将会出发类型错误``TypeError`` + +``` python +In [9]: df3 = pd.DataFrame({'A': np.arange(6), 'B': 
pd.Series(list('aabbca')).astype('category')})
+
+In [11]: df3 = df3.set_index('B')
+
+In [11]: df3.index
+Out[11]: CategoricalIndex(['a', 'a', 'b', 'b', 'c', 'a'], categories=['a', 'b', 'c'], ordered=False, name='B', dtype='category')
+
+In [12]: pd.concat([df2, df3])
+TypeError: categories must match existing categories when appending
+```
+
+:::
+
+### 64位整型索引和范围索引
+
+::: danger 警告
+
+使用浮点数对基于整数的索引进行索引的行为,已在0.18.0版本中得到明确。想查看更改的汇总,请参见 [这里](https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.18.0.html#whatsnew-0180-float-indexers)。
+:::
+
+[``Int64Index``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Int64Index.html#pandas.Int64Index)(64位整型索引)是pandas中的一种非常基本的索引类型。它是一个由不可变数组实现的有序、可切片的集合。在0.18.0之前,``Int64Index`` 会为所有``NDFrame``对象提供默认的索引。
+
+[``RangeIndex``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.RangeIndex.html#pandas.RangeIndex)(范围索引)是``64位整型索引``的子类,在v0.18.0版本加入,现在由它为所有的``NDFrame``对象提供默认索引。
+``RangeIndex``是 ``Int64Index`` 的一个优化版本,能够表示一个有序且单调的整数集合。它与python的 [range types](https://docs.python.org/3/library/stdtypes.html#typesseq-range)是相似的。
+
+### 64位浮点索引
+
+默认情况下,当传入浮点数、或者浮点数与整数混合的数值的时候,一个64位浮点索引 [``Float64Index``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Float64Index.html#pandas.Float64Index)将会自动被建立。这样能够确保一个纯粹而统一的基于标签的索引和切片行为,使得``[],ix,loc``对于标量索引和切片的工作行为完全一致。
+
+``` python
+In [156]: indexf = pd.Index([1.5, 2, 3, 4.5, 5])
+
+In [157]: indexf
+Out[157]: Float64Index([1.5, 2.0, 3.0, 4.5, 5.0], dtype='float64')
+
+In [158]: sf = pd.Series(range(5), index=indexf)
+
+In [159]: sf
+Out[159]:
+1.5    0
+2.0    1
+3.0    2
+4.5    3
+5.0    4
+dtype: int64
+```
+
+标量选择对于``[],.loc``永远都是基于标签的。一个整型将会自动匹配一个相等的浮点标签(例如,``3`` 等于 ``3.0``)。
+
+``` python
+In [160]: sf[3]
+Out[160]: 2
+
+In [161]: sf[3.0]
+Out[161]: 2
+
+In [162]: sf.loc[3]
+Out[162]: 2
+
+In [163]: sf.loc[3.0]
+Out[163]: 2
+```
+
+唯一能够通过位置进行索引的方式是使用``iloc``方法。
+
+``` python
+In [164]: sf.iloc[3]
+Out[164]: 3
+```
+
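作为补充,下面是一个可以独立运行的小脚本(假设环境中已安装 pandas;变量名 ``sf`` 仅为沿用上文的示例),用来验证上述标量选择的规则:``[]`` 与 ``.loc`` 都基于标签,整数 ``3`` 会匹配浮点标签 ``3.0``,而 ``iloc`` 永远按位置选择:

``` python
import pandas as pd

# 与上文相同的浮点索引序列:标签为 1.5, 2.0, 3.0, 4.5, 5.0
sf = pd.Series(range(5), index=pd.Index([1.5, 2, 3, 4.5, 5]))

# [] 与 .loc 都是基于标签的:整数 3 自动匹配浮点标签 3.0
assert sf[3] == 2
assert sf.loc[3.0] == 2

# iloc 永远按位置索引:位置 3 对应的是标签 4.5 上的值
assert sf.iloc[3] == 3
```

如果这几个断言全部通过,就说明标签选择与位置选择确实走的是两条不同的路径。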
+一个找不到的标量索引会触发一个``KeyError``错误。当使用``[],ix,loc``是,切片操作优先会选择索引的值,但是``iloc``**永远**都会按位置索引。唯一的例外是使用布尔索引,此时将始终按位置选择。 + +``` python +In [165]: sf[2:4] +Out[165]: +2.0 1 +3.0 2 +dtype: int64 + +In [166]: sf.loc[2:4] +Out[166]: +2.0 1 +3.0 2 +dtype: int64 + +In [167]: sf.iloc[2:4] +Out[167]: +3.0 2 +4.5 3 +dtype: int64 +``` + +如果你使用的是浮点数索引,那么使用浮点数切片也是可以执行的。 + +``` python +In [168]: sf[2.1:4.6] +Out[168]: +3.0 2 +4.5 3 +dtype: int64 + +In [169]: sf.loc[2.1:4.6] +Out[169]: +3.0 2 +4.5 3 +dtype: int64 +``` + +在非浮点数中,如果使用浮点索引,将会触发``TypeError``错误。 + +``` python +In [1]: pd.Series(range(5))[3.5] +TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index) + +In [1]: pd.Series(range(5))[3.5:4.5] +TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index) +``` + +::: danger 警告 + +从0.18.0开始,``.iloc``将不能够使用标量浮点数进行索引,因此下列操作将触发``TypeError``错误。 + +``` python +In [3]: pd.Series(range(5)).iloc[3.0] +TypeError: cannot do positional indexing on with these indexers [3.0] of +``` + +::: + +这里有一个典型的场景来使用这种类型的索引方式。设想你有一个不规范的类timedelta的索引方案,但是日期是按照浮点数的方式记录的。这将会导致(例如)毫秒级的延迟。 + +``` python +In [170]: dfir = pd.concat([pd.DataFrame(np.random.randn(5, 2), + .....: index=np.arange(5) * 250.0, + .....: columns=list('AB')), + .....: pd.DataFrame(np.random.randn(6, 2), + .....: index=np.arange(4, 10) * 250.1, + .....: columns=list('AB'))]) + .....: + +In [171]: dfir +Out[171]: + A B +0.0 -0.435772 -1.188928 +250.0 -0.808286 -0.284634 +500.0 -1.815703 1.347213 +750.0 -0.243487 0.514704 +1000.0 1.162969 -0.287725 +1000.4 -0.179734 0.993962 +1250.5 -0.212673 0.909872 +1500.6 -0.733333 -0.349893 +1750.7 0.456434 -0.306735 +2000.8 0.553396 0.166221 +2250.9 -0.101684 -0.734907 +``` + +因此选择操作将总是按照值来进行所有的选择工作, + +``` python +In [172]: dfir[0:1000.4] +Out[172]: + A B +0.0 -0.435772 -1.188928 +250.0 -0.808286 -0.284634 +500.0 -1.815703 1.347213 +750.0 -0.243487 0.514704 +1000.0 1.162969 -0.287725 +1000.4 -0.179734 0.993962 + +In [173]: dfir.loc[0:1001, 'A'] 
+Out[173]: +0.0 -0.435772 +250.0 -0.808286 +500.0 -1.815703 +750.0 -0.243487 +1000.0 1.162969 +1000.4 -0.179734 +Name: A, dtype: float64 + +In [174]: dfir.loc[1000.4] +Out[174]: +A -0.179734 +B 0.993962 +Name: 1000.4, dtype: float64 +``` + +你可以返回第一秒(1000毫秒)的数据: + +``` python +In [175]: dfir[0:1000] +Out[175]: + A B +0.0 -0.435772 -1.188928 +250.0 -0.808286 -0.284634 +500.0 -1.815703 1.347213 +750.0 -0.243487 0.514704 +1000.0 1.162969 -0.287725 +``` + +如果你想要使用基于整型的选择,你应该使用``iloc``: + +``` python +In [176]: dfir.iloc[0:5] +Out[176]: + A B +0.0 -0.435772 -1.188928 +250.0 -0.808286 -0.284634 +500.0 -1.815703 1.347213 +750.0 -0.243487 0.514704 +1000.0 1.162969 -0.287725 +``` + +### 间隔索引 + +*0.20.0中新加入* +[``IntervalIndex``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.IntervalIndex.html#pandas.IntervalIndex)和它自己特有的``IntervalDtype``以及 [``Interval``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Interval.html#pandas.Interval) 标量类型,在pandas中,间隔数据是获得头等支持的。 + + ``IntervalIndex``间隔索引允许一些唯一的索引,并且也是 [``cut()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html#pandas.cut) 和[``qcut()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.qcut.html#pandas.qcut)的返回类型 + +#### 使用``间隔索引``来进行数据索引 + +``` python +In [177]: df = pd.DataFrame({'A': [1, 2, 3, 4]}, + .....: index=pd.IntervalIndex.from_breaks([0, 1, 2, 3, 4])) + .....: + +In [178]: df +Out[178]: + A +(0, 1] 1 +(1, 2] 2 +(2, 3] 3 +(3, 4] 4 +``` + +在间隔序列上使用基于标签的索引``.loc`` ,正如你所预料到的,将会选择那个特定的间隔 + +``` python +In [179]: df.loc[2] +Out[179]: +A 2 +Name: (1, 2], dtype: int64 + +In [180]: df.loc[[2, 3]] +Out[180]: + A +(1, 2] 2 +(2, 3] 3 +``` + +如果你选取了一个标签,被*包含*在间隔当中,这个间隔也将会被选择 + +``` python +In [181]: df.loc[2.5] +Out[181]: +A 3 +Name: (2, 3], dtype: int64 + +In [182]: df.loc[[2.5, 3.5]] +Out[182]: + A +(2, 3] 3 +(3, 4] 4 +``` + +使用 ``Interval``来选择,将只返回严格匹配(从pandas0.25.0开始)。 + +``` python +In [183]: df.loc[pd.Interval(1, 2)] +Out[183]: +A 2 +Name: (1, 
2], dtype: int64
+```
+
+试图选择一个没有被严格包含在 ``IntervalIndex`` 内的区间``Interval``,将会触发``KeyError``错误。
+
+``` python
+In [7]: df.loc[pd.Interval(0.5, 2.5)]
+---------------------------------------------------------------------------
+KeyError: Interval(0.5, 2.5, closed='right')
+```
+
+可以使用[``overlaps()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.IntervalIndex.overlaps.html#pandas.IntervalIndex.overlaps)来创建一个布尔选择器,选中所有与给定区间(``Interval``)重叠的区间。
+
+``` python
+In [184]: idxr = df.index.overlaps(pd.Interval(0.5, 2.5))
+
+In [185]: idxr
+Out[185]: array([ True,  True,  True, False])
+
+In [186]: df[idxr]
+Out[186]:
+        A
+(0, 1]  1
+(1, 2]  2
+(2, 3]  3
+```
+
+#### 使用 ``cut`` 和 ``qcut``来为数据分块
+
+[``cut()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html#pandas.cut) 和 [``qcut()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.qcut.html#pandas.qcut) 都将返回一个分类(``Categorical``)对象,并且它们所创建的分块会以区间索引(``IntervalIndex``)的形式保存在其``.categories``属性中。
+
+``` python
+In [187]: c = pd.cut(range(4), bins=2)
+
+In [188]: c
+Out[188]:
+[(-0.003, 1.5], (-0.003, 1.5], (1.5, 3.0], (1.5, 3.0]]
+Categories (2, interval[float64]): [(-0.003, 1.5] < (1.5, 3.0]]
+
+In [189]: c.categories
+Out[189]:
+IntervalIndex([(-0.003, 1.5], (1.5, 3.0]],
+              closed='right',
+              dtype='interval[float64]')
+```
+
+[``cut()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html#pandas.cut) 也可以接受一个 ``IntervalIndex`` 作为它的 ``bins`` 参数,这构成了一个非常有用的pandas写法。
+首先,我们在一些数据上调用 [``cut()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html#pandas.cut),并将 ``bins`` 设置为某一个固定的数,从而生成bins。
+
+随后,我们可以在其他的数据上调用 [``cut()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html#pandas.cut),并传入``.categories`` 的值作为 ``bins`` 参数。这样,新的数据也将会被分配到同样的bins里面。
+
+``` python
+In [190]: pd.cut([0, 3, 5, 1], bins=c.categories)
+Out[190]:
+[(-0.003, 1.5], (1.5, 3.0], NaN, (-0.003, 1.5]]
+Categories (2, 
interval[float64]): [(-0.003, 1.5] < (1.5, 3.0]] +``` + +任何落在bins之外的数据都将会被设为 ``NaN`` + +#### 生成一定区间内的间隔 + +如果我们需要经常地使用步进区间,我们可以使用 [``interval_range()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.interval_range.html#pandas.interval_range) 函数,结合 ``start``, ``end``, 和 ``periods``来建立一个 ``IntervalIndex`` +对于数值型的间隔,默认的 ``interval_range``间隔频率是1,对于datetime类型的间隔则是日历日。 + +``` python +In [191]: pd.interval_range(start=0, end=5) +Out[191]: +IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4], (4, 5]], + closed='right', + dtype='interval[int64]') + +In [192]: pd.interval_range(start=pd.Timestamp('2017-01-01'), periods=4) +Out[192]: +IntervalIndex([(2017-01-01, 2017-01-02], (2017-01-02, 2017-01-03], (2017-01-03, 2017-01-04], (2017-01-04, 2017-01-05]], + closed='right', + dtype='interval[datetime64[ns]]') + +In [193]: pd.interval_range(end=pd.Timedelta('3 days'), periods=3) +Out[193]: +IntervalIndex([(0 days 00:00:00, 1 days 00:00:00], (1 days 00:00:00, 2 days 00:00:00], (2 days 00:00:00, 3 days 00:00:00]], + closed='right', + dtype='interval[timedelta64[ns]]') +``` + + ``freq`` 参数可以被用来明确非默认的频率,并且可以充分地利用各种各样的 [frequency aliases](timeseries.html#timeseries-offset-aliases) datetime类型的时间间隔。 + +``` python +In [194]: pd.interval_range(start=0, periods=5, freq=1.5) +Out[194]: +IntervalIndex([(0.0, 1.5], (1.5, 3.0], (3.0, 4.5], (4.5, 6.0], (6.0, 7.5]], + closed='right', + dtype='interval[float64]') + +In [195]: pd.interval_range(start=pd.Timestamp('2017-01-01'), periods=4, freq='W') +Out[195]: +IntervalIndex([(2017-01-01, 2017-01-08], (2017-01-08, 2017-01-15], (2017-01-15, 2017-01-22], (2017-01-22, 2017-01-29]], + closed='right', + dtype='interval[datetime64[ns]]') + +In [196]: pd.interval_range(start=pd.Timedelta('0 days'), periods=3, freq='9H') +Out[196]: +IntervalIndex([(0 days 00:00:00, 0 days 09:00:00], (0 days 09:00:00, 0 days 18:00:00], (0 days 18:00:00, 1 days 03:00:00]], + closed='right', + dtype='interval[timedelta64[ns]]') +``` + +此外, ``closed`` 
参数可以用来声明哪个边界是包含的。默认情况下,间隔的右界是包含的。 + +``` python +In [197]: pd.interval_range(start=0, end=4, closed='both') +Out[197]: +IntervalIndex([[0, 1], [1, 2], [2, 3], [3, 4]], + closed='both', + dtype='interval[int64]') + +In [198]: pd.interval_range(start=0, end=4, closed='neither') +Out[198]: +IntervalIndex([(0, 1), (1, 2), (2, 3), (3, 4)], + closed='neither', + dtype='interval[int64]') +``` +*v0.23.0新加入* + +使用``start``, ``end``, 和 ``periods``可以从 ``start`` 到 ``end``(包含)生成一个平均分配的间隔,在返回``IntervalIndex``中生成``periods``这么多的元素(译者:区间)。 + +``` python +In [199]: pd.interval_range(start=0, end=6, periods=4) +Out[199]: +IntervalIndex([(0.0, 1.5], (1.5, 3.0], (3.0, 4.5], (4.5, 6.0]], + closed='right', + dtype='interval[float64]') + +In [200]: pd.interval_range(pd.Timestamp('2018-01-01'), + .....: pd.Timestamp('2018-02-28'), periods=3) + .....: +Out[200]: +IntervalIndex([(2018-01-01, 2018-01-20 08:00:00], (2018-01-20 08:00:00, 2018-02-08 16:00:00], (2018-02-08 16:00:00, 2018-02-28]], + closed='right', + dtype='interval[datetime64[ns]]') +``` + + +## 其他索引常见问题 + +### 数值索引 + +使用数值作为各维度的标签,再基于标签进行索引是一个非常痛苦的话题。在Scientific Python社区的邮件列表中,进行着剧烈的争论。在Pandas中,我们一般性的观点是,标签比实际的(用数值表示的)位置更为重要。因此,对于使用数值作为标签的的对象来说,*只有*基于标签的索引才可以在标准工具,例如``.loc``方法,中正常使用。下面的代码将引发错误: + +``` python +In [201]: s = pd.Series(range(5)) + +In [202]: s[-1] +--------------------------------------------------------------------------- +KeyError Traceback (most recent call last) + in +----> 1 s[-1] + +/pandas/pandas/core/series.py in __getitem__(self, key) + 1062 key = com.apply_if_callable(key, self) + 1063 try: +-> 1064 result = self.index.get_value(self, key) + 1065 + 1066 if not is_scalar(result): + +/pandas/pandas/core/indexes/base.py in get_value(self, series, key) + 4721 k = self._convert_scalar_indexer(k, kind="getitem") + 4722 try: +-> 4723 return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None)) + 4724 except KeyError as e1: + 4725 if len(self) > 0 and (self.holds_integer() or self.is_boolean()): + 
+/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value() + +/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value() + +/pandas/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc() + +/pandas/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item() + +/pandas/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item() + +KeyError: -1 + +In [203]: df = pd.DataFrame(np.random.randn(5, 4)) + +In [204]: df +Out[204]: + 0 1 2 3 +0 -0.130121 -0.476046 0.759104 0.213379 +1 -0.082641 0.448008 0.656420 -1.051443 +2 0.594956 -0.151360 -0.069303 1.221431 +3 -0.182832 0.791235 0.042745 2.069775 +4 1.446552 0.019814 -1.389212 -0.702312 + +In [205]: df.loc[-2:] +Out[205]: + 0 1 2 3 +0 -0.130121 -0.476046 0.759104 0.213379 +1 -0.082641 0.448008 0.656420 -1.051443 +2 0.594956 -0.151360 -0.069303 1.221431 +3 -0.182832 0.791235 0.042745 2.069775 +4 1.446552 0.019814 -1.389212 -0.702312 +``` + +我们特意地做了这样的设计,是为了阻止歧义性以及一些难以避免的小bug(当我们修改了函数,从而阻止了“滚回”到基于位置的索引方式以后,许多用户报告说,他们发现了bug)。 + +### 非单调索引要求严格匹配 + +如果一个 ``序列`` 或者 ``数据表``是单调递增或递减的,那么基于标签的切片行为的边界是可以超出索引的,这与普通的python``列表``的索引切片非常相似。索引的单调性可以使用 [``is_monotonic_increasing()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.is_monotonic_increasing.html#pandas.Index.is_monotonic_increasing) 和[``is_monotonic_decreasing()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.is_monotonic_decreasing.html#pandas.Index.is_monotonic_decreasing) 属性来检查 + +``` python +In [206]: df = pd.DataFrame(index=[2, 3, 3, 4, 5], columns=['data'], data=list(range(5))) + +In [207]: df.index.is_monotonic_increasing +Out[207]: True + +# no rows 0 or 1, but still returns rows 2, 3 (both of them), and 4: +In [208]: df.loc[0:4, :] +Out[208]: + data +2 0 +3 1 +3 2 +4 3 + +# slice is are outside the index, so empty DataFrame is returned +In [209]: df.loc[13:15, :] +Out[209]: +Empty DataFrame +Columns: 
[data]
+Index: []
+```
+
+另一方面,如果索引不是单调的,那么切片的两侧边界都必须是索引中*唯一*的成员。
+
+``` python
+In [210]: df = pd.DataFrame(index=[2, 3, 1, 4, 3, 5],
+   .....:                   columns=['data'], data=list(range(6)))
+   .....:
+
+In [211]: df.index.is_monotonic_increasing
+Out[211]: False
+
+# OK because 2 and 4 are in the index
+In [212]: df.loc[2:4, :]
+Out[212]:
+   data
+2     0
+3     1
+1     2
+4     3
+```
+
+``` python
+# 0 is not in the index
+In [9]: df.loc[0:4, :]
+KeyError: 0
+
+# 3 is not a unique label
+In [11]: df.loc[2:3, :]
+KeyError: 'Cannot get right slice bound for non-unique label: 3'
+```
+
+``Index.is_monotonic_increasing``和``Index.is_monotonic_decreasing``方法只能进行弱单调性的检查。要进行严格的单调性检查,你可以将它们与 [``is_unique()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.is_unique.html#pandas.Index.is_unique) 属性结合使用。
+
+``` python
+In [213]: weakly_monotonic = pd.Index(['a', 'b', 'c', 'c'])
+
+In [214]: weakly_monotonic
+Out[214]: Index(['a', 'b', 'c', 'c'], dtype='object')
+
+In [215]: weakly_monotonic.is_monotonic_increasing
+Out[215]: True
+
+In [216]: weakly_monotonic.is_monotonic_increasing & weakly_monotonic.is_unique
+Out[216]: False
+```
+
+### 终止点被包含在内
+
+与标准的python序列切片(终止点不被包含在内)不同,Pandas中基于标签的切片,终止点是被**包含在内**的。最主要的原因是,我们很难准确地确定索引中的"下一个"标签到底是什么。例如下面这个``序列``:
+
+``` python
+In [217]: s = pd.Series(np.random.randn(6), index=list('abcdef'))
+
+In [218]: s
+Out[218]:
+a    0.301379
+b    1.240445
+c   -0.846068
+d   -0.043312
+e   -1.658747
+f   -0.819549
+dtype: float64
+```
+
+如果我们希望从``c``选取到``e``,使用基于整数位置的索引,将会是如下操作:
+
+``` python
+In [219]: s[2:5]
+Out[219]:
+c   -0.846068
+d   -0.043312
+e   -1.658747
+dtype: float64
+```
+
+然而,如果你只有``c``和``e``,确定索引中的下一个元素将会是比较困难的。例如,下面的这种写法是行不通的:
+
+``` python
+s.loc['c':'e' + 1]
+```
+
+一个非常常见的用例是限制一个时间序列的起始和终止日期。为了能够便于操作,我们决定让基于标签的切片包含两侧端点:
+
+``` python
+In [220]: s.loc['c':'e']
+Out[220]:
+c   -0.846068
+d   -0.043312
+e   -1.658747
+dtype: float64
+```
+
+这是一个非常典型的"实用性战胜纯粹性"(practicality beats purity)的情况;但是,如果你想当然地认为基于标签的切片会与标准python中的整数切片有着完全相同的行为,你就确实需要多加留意了。
+
+### 
索引会潜在地改变序列的dtype + +不同的索引操作有可能会潜在地改变一个``序列``的dtypes + +``` python +In [221]: series1 = pd.Series([1, 2, 3]) + +In [222]: series1.dtype +Out[222]: dtype('int64') + +In [223]: res = series1.reindex([0, 4]) + +In [224]: res.dtype +Out[224]: dtype('float64') + +In [225]: res +Out[225]: +0 1.0 +4 NaN +dtype: float64 +``` + +``` python +In [226]: series2 = pd.Series([True]) + +In [227]: series2.dtype +Out[227]: dtype('bool') + +In [228]: res = series2.reindex_like(series1) + +In [229]: res.dtype +Out[229]: dtype('O') + +In [230]: res +Out[230]: +0 True +1 NaN +2 NaN +dtype: object +``` + +这是因为上述(重新)索引的操作悄悄地插入了 ``NaNs`` ,因此dtype也就随之发生改变了。如果你在使用一些``numpy``的``ufuncs``,如 ``numpy.logical_and``时,将会导致一些问题。 + +参见 [this old issue](https://github.com/pydata/pandas/issues/2388)了解更详细的讨论过程 \ No newline at end of file diff --git a/Python/pandas/user_guide/categorical.md b/Python/pandas/user_guide/categorical.md new file mode 100644 index 00000000..fa55a9fe --- /dev/null +++ b/Python/pandas/user_guide/categorical.md @@ -0,0 +1,2012 @@ +# Categorical data + +This is an introduction to pandas categorical data type, including a short comparison +with R’s ``factor``. + +*Categoricals* are a pandas data type corresponding to categorical variables in +statistics. A categorical variable takes on a limited, and usually fixed, +number of possible values (*categories*; *levels* in R). Examples are gender, +social class, blood type, country affiliation, observation time or rating via +Likert scales. + +In contrast to statistical categorical variables, categorical data might have an order (e.g. +‘strongly agree’ vs ‘agree’ or ‘first observation’ vs. ‘second observation’), but numerical +operations (additions, divisions, …) are not possible. + +All values of categorical data are either in *categories* or *np.nan*. Order is defined by +the order of *categories*, not lexical order of the values. 
Internally, the data structure +consists of a *categories* array and an integer array of *codes* which point to the real value in +the *categories* array. + +The categorical data type is useful in the following cases: + +- A string variable consisting of only a few different values. Converting such a string +variable to a categorical variable will save some memory, see [here](#categorical-memory). +- The lexical order of a variable is not the same as the logical order (“one”, “two”, “three”). +By converting to a categorical and specifying an order on the categories, sorting and +min/max will use the logical order instead of the lexical order, see [here](#categorical-sort). +- As a signal to other Python libraries that this column should be treated as a categorical +variable (e.g. to use suitable statistical methods or plot types). + +See also the [API docs on categoricals](https://pandas.pydata.org/pandas-docs/stable/reference/arrays.html#api-arrays-categorical). + +## Object creation + +### Series creation + +Categorical ``Series`` or columns in a ``DataFrame`` can be created in several ways: + +By specifying ``dtype="category"`` when constructing a ``Series``: + +``` python +In [1]: s = pd.Series(["a", "b", "c", "a"], dtype="category") + +In [2]: s +Out[2]: +0 a +1 b +2 c +3 a +dtype: category +Categories (3, object): [a, b, c] +``` + +By converting an existing ``Series`` or column to a ``category`` dtype: + +``` python +In [3]: df = pd.DataFrame({"A": ["a", "b", "c", "a"]}) + +In [4]: df["B"] = df["A"].astype('category') + +In [5]: df +Out[5]: + A B +0 a a +1 b b +2 c c +3 a a +``` + +By using special functions, such as [``cut()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html#pandas.cut), which groups data into +discrete bins. See the [example on tiling](reshaping.html#reshaping-tile-cut) in the docs. 
+ +``` python +In [6]: df = pd.DataFrame({'value': np.random.randint(0, 100, 20)}) + +In [7]: labels = ["{0} - {1}".format(i, i + 9) for i in range(0, 100, 10)] + +In [8]: df['group'] = pd.cut(df.value, range(0, 105, 10), right=False, labels=labels) + +In [9]: df.head(10) +Out[9]: + value group +0 65 60 - 69 +1 49 40 - 49 +2 56 50 - 59 +3 43 40 - 49 +4 43 40 - 49 +5 91 90 - 99 +6 32 30 - 39 +7 87 80 - 89 +8 36 30 - 39 +9 8 0 - 9 +``` + +By passing a [``pandas.Categorical``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Categorical.html#pandas.Categorical) object to a ``Series`` or assigning it to a ``DataFrame``. + +``` python +In [10]: raw_cat = pd.Categorical(["a", "b", "c", "a"], categories=["b", "c", "d"], + ....: ordered=False) + ....: + +In [11]: s = pd.Series(raw_cat) + +In [12]: s +Out[12]: +0 NaN +1 b +2 c +3 NaN +dtype: category +Categories (3, object): [b, c, d] + +In [13]: df = pd.DataFrame({"A": ["a", "b", "c", "a"]}) + +In [14]: df["B"] = raw_cat + +In [15]: df +Out[15]: + A B +0 a NaN +1 b b +2 c c +3 a NaN +``` + +Categorical data has a specific ``category`` [dtype](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-dtypes): + +``` python +In [16]: df.dtypes +Out[16]: +A object +B category +dtype: object +``` + +### DataFrame creation + +Similar to the previous section where a single column was converted to categorical, all columns in a +``DataFrame`` can be batch converted to categorical either during or after construction. 
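As a quick, self-contained sketch (assuming pandas is installed; the column names and repeat count here are illustrative, not from the pandas docs), batch conversion with ``astype('category')`` can be checked directly, and for low-cardinality string columns it also illustrates the memory savings mentioned in the introduction:

``` python
import pandas as pd

# a frame with two low-cardinality string columns
df = pd.DataFrame({'A': list('abca') * 1000, 'B': list('bccd') * 1000})

# convert every column to the category dtype at once
df_cat = df.astype('category')

# each column is now categorical, with its own per-column categories
assert (df_cat.dtypes == 'category').all()
assert list(df_cat['A'].cat.categories) == ['a', 'b', 'c']

# the categorical representation is much smaller than object strings
assert df_cat.memory_usage(deep=True).sum() < df.memory_usage(deep=True).sum()
```

The memory comparison holds because each categorical column stores small integer codes plus one copy of each distinct label, instead of one Python string object per row.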
+ +This can be done during construction by specifying ``dtype="category"`` in the ``DataFrame`` constructor: + +``` python +In [17]: df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')}, dtype="category") + +In [18]: df.dtypes +Out[18]: +A category +B category +dtype: object +``` + +Note that the categories present in each column differ; the conversion is done column by column, so +only labels present in a given column are categories: + +``` python +In [19]: df['A'] +Out[19]: +0 a +1 b +2 c +3 a +Name: A, dtype: category +Categories (3, object): [a, b, c] + +In [20]: df['B'] +Out[20]: +0 b +1 c +2 c +3 d +Name: B, dtype: category +Categories (3, object): [b, c, d] +``` + +*New in version 0.23.0.* + +Analogously, all columns in an existing ``DataFrame`` can be batch converted using [``DataFrame.astype()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html#pandas.DataFrame.astype): + +``` python +In [21]: df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')}) + +In [22]: df_cat = df.astype('category') + +In [23]: df_cat.dtypes +Out[23]: +A category +B category +dtype: object +``` + +This conversion is likewise done column by column: + +``` python +In [24]: df_cat['A'] +Out[24]: +0 a +1 b +2 c +3 a +Name: A, dtype: category +Categories (3, object): [a, b, c] + +In [25]: df_cat['B'] +Out[25]: +0 b +1 c +2 c +3 d +Name: B, dtype: category +Categories (3, object): [b, c, d] +``` + +### Controlling behavior + +In the examples above where we passed ``dtype='category'``, we used the default +behavior: + +1. Categories are inferred from the data. +1. Categories are unordered. + +To control those behaviors, instead of passing ``'category'``, use an instance +of ``CategoricalDtype``. 
+ +``` python +In [26]: from pandas.api.types import CategoricalDtype + +In [27]: s = pd.Series(["a", "b", "c", "a"]) + +In [28]: cat_type = CategoricalDtype(categories=["b", "c", "d"], + ....: ordered=True) + ....: + +In [29]: s_cat = s.astype(cat_type) + +In [30]: s_cat +Out[30]: +0 NaN +1 b +2 c +3 NaN +dtype: category +Categories (3, object): [b < c < d] +``` + +Similarly, a ``CategoricalDtype`` can be used with a ``DataFrame`` to ensure that categories +are consistent among all columns. + +``` python +In [31]: from pandas.api.types import CategoricalDtype + +In [32]: df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')}) + +In [33]: cat_type = CategoricalDtype(categories=list('abcd'), + ....: ordered=True) + ....: + +In [34]: df_cat = df.astype(cat_type) + +In [35]: df_cat['A'] +Out[35]: +0 a +1 b +2 c +3 a +Name: A, dtype: category +Categories (4, object): [a < b < c < d] + +In [36]: df_cat['B'] +Out[36]: +0 b +1 c +2 c +3 d +Name: B, dtype: category +Categories (4, object): [a < b < c < d] +``` + +::: tip Note + +To perform table-wise conversion, where all labels in the entire ``DataFrame`` are used as +categories for each column, the ``categories`` parameter can be determined programmatically by +``categories = pd.unique(df.to_numpy().ravel())``. 
+ +::: + +If you already have ``codes`` and ``categories``, you can use the +[``from_codes()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Categorical.from_codes.html#pandas.Categorical.from_codes) constructor to save the factorize step +during normal constructor mode: + +``` python +In [37]: splitter = np.random.choice([0, 1], 5, p=[0.5, 0.5]) + +In [38]: s = pd.Series(pd.Categorical.from_codes(splitter, + ....: categories=["train", "test"])) + ....: +``` + +### Regaining original data + +To get back to the original ``Series`` or NumPy array, use +``Series.astype(original_dtype)`` or ``np.asarray(categorical)``: + +``` python +In [39]: s = pd.Series(["a", "b", "c", "a"]) + +In [40]: s +Out[40]: +0 a +1 b +2 c +3 a +dtype: object + +In [41]: s2 = s.astype('category') + +In [42]: s2 +Out[42]: +0 a +1 b +2 c +3 a +dtype: category +Categories (3, object): [a, b, c] + +In [43]: s2.astype(str) +Out[43]: +0 a +1 b +2 c +3 a +dtype: object + +In [44]: np.asarray(s2) +Out[44]: array(['a', 'b', 'c', 'a'], dtype=object) +``` + +::: tip Note + +In contrast to R’s *factor* function, categorical data is not converting input values to +strings; categories will end up the same data type as the original values. + +::: + +::: tip Note + +In contrast to R’s *factor* function, there is currently no way to assign/change labels at +creation time. Use *categories* to change the categories after creation time. + +::: + +## CategoricalDtype + +*Changed in version 0.21.0.* + +A categorical’s type is fully described by + +1. ``categories``: a sequence of unique values and no missing values +1. ``ordered``: a boolean + +This information can be stored in a ``CategoricalDtype``. +The ``categories`` argument is optional, which implies that the actual categories +should be inferred from whatever is present in the data when the +[``pandas.Categorical``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Categorical.html#pandas.Categorical) is created. 
The categories are assumed to be unordered +by default. + +``` python +In [45]: from pandas.api.types import CategoricalDtype + +In [46]: CategoricalDtype(['a', 'b', 'c']) +Out[46]: CategoricalDtype(categories=['a', 'b', 'c'], ordered=None) + +In [47]: CategoricalDtype(['a', 'b', 'c'], ordered=True) +Out[47]: CategoricalDtype(categories=['a', 'b', 'c'], ordered=True) + +In [48]: CategoricalDtype() +Out[48]: CategoricalDtype(categories=None, ordered=None) +``` + +A ``CategoricalDtype`` can be used in any place pandas +expects a *dtype*. For example [``pandas.read_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv), +[``pandas.DataFrame.astype()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html#pandas.DataFrame.astype), or in the ``Series`` constructor. + +::: tip Note + +As a convenience, you can use the string ``'category'`` in place of a +``CategoricalDtype`` when you want the default behavior of +the categories being unordered, and equal to the set values present in the +array. In other words, ``dtype='category'`` is equivalent to +``dtype=CategoricalDtype()``. + +::: + +### Equality semantics + +Two instances of ``CategoricalDtype`` compare equal +whenever they have the same categories and order. When comparing two +unordered categoricals, the order of the ``categories`` is not considered. + +``` python +In [49]: c1 = CategoricalDtype(['a', 'b', 'c'], ordered=False) + +# Equal, since order is not considered when ordered=False +In [50]: c1 == CategoricalDtype(['b', 'c', 'a'], ordered=False) +Out[50]: True + +# Unequal, since the second CategoricalDtype is ordered +In [51]: c1 == CategoricalDtype(['a', 'b', 'c'], ordered=True) +Out[51]: False +``` + +All instances of ``CategoricalDtype`` compare equal to the string ``'category'``. 
+ +``` python +In [52]: c1 == 'category' +Out[52]: True +``` + +::: danger Warning + +Since ``dtype='category'`` is essentially ``CategoricalDtype(None, False)``, +and since all instances ``CategoricalDtype`` compare equal to ``'category'``, +all instances of ``CategoricalDtype`` compare equal to a +``CategoricalDtype(None, False)``, regardless of ``categories`` or +``ordered``. + +::: + +## Description + +Using [``describe()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html#pandas.DataFrame.describe) on categorical data will produce similar +output to a ``Series`` or ``DataFrame`` of type ``string``. + +``` python +In [53]: cat = pd.Categorical(["a", "c", "c", np.nan], categories=["b", "a", "c"]) + +In [54]: df = pd.DataFrame({"cat": cat, "s": ["a", "c", "c", np.nan]}) + +In [55]: df.describe() +Out[55]: + cat s +count 3 3 +unique 2 2 +top c c +freq 2 2 + +In [56]: df["cat"].describe() +Out[56]: +count 3 +unique 2 +top c +freq 2 +Name: cat, dtype: object +``` + +## Working with categories + +Categorical data has a *categories* and a *ordered* property, which list their +possible values and whether the ordering matters or not. These properties are +exposed as ``s.cat.categories`` and ``s.cat.ordered``. If you don’t manually +specify categories and ordering, they are inferred from the passed arguments. + +``` python +In [57]: s = pd.Series(["a", "b", "c", "a"], dtype="category") + +In [58]: s.cat.categories +Out[58]: Index(['a', 'b', 'c'], dtype='object') + +In [59]: s.cat.ordered +Out[59]: False +``` + +It’s also possible to pass in the categories in a specific order: + +``` python +In [60]: s = pd.Series(pd.Categorical(["a", "b", "c", "a"], + ....: categories=["c", "b", "a"])) + ....: + +In [61]: s.cat.categories +Out[61]: Index(['c', 'b', 'a'], dtype='object') + +In [62]: s.cat.ordered +Out[62]: False +``` + +::: tip Note + +New categorical data are **not** automatically ordered. 
You must explicitly +pass ``ordered=True`` to indicate an ordered ``Categorical``. + +::: + +::: tip Note + +The result of [``unique()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.unique.html#pandas.Series.unique) is not always the same as ``Series.cat.categories``, +because ``Series.unique()`` has a couple of guarantees, namely that it returns categories +in the order of appearance, and it only includes values that are actually present. + +``` python +In [63]: s = pd.Series(list('babc')).astype(CategoricalDtype(list('abcd'))) + +In [64]: s +Out[64]: +0 b +1 a +2 b +3 c +dtype: category +Categories (4, object): [a, b, c, d] + +# categories +In [65]: s.cat.categories +Out[65]: Index(['a', 'b', 'c', 'd'], dtype='object') + +# uniques +In [66]: s.unique() +Out[66]: +[b, a, c] +Categories (3, object): [b, a, c] +``` + +::: + +### Renaming categories + +Renaming categories is done by assigning new values to the +``Series.cat.categories`` property or by using the +``rename_categories()`` method: + +``` python +In [67]: s = pd.Series(["a", "b", "c", "a"], dtype="category") + +In [68]: s +Out[68]: +0 a +1 b +2 c +3 a +dtype: category +Categories (3, object): [a, b, c] + +In [69]: s.cat.categories = ["Group %s" % g for g in s.cat.categories] + +In [70]: s +Out[70]: +0 Group a +1 Group b +2 Group c +3 Group a +dtype: category +Categories (3, object): [Group a, Group b, Group c] + +In [71]: s = s.cat.rename_categories([1, 2, 3]) + +In [72]: s +Out[72]: +0 1 +1 2 +2 3 +3 1 +dtype: category +Categories (3, int64): [1, 2, 3] + +# You can also pass a dict-like object to map the renaming +In [73]: s = s.cat.rename_categories({1: 'x', 2: 'y', 3: 'z'}) + +In [74]: s +Out[74]: +0 x +1 y +2 z +3 x +dtype: category +Categories (3, object): [x, y, z] +``` + +::: tip Note + +In contrast to R’s *factor*, categorical data can have categories of other types than string. 
+ +::: + +::: tip Note + +Be aware that assigning new categories is an inplace operation, while most other operations +under ``Series.cat`` per default return a new ``Series`` of dtype *category*. + +::: + +Categories must be unique or a *ValueError* is raised: + +``` python +In [75]: try: + ....: s.cat.categories = [1, 1, 1] + ....: except ValueError as e: + ....: print("ValueError:", str(e)) + ....: +ValueError: Categorical categories must be unique +``` + +Categories must also not be ``NaN`` or a *ValueError* is raised: + +``` python +In [76]: try: + ....: s.cat.categories = [1, 2, np.nan] + ....: except ValueError as e: + ....: print("ValueError:", str(e)) + ....: +ValueError: Categorial categories cannot be null +``` + +### Appending new categories + +Appending categories can be done by using the +``add_categories()`` method: + +``` python +In [77]: s = s.cat.add_categories([4]) + +In [78]: s.cat.categories +Out[78]: Index(['x', 'y', 'z', 4], dtype='object') + +In [79]: s +Out[79]: +0 x +1 y +2 z +3 x +dtype: category +Categories (4, object): [x, y, z, 4] +``` + +### Removing categories + +Removing categories can be done by using the +``remove_categories()`` method. 
Values which are removed
are replaced by ``np.nan``:

``` python
In [80]: s = s.cat.remove_categories([4])

In [81]: s
Out[81]:
0    x
1    y
2    z
3    x
dtype: category
Categories (3, object): [x, y, z]
```

### Removing unused categories

Removing unused categories can also be done:

``` python
In [82]: s = pd.Series(pd.Categorical(["a", "b", "a"],
   ....:                              categories=["a", "b", "c", "d"]))
   ....:

In [83]: s
Out[83]:
0    a
1    b
2    a
dtype: category
Categories (4, object): [a, b, c, d]

In [84]: s.cat.remove_unused_categories()
Out[84]:
0    a
1    b
2    a
dtype: category
Categories (2, object): [a, b]
```

### Setting categories

If you want to remove and add new categories in one step (which has some
speed advantage), or simply set the categories to a predefined scale,
use ``set_categories()``.

``` python
In [85]: s = pd.Series(["one", "two", "four", "-"], dtype="category")

In [86]: s
Out[86]:
0     one
1     two
2    four
3       -
dtype: category
Categories (4, object): [-, four, one, two]

In [87]: s = s.cat.set_categories(["one", "two", "three", "four"])

In [88]: s
Out[88]:
0     one
1     two
2    four
3     NaN
dtype: category
Categories (4, object): [one, two, three, four]
```

::: tip Note

Be aware that ``Categorical.set_categories()`` cannot know whether some category is omitted
intentionally or because it is misspelled or (under Python 3) due to a type difference (e.g.,
NumPy ``S1`` dtype and Python strings). This can result in surprising behaviour!

:::

## Sorting and order

If categorical data is ordered (``s.cat.ordered == True``), then the order of the categories has a
meaning and certain operations are possible. If the categorical is unordered, ``.min()/.max()`` will raise a ``TypeError``.
+ +``` python +In [89]: s = pd.Series(pd.Categorical(["a", "b", "c", "a"], ordered=False)) + +In [90]: s.sort_values(inplace=True) + +In [91]: s = pd.Series(["a", "b", "c", "a"]).astype( + ....: CategoricalDtype(ordered=True) + ....: ) + ....: + +In [92]: s.sort_values(inplace=True) + +In [93]: s +Out[93]: +0 a +3 a +1 b +2 c +dtype: category +Categories (3, object): [a < b < c] + +In [94]: s.min(), s.max() +Out[94]: ('a', 'c') +``` + +You can set categorical data to be ordered by using ``as_ordered()`` or unordered by using ``as_unordered()``. These will by +default return a *new* object. + +``` python +In [95]: s.cat.as_ordered() +Out[95]: +0 a +3 a +1 b +2 c +dtype: category +Categories (3, object): [a < b < c] + +In [96]: s.cat.as_unordered() +Out[96]: +0 a +3 a +1 b +2 c +dtype: category +Categories (3, object): [a, b, c] +``` + +Sorting will use the order defined by categories, not any lexical order present on the data type. +This is even true for strings and numeric data: + +``` python +In [97]: s = pd.Series([1, 2, 3, 1], dtype="category") + +In [98]: s = s.cat.set_categories([2, 3, 1], ordered=True) + +In [99]: s +Out[99]: +0 1 +1 2 +2 3 +3 1 +dtype: category +Categories (3, int64): [2 < 3 < 1] + +In [100]: s.sort_values(inplace=True) + +In [101]: s +Out[101]: +1 2 +2 3 +0 1 +3 1 +dtype: category +Categories (3, int64): [2 < 3 < 1] + +In [102]: s.min(), s.max() +Out[102]: (2, 1) +``` + +### Reordering + +Reordering the categories is possible via the ``Categorical.reorder_categories()`` and +the ``Categorical.set_categories()`` methods. For ``Categorical.reorder_categories()``, all +old categories must be included in the new categories and no new categories are allowed. This will +necessarily make the sort order the same as the categories order. 
+ +``` python +In [103]: s = pd.Series([1, 2, 3, 1], dtype="category") + +In [104]: s = s.cat.reorder_categories([2, 3, 1], ordered=True) + +In [105]: s +Out[105]: +0 1 +1 2 +2 3 +3 1 +dtype: category +Categories (3, int64): [2 < 3 < 1] + +In [106]: s.sort_values(inplace=True) + +In [107]: s +Out[107]: +1 2 +2 3 +0 1 +3 1 +dtype: category +Categories (3, int64): [2 < 3 < 1] + +In [108]: s.min(), s.max() +Out[108]: (2, 1) +``` + +::: tip Note + +Note the difference between assigning new categories and reordering the categories: the first +renames categories and therefore the individual values in the ``Series``, but if the first +position was sorted last, the renamed value will still be sorted last. Reordering means that the +way values are sorted is different afterwards, but not that individual values in the +``Series`` are changed. + +::: + +::: tip Note + +If the ``Categorical`` is not ordered, [``Series.min()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.min.html#pandas.Series.min) and [``Series.max()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.max.html#pandas.Series.max) will raise +``TypeError``. Numeric operations like ``+``, ``-``, ``*``, ``/`` and operations based on them +(e.g. [``Series.median()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.median.html#pandas.Series.median), which would need to compute the mean between two values if the length +of an array is even) do not work and raise a ``TypeError``. + +::: + +### Multi column sorting + +A categorical dtyped column will participate in a multi-column sort in a similar manner to other columns. +The ordering of the categorical is determined by the ``categories`` of that column. 
+ +``` python +In [109]: dfs = pd.DataFrame({'A': pd.Categorical(list('bbeebbaa'), + .....: categories=['e', 'a', 'b'], + .....: ordered=True), + .....: 'B': [1, 2, 1, 2, 2, 1, 2, 1]}) + .....: + +In [110]: dfs.sort_values(by=['A', 'B']) +Out[110]: + A B +2 e 1 +3 e 2 +7 a 1 +6 a 2 +0 b 1 +5 b 1 +1 b 2 +4 b 2 +``` + +Reordering the ``categories`` changes a future sort. + +``` python +In [111]: dfs['A'] = dfs['A'].cat.reorder_categories(['a', 'b', 'e']) + +In [112]: dfs.sort_values(by=['A', 'B']) +Out[112]: + A B +7 a 1 +6 a 2 +0 b 1 +5 b 1 +1 b 2 +4 b 2 +2 e 1 +3 e 2 +``` + +## Comparisons + +Comparing categorical data with other objects is possible in three cases: + +- Comparing equality (``==`` and ``!=``) to a list-like object (list, Series, array, +…) of the same length as the categorical data. +- All comparisons (``==``, ``!=``, ``>``, ``>=``, ``<``, and ``<=``) of categorical data to +another categorical Series, when ``ordered==True`` and the *categories* are the same. +- All comparisons of a categorical data to a scalar. + +All other comparisons, especially “non-equality” comparisons of two categoricals with different +categories or a categorical with any list-like object, will raise a ``TypeError``. + +::: tip Note + +Any “non-equality” comparisons of categorical data with a ``Series``, ``np.array``, ``list`` or +categorical data with different categories or ordering will raise a ``TypeError`` because custom +categories ordering could be interpreted in two ways: one with taking into account the +ordering and one without. 
+ +::: + +``` python +In [113]: cat = pd.Series([1, 2, 3]).astype( + .....: CategoricalDtype([3, 2, 1], ordered=True) + .....: ) + .....: + +In [114]: cat_base = pd.Series([2, 2, 2]).astype( + .....: CategoricalDtype([3, 2, 1], ordered=True) + .....: ) + .....: + +In [115]: cat_base2 = pd.Series([2, 2, 2]).astype( + .....: CategoricalDtype(ordered=True) + .....: ) + .....: + +In [116]: cat +Out[116]: +0 1 +1 2 +2 3 +dtype: category +Categories (3, int64): [3 < 2 < 1] + +In [117]: cat_base +Out[117]: +0 2 +1 2 +2 2 +dtype: category +Categories (3, int64): [3 < 2 < 1] + +In [118]: cat_base2 +Out[118]: +0 2 +1 2 +2 2 +dtype: category +Categories (1, int64): [2] +``` + +Comparing to a categorical with the same categories and ordering or to a scalar works: + +``` python +In [119]: cat > cat_base +Out[119]: +0 True +1 False +2 False +dtype: bool + +In [120]: cat > 2 +Out[120]: +0 True +1 False +2 False +dtype: bool +``` + +Equality comparisons work with any list-like object of same length and scalars: + +``` python +In [121]: cat == cat_base +Out[121]: +0 False +1 True +2 False +dtype: bool + +In [122]: cat == np.array([1, 2, 3]) +Out[122]: +0 True +1 True +2 True +dtype: bool + +In [123]: cat == 2 +Out[123]: +0 False +1 True +2 False +dtype: bool +``` + +This doesn’t work because the categories are not the same: + +``` python +In [124]: try: + .....: cat > cat_base2 + .....: except TypeError as e: + .....: print("TypeError:", str(e)) + .....: +TypeError: Categoricals can only be compared if 'categories' are the same. 
Categories are different lengths
```

If you want to do a “non-equality” comparison of a categorical series with a list-like object
which is not categorical data, you need to be explicit and convert the categorical data back to
the original values:

``` python
In [125]: base = np.array([1, 2, 3])

In [126]: try:
   .....:     cat > base
   .....: except TypeError as e:
   .....:     print("TypeError:", str(e))
   .....:
TypeError: Cannot compare a Categorical for op __gt__ with type <class 'numpy.ndarray'>.
If you want to compare values, use 'np.asarray(cat) <op> other'.

In [127]: np.asarray(cat) > base
Out[127]: array([False, False, False])
```

When you compare two unordered categoricals with the same categories, the order is not considered:

``` python
In [128]: c1 = pd.Categorical(['a', 'b'], categories=['a', 'b'], ordered=False)

In [129]: c2 = pd.Categorical(['a', 'b'], categories=['b', 'a'], ordered=False)

In [130]: c1 == c2
Out[130]: array([ True,  True])
```

## Operations

Apart from [``Series.min()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.min.html#pandas.Series.min), [``Series.max()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.max.html#pandas.Series.max) and [``Series.mode()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.mode.html#pandas.Series.mode), the
following operations are possible with categorical data:

``Series`` methods like [``Series.value_counts()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html#pandas.Series.value_counts) will use all categories,
even if some categories are not present in the data:

``` python
In [131]: s = pd.Series(pd.Categorical(["a", "b", "c", "c"],
   .....:                              categories=["c", "a", "b", "d"]))
   .....:

In [132]: s.value_counts()
Out[132]:
c    2
b    1
a    1
d    0
dtype: int64
```

Groupby will also show “unused” categories:

``` python
In [133]: cats = pd.Categorical(["a", "b",
"b", "b", "c", "c", "c"], + .....: categories=["a", "b", "c", "d"]) + .....: + +In [134]: df = pd.DataFrame({"cats": cats, "values": [1, 2, 2, 2, 3, 4, 5]}) + +In [135]: df.groupby("cats").mean() +Out[135]: + values +cats +a 1.0 +b 2.0 +c 4.0 +d NaN + +In [136]: cats2 = pd.Categorical(["a", "a", "b", "b"], categories=["a", "b", "c"]) + +In [137]: df2 = pd.DataFrame({"cats": cats2, + .....: "B": ["c", "d", "c", "d"], + .....: "values": [1, 2, 3, 4]}) + .....: + +In [138]: df2.groupby(["cats", "B"]).mean() +Out[138]: + values +cats B +a c 1.0 + d 2.0 +b c 3.0 + d 4.0 +c c NaN + d NaN +``` + +Pivot tables: + +``` python +In [139]: raw_cat = pd.Categorical(["a", "a", "b", "b"], categories=["a", "b", "c"]) + +In [140]: df = pd.DataFrame({"A": raw_cat, + .....: "B": ["c", "d", "c", "d"], + .....: "values": [1, 2, 3, 4]}) + .....: + +In [141]: pd.pivot_table(df, values='values', index=['A', 'B']) +Out[141]: + values +A B +a c 1 + d 2 +b c 3 + d 4 +``` + +## Data munging + +The optimized pandas data access methods ``.loc``, ``.iloc``, ``.at``, and ``.iat``, +work as normal. The only difference is the return type (for getting) and +that only values already in *categories* can be assigned. + +### Getting + +If the slicing operation returns either a ``DataFrame`` or a column of type +``Series``, the ``category`` dtype is preserved. 

``` python
In [142]: idx = pd.Index(["h", "i", "j", "k", "l", "m", "n"])

In [143]: cats = pd.Series(["a", "b", "b", "b", "c", "c", "c"],
   .....:                  dtype="category", index=idx)
   .....:

In [144]: values = [1, 2, 2, 2, 3, 4, 5]

In [145]: df = pd.DataFrame({"cats": cats, "values": values}, index=idx)

In [146]: df.iloc[2:4, :]
Out[146]:
  cats  values
j    b       2
k    b       2

In [147]: df.iloc[2:4, :].dtypes
Out[147]:
cats      category
values       int64
dtype: object

In [148]: df.loc["h":"j", "cats"]
Out[148]:
h    a
i    b
j    b
Name: cats, dtype: category
Categories (3, object): [a, b, c]

In [149]: df[df["cats"] == "b"]
Out[149]:
  cats  values
i    b       2
j    b       2
k    b       2
```

An example where the category type is not preserved is when you take a single
row: the resulting ``Series`` is of dtype ``object``:

``` python
# get the complete "h" row as a Series
In [150]: df.loc["h", :]
Out[150]:
cats      a
values    1
Name: h, dtype: object
```

Returning a single item from categorical data will also return the value, not a categorical
of length “1”.

``` python
In [151]: df.iat[0, 0]
Out[151]: 'a'

In [152]: df["cats"].cat.categories = ["x", "y", "z"]

In [153]: df.at["h", "cats"]  # returns a string
Out[153]: 'x'
```

::: tip Note

This is in contrast to R’s *factor* function, where ``factor(c(1,2,3))[1]``
+ +::: + +To get a single value ``Series`` of type ``category``, you pass in a list with +a single value: + +``` python +In [154]: df.loc[["h"], "cats"] +Out[154]: +h x +Name: cats, dtype: category +Categories (3, object): [x, y, z] +``` + +### String and datetime accessors + +The accessors ``.dt`` and ``.str`` will work if the ``s.cat.categories`` are of +an appropriate type: + +``` python +In [155]: str_s = pd.Series(list('aabb')) + +In [156]: str_cat = str_s.astype('category') + +In [157]: str_cat +Out[157]: +0 a +1 a +2 b +3 b +dtype: category +Categories (2, object): [a, b] + +In [158]: str_cat.str.contains("a") +Out[158]: +0 True +1 True +2 False +3 False +dtype: bool + +In [159]: date_s = pd.Series(pd.date_range('1/1/2015', periods=5)) + +In [160]: date_cat = date_s.astype('category') + +In [161]: date_cat +Out[161]: +0 2015-01-01 +1 2015-01-02 +2 2015-01-03 +3 2015-01-04 +4 2015-01-05 +dtype: category +Categories (5, datetime64[ns]): [2015-01-01, 2015-01-02, 2015-01-03, 2015-01-04, 2015-01-05] + +In [162]: date_cat.dt.day +Out[162]: +0 1 +1 2 +2 3 +3 4 +4 5 +dtype: int64 +``` + +::: tip Note + +The returned ``Series`` (or ``DataFrame``) is of the same type as if you used the +``.str.`` / ``.dt.`` on a ``Series`` of that type (and not of +type ``category``!). + +::: + +That means, that the returned values from methods and properties on the accessors of a +``Series`` and the returned values from methods and properties on the accessors of this +``Series`` transformed to one of type *category* will be equal: + +``` python +In [163]: ret_s = str_s.str.contains("a") + +In [164]: ret_cat = str_cat.str.contains("a") + +In [165]: ret_s.dtype == ret_cat.dtype +Out[165]: True + +In [166]: ret_s == ret_cat +Out[166]: +0 True +1 True +2 True +3 True +dtype: bool +``` + +::: tip Note + +The work is done on the ``categories`` and then a new ``Series`` is constructed. 
This has +some performance implication if you have a ``Series`` of type string, where lots of elements +are repeated (i.e. the number of unique elements in the ``Series`` is a lot smaller than the +length of the ``Series``). In this case it can be faster to convert the original ``Series`` +to one of type ``category`` and use ``.str.`` or ``.dt.`` on that. + +::: + +### Setting + +Setting values in a categorical column (or ``Series``) works as long as the +value is included in the *categories*: + +``` python +In [167]: idx = pd.Index(["h", "i", "j", "k", "l", "m", "n"]) + +In [168]: cats = pd.Categorical(["a", "a", "a", "a", "a", "a", "a"], + .....: categories=["a", "b"]) + .....: + +In [169]: values = [1, 1, 1, 1, 1, 1, 1] + +In [170]: df = pd.DataFrame({"cats": cats, "values": values}, index=idx) + +In [171]: df.iloc[2:4, :] = [["b", 2], ["b", 2]] + +In [172]: df +Out[172]: + cats values +h a 1 +i a 1 +j b 2 +k b 2 +l a 1 +m a 1 +n a 1 + +In [173]: try: + .....: df.iloc[2:4, :] = [["c", 3], ["c", 3]] + .....: except ValueError as e: + .....: print("ValueError:", str(e)) + .....: +ValueError: Cannot setitem on a Categorical with a new category, set the categories first +``` + +Setting values by assigning categorical data will also check that the *categories* match: + +``` python +In [174]: df.loc["j":"k", "cats"] = pd.Categorical(["a", "a"], categories=["a", "b"]) + +In [175]: df +Out[175]: + cats values +h a 1 +i a 1 +j a 2 +k a 2 +l a 1 +m a 1 +n a 1 + +In [176]: try: + .....: df.loc["j":"k", "cats"] = pd.Categorical(["b", "b"], + .....: categories=["a", "b", "c"]) + .....: except ValueError as e: + .....: print("ValueError:", str(e)) + .....: +ValueError: Cannot set a Categorical with another, without identical categories +``` + +Assigning a ``Categorical`` to parts of a column of other types will use the values: + +``` python +In [177]: df = pd.DataFrame({"a": [1, 1, 1, 1, 1], "b": ["a", "a", "a", "a", "a"]}) + +In [178]: df.loc[1:2, "a"] = pd.Categorical(["b", 
"b"], categories=["a", "b"]) + +In [179]: df.loc[2:3, "b"] = pd.Categorical(["b", "b"], categories=["a", "b"]) + +In [180]: df +Out[180]: + a b +0 1 a +1 b a +2 b b +3 1 b +4 1 a + +In [181]: df.dtypes +Out[181]: +a object +b object +dtype: object +``` + +### Merging + +You can concat two ``DataFrames`` containing categorical data together, +but the categories of these categoricals need to be the same: + +``` python +In [182]: cat = pd.Series(["a", "b"], dtype="category") + +In [183]: vals = [1, 2] + +In [184]: df = pd.DataFrame({"cats": cat, "vals": vals}) + +In [185]: res = pd.concat([df, df]) + +In [186]: res +Out[186]: + cats vals +0 a 1 +1 b 2 +0 a 1 +1 b 2 + +In [187]: res.dtypes +Out[187]: +cats category +vals int64 +dtype: object +``` + +In this case the categories are not the same, and therefore an error is raised: + +``` python +In [188]: df_different = df.copy() + +In [189]: df_different["cats"].cat.categories = ["c", "d"] + +In [190]: try: + .....: pd.concat([df, df_different]) + .....: except ValueError as e: + .....: print("ValueError:", str(e)) + .....: +``` + +The same applies to ``df.append(df_different)``. + +See also the section on [merge dtypes](merging.html#merging-dtypes) for notes about preserving merge dtypes and performance. + +### Unioning + +*New in version 0.19.0.* + +If you want to combine categoricals that do not necessarily have the same +categories, the [``union_categoricals()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.types.union_categoricals.html#pandas.api.types.union_categoricals) function will +combine a list-like of categoricals. The new categories will be the union of +the categories being combined. 

``` python
In [191]: from pandas.api.types import union_categoricals

In [192]: a = pd.Categorical(["b", "c"])

In [193]: b = pd.Categorical(["a", "b"])

In [194]: union_categoricals([a, b])
Out[194]:
[b, c, a, b]
Categories (3, object): [b, c, a]
```

By default, the resulting categories will be ordered as
they appear in the data. If you want the categories to
be lexsorted, use the ``sort_categories=True`` argument.

``` python
In [195]: union_categoricals([a, b], sort_categories=True)
Out[195]:
[b, c, a, b]
Categories (3, object): [a, b, c]
```

``union_categoricals`` also works with the “easy” case of combining two
categoricals with the same categories and order information
(e.g. cases where you could also use ``append``).

``` python
In [196]: a = pd.Categorical(["a", "b"], ordered=True)

In [197]: b = pd.Categorical(["a", "b", "a"], ordered=True)

In [198]: union_categoricals([a, b])
Out[198]:
[a, b, a, b, a]
Categories (2, object): [a < b]
```

The following raises a ``TypeError`` because the categories are ordered and not identical.

``` python
In [1]: a = pd.Categorical(["a", "b"], ordered=True)
In [2]: b = pd.Categorical(["a", "b", "c"], ordered=True)
In [3]: union_categoricals([a, b])
Out[3]:
TypeError: to union ordered Categoricals, all categories must be the same
```

*New in version 0.20.0.*

Ordered categoricals with different categories or orderings can be combined by
using the ``ignore_order=True`` argument.
+ +``` python +In [199]: a = pd.Categorical(["a", "b", "c"], ordered=True) + +In [200]: b = pd.Categorical(["c", "b", "a"], ordered=True) + +In [201]: union_categoricals([a, b], ignore_order=True) +Out[201]: +[a, b, c, c, b, a] +Categories (3, object): [a, b, c] +``` + +[``union_categoricals()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.types.union_categoricals.html#pandas.api.types.union_categoricals) also works with a +``CategoricalIndex``, or ``Series`` containing categorical data, but note that +the resulting array will always be a plain ``Categorical``: + +``` python +In [202]: a = pd.Series(["b", "c"], dtype='category') + +In [203]: b = pd.Series(["a", "b"], dtype='category') + +In [204]: union_categoricals([a, b]) +Out[204]: +[b, c, a, b] +Categories (3, object): [b, c, a] +``` + +::: tip Note + +``union_categoricals`` may recode the integer codes for categories +when combining categoricals. This is likely what you want, +but if you are relying on the exact numbering of the categories, be +aware. + +``` python +In [205]: c1 = pd.Categorical(["b", "c"]) + +In [206]: c2 = pd.Categorical(["a", "b"]) + +In [207]: c1 +Out[207]: +[b, c] +Categories (2, object): [b, c] + +# "b" is coded to 0 +In [208]: c1.codes +Out[208]: array([0, 1], dtype=int8) + +In [209]: c2 +Out[209]: +[a, b] +Categories (2, object): [a, b] + +# "b" is coded to 1 +In [210]: c2.codes +Out[210]: array([0, 1], dtype=int8) + +In [211]: c = union_categoricals([c1, c2]) + +In [212]: c +Out[212]: +[b, c, a, b] +Categories (3, object): [b, c, a] + +# "b" is coded to 0 throughout, same as c1, different from c2 +In [213]: c.codes +Out[213]: array([0, 1, 2, 0], dtype=int8) +``` + +::: + +### Concatenation + +This section describes concatenations specific to ``category`` dtype. See [Concatenating objects](merging.html#merging-concat) for general description. 
By default, ``Series`` or ``DataFrame`` concatenation that contains the same categories
results in ``category`` dtype; otherwise it results in ``object`` dtype.
Use ``.astype`` or ``union_categoricals`` to get a ``category`` result.

``` python
# same categories
In [214]: s1 = pd.Series(['a', 'b'], dtype='category')

In [215]: s2 = pd.Series(['a', 'b', 'a'], dtype='category')

In [216]: pd.concat([s1, s2])
Out[216]:
0    a
1    b
0    a
1    b
2    a
dtype: category
Categories (2, object): [a, b]

# different categories
In [217]: s3 = pd.Series(['b', 'c'], dtype='category')

In [218]: pd.concat([s1, s3])
Out[218]:
0    a
1    b
0    b
1    c
dtype: object

In [219]: pd.concat([s1, s3]).astype('category')
Out[219]:
0    a
1    b
0    b
1    c
dtype: category
Categories (3, object): [a, b, c]

In [220]: union_categoricals([s1.array, s3.array])
Out[220]:
[a, b, b, c]
Categories (3, object): [a, b, c]
```

The following table summarizes the results of ``Categorical``-related concatenations.

arg1 | arg2 | result
---|---|---
category | category (identical categories) | category
category | category (different categories, both not ordered) | object (dtype is inferred)
category | category (different categories, either one is ordered) | object (dtype is inferred)
category | not category | object (dtype is inferred)

## Getting data in/out

You can write data that contains ``category`` dtypes to a ``HDFStore``.
See [here](io.html#io-hdf5-categorical) for an example and caveats.

It is also possible to write data to and read data from *Stata* format files.
See [here](io.html#io-stata-categorical) for an example and caveats.

Writing to a CSV file will convert the data, effectively removing any information about the
categorical (categories and ordering). So if you read back the CSV file you have to convert the
relevant columns back to *category* and assign the right categories and category ordering.
+ +``` python +In [221]: import io + +In [222]: s = pd.Series(pd.Categorical(['a', 'b', 'b', 'a', 'a', 'd'])) + +# rename the categories +In [223]: s.cat.categories = ["very good", "good", "bad"] + +# reorder the categories and add missing categories +In [224]: s = s.cat.set_categories(["very bad", "bad", "medium", "good", "very good"]) + +In [225]: df = pd.DataFrame({"cats": s, "vals": [1, 2, 3, 4, 5, 6]}) + +In [226]: csv = io.StringIO() + +In [227]: df.to_csv(csv) + +In [228]: df2 = pd.read_csv(io.StringIO(csv.getvalue())) + +In [229]: df2.dtypes +Out[229]: +Unnamed: 0 int64 +cats object +vals int64 +dtype: object + +In [230]: df2["cats"] +Out[230]: +0 very good +1 good +2 good +3 very good +4 very good +5 bad +Name: cats, dtype: object + +# Redo the category +In [231]: df2["cats"] = df2["cats"].astype("category") + +In [232]: df2["cats"].cat.set_categories(["very bad", "bad", "medium", + .....: "good", "very good"], + .....: inplace=True) + .....: + +In [233]: df2.dtypes +Out[233]: +Unnamed: 0 int64 +cats category +vals int64 +dtype: object + +In [234]: df2["cats"] +Out[234]: +0 very good +1 good +2 good +3 very good +4 very good +5 bad +Name: cats, dtype: category +Categories (5, object): [very bad, bad, medium, good, very good] +``` + +The same holds for writing to a SQL database with ``to_sql``. + +## Missing data + +pandas primarily uses the value *np.nan* to represent missing data. It is by +default not included in computations. See the [Missing Data section](missing_data.html#missing-data). + +Missing values should **not** be included in the Categorical’s ``categories``, +only in the ``values``. +Instead, it is understood that NaN is different, and is always a possibility. +When working with the Categorical’s ``codes``, missing values will always have +a code of ``-1``. 
+ +``` python +In [235]: s = pd.Series(["a", "b", np.nan, "a"], dtype="category") + +# only two categories +In [236]: s +Out[236]: +0 a +1 b +2 NaN +3 a +dtype: category +Categories (2, object): [a, b] + +In [237]: s.cat.codes +Out[237]: +0 0 +1 1 +2 -1 +3 0 +dtype: int8 +``` + +Methods for working with missing data, e.g. [``isna()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.isna.html#pandas.Series.isna), [``fillna()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.fillna.html#pandas.Series.fillna), +[``dropna()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dropna.html#pandas.Series.dropna), all work normally: + +``` python +In [238]: s = pd.Series(["a", "b", np.nan], dtype="category") + +In [239]: s +Out[239]: +0 a +1 b +2 NaN +dtype: category +Categories (2, object): [a, b] + +In [240]: pd.isna(s) +Out[240]: +0 False +1 False +2 True +dtype: bool + +In [241]: s.fillna("a") +Out[241]: +0 a +1 b +2 a +dtype: category +Categories (2, object): [a, b] +``` + +## Differences to R’s factor + +The following differences to R’s factor functions can be observed: + +- R’s *levels* are named *categories*. +- R’s *levels* are always of type string, while *categories* in pandas can be of any dtype. +- It’s not possible to specify labels at creation time. Use ``s.cat.rename_categories(new_labels)`` +afterwards. +- In contrast to R’s *factor* function, using categorical data as the sole input to create a +new categorical series will *not* remove unused categories but create a new categorical series +which is equal to the passed in one! +- R allows for missing values to be included in its *levels* (pandas’ *categories*). Pandas +does not allow *NaN* categories, but missing values can still be in the *values*. + +## Gotchas + +### Memory usage + +The memory usage of a ``Categorical`` is proportional to the number of categories plus the length of the data. 
In contrast, +an ``object`` dtype is a constant times the length of the data. + +``` python +In [242]: s = pd.Series(['foo', 'bar'] * 1000) + +# object dtype +In [243]: s.nbytes +Out[243]: 16000 + +# category dtype +In [244]: s.astype('category').nbytes +Out[244]: 2016 +``` + +::: tip Note + +If the number of categories approaches the length of the data, the ``Categorical`` will use nearly the same or +more memory than an equivalent ``object`` dtype representation. + +``` python +In [245]: s = pd.Series(['foo%04d' % i for i in range(2000)]) + +# object dtype +In [246]: s.nbytes +Out[246]: 16000 + +# category dtype +In [247]: s.astype('category').nbytes +Out[247]: 20000 +``` + +::: + +### Categorical is not a numpy array + +Currently, categorical data and the underlying ``Categorical`` is implemented as a Python +object and not as a low-level NumPy array dtype. This leads to some problems. + +NumPy itself doesn’t know about the new *dtype*: + +``` python +In [248]: try: + .....: np.dtype("category") + .....: except TypeError as e: + .....: print("TypeError:", str(e)) + .....: +TypeError: data type "category" not understood + +In [249]: dtype = pd.Categorical(["a"]).dtype + +In [250]: try: + .....: np.dtype(dtype) + .....: except TypeError as e: + .....: print("TypeError:", str(e)) + .....: +TypeError: data type not understood +``` + +Dtype comparisons work: + +``` python +In [251]: dtype == np.str_ +Out[251]: False + +In [252]: np.str_ == dtype +Out[252]: False +``` + +To check if a Series contains Categorical data, use ``hasattr(s, 'cat')``: + +``` python +In [253]: hasattr(pd.Series(['a'], dtype='category'), 'cat') +Out[253]: True + +In [254]: hasattr(pd.Series(['a']), 'cat') +Out[254]: False +``` + +Using NumPy functions on a ``Series`` of type ``category`` should not work as *Categoricals* +are not numeric data (even in the case that ``.categories`` is numeric). 
``` python
In [255]: s = pd.Series(pd.Categorical([1, 2, 3, 4]))

In [256]: try:
   .....:     np.sum(s)
   .....: except TypeError as e:
   .....:     print("TypeError:", str(e))
   .....:
TypeError: Categorical cannot perform the operation sum
```

::: tip Note

If such a function works, please file a bug at [https://github.com/pandas-dev/pandas](https://github.com/pandas-dev/pandas)!

:::

### dtype in apply

Pandas currently does not preserve the dtype in apply functions: If you apply along rows you get
a *Series* of ``object`` *dtype* (same as getting a row -> getting one element will return a
basic type) and applying along columns will also convert to object. ``NaN`` values are unaffected.
You can use ``fillna`` to handle missing values before applying a function.

``` python
In [257]: df = pd.DataFrame({"a": [1, 2, 3, 4],
   .....:                    "b": ["a", "b", "c", "d"],
   .....:                    "cats": pd.Categorical([1, 2, 3, 2])})
   .....:

In [258]: df.apply(lambda row: type(row["cats"]), axis=1)
Out[258]:
0    <class 'int'>
1    <class 'int'>
2    <class 'int'>
3    <class 'int'>
dtype: object

In [259]: df.apply(lambda col: col.dtype, axis=0)
Out[259]:
a          int64
b         object
cats    category
dtype: object
```

### Categorical index

``CategoricalIndex`` is a type of index that is useful for supporting
indexing with duplicates. This is a container around a ``Categorical``
and allows efficient indexing and storage of an index with a large number of duplicated elements.
See the [advanced indexing docs](advanced.html#indexing-categoricalindex) for a more detailed
explanation.
+ +Setting the index will create a ``CategoricalIndex``: + +``` python +In [260]: cats = pd.Categorical([1, 2, 3, 4], categories=[4, 2, 3, 1]) + +In [261]: strings = ["a", "b", "c", "d"] + +In [262]: values = [4, 2, 3, 1] + +In [263]: df = pd.DataFrame({"strings": strings, "values": values}, index=cats) + +In [264]: df.index +Out[264]: CategoricalIndex([1, 2, 3, 4], categories=[4, 2, 3, 1], ordered=False, dtype='category') + +# This now sorts by the categories order +In [265]: df.sort_index() +Out[265]: + strings values +4 d 1 +2 b 2 +3 c 3 +1 a 4 +``` + +### Side effects + +Constructing a ``Series`` from a ``Categorical`` will not copy the input +``Categorical``. This means that changes to the ``Series`` will in most cases +change the original ``Categorical``: + +``` python +In [266]: cat = pd.Categorical([1, 2, 3, 10], categories=[1, 2, 3, 4, 10]) + +In [267]: s = pd.Series(cat, name="cat") + +In [268]: cat +Out[268]: +[1, 2, 3, 10] +Categories (5, int64): [1, 2, 3, 4, 10] + +In [269]: s.iloc[0:2] = 10 + +In [270]: cat +Out[270]: +[10, 10, 3, 10] +Categories (5, int64): [1, 2, 3, 4, 10] + +In [271]: df = pd.DataFrame(s) + +In [272]: df["cat"].cat.categories = [1, 2, 3, 4, 5] + +In [273]: cat +Out[273]: +[5, 5, 3, 5] +Categories (5, int64): [1, 2, 3, 4, 5] +``` + +Use ``copy=True`` to prevent such a behaviour or simply don’t reuse ``Categoricals``: + +``` python +In [274]: cat = pd.Categorical([1, 2, 3, 10], categories=[1, 2, 3, 4, 10]) + +In [275]: s = pd.Series(cat, name="cat", copy=True) + +In [276]: cat +Out[276]: +[1, 2, 3, 10] +Categories (5, int64): [1, 2, 3, 4, 10] + +In [277]: s.iloc[0:2] = 10 + +In [278]: cat +Out[278]: +[1, 2, 3, 10] +Categories (5, int64): [1, 2, 3, 4, 10] +``` + +::: tip Note + +This also happens in some cases when you supply a NumPy array instead of a ``Categorical``: +using an int array (e.g. ``np.array([1,2,3,4])``) will exhibit the same behavior, while using +a string array (e.g. ``np.array(["a","b","c","a"])``) will not. 
+ +::: diff --git a/Python/pandas/user_guide/computation.md b/Python/pandas/user_guide/computation.md new file mode 100644 index 00000000..24c5ee95 --- /dev/null +++ b/Python/pandas/user_guide/computation.md @@ -0,0 +1,1452 @@ +# Computational tools + +## Statistical functions + +### Percent change + +``Series`` and ``DataFrame`` have a method +[``pct_change()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pct_change.html#pandas.DataFrame.pct_change) to compute the percent change over a given number +of periods (using ``fill_method`` to fill NA/null values *before* computing +the percent change). + +``` python +In [1]: ser = pd.Series(np.random.randn(8)) + +In [2]: ser.pct_change() +Out[2]: +0 NaN +1 -1.602976 +2 4.334938 +3 -0.247456 +4 -2.067345 +5 -1.142903 +6 -1.688214 +7 -9.759729 +dtype: float64 +``` + +``` python +In [3]: df = pd.DataFrame(np.random.randn(10, 4)) + +In [4]: df.pct_change(periods=3) +Out[4]: + 0 1 2 3 +0 NaN NaN NaN NaN +1 NaN NaN NaN NaN +2 NaN NaN NaN NaN +3 -0.218320 -1.054001 1.987147 -0.510183 +4 -0.439121 -1.816454 0.649715 -4.822809 +5 -0.127833 -3.042065 -5.866604 -1.776977 +6 -2.596833 -1.959538 -2.111697 -3.798900 +7 -0.117826 -2.169058 0.036094 -0.067696 +8 2.492606 -1.357320 -1.205802 -1.558697 +9 -1.012977 2.324558 -1.003744 -0.371806 +``` + +### Covariance + +[``Series.cov()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.cov.html#pandas.Series.cov) can be used to compute covariance between series +(excluding missing values). + +``` python +In [5]: s1 = pd.Series(np.random.randn(1000)) + +In [6]: s2 = pd.Series(np.random.randn(1000)) + +In [7]: s1.cov(s2) +Out[7]: 0.000680108817431082 +``` + +Analogously, [``DataFrame.cov()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.cov.html#pandas.DataFrame.cov) to compute pairwise covariances among the +series in the DataFrame, also excluding NA/null values. 
+ +::: tip Note + +Assuming the missing data are missing at random this results in an estimate +for the covariance matrix which is unbiased. However, for many applications +this estimate may not be acceptable because the estimated covariance matrix +is not guaranteed to be positive semi-definite. This could lead to +estimated correlations having absolute values which are greater than one, +and/or a non-invertible covariance matrix. See [Estimation of covariance +matrices](http://en.wikipedia.org/w/index.php?title=Estimation_of_covariance_matrices) +for more details. + +::: + +``` python +In [8]: frame = pd.DataFrame(np.random.randn(1000, 5), + ...: columns=['a', 'b', 'c', 'd', 'e']) + ...: + +In [9]: frame.cov() +Out[9]: + a b c d e +a 1.000882 -0.003177 -0.002698 -0.006889 0.031912 +b -0.003177 1.024721 0.000191 0.009212 0.000857 +c -0.002698 0.000191 0.950735 -0.031743 -0.005087 +d -0.006889 0.009212 -0.031743 1.002983 -0.047952 +e 0.031912 0.000857 -0.005087 -0.047952 1.042487 +``` + +``DataFrame.cov`` also supports an optional ``min_periods`` keyword that +specifies the required minimum number of observations for each column pair +in order to have a valid result. + +``` python +In [10]: frame = pd.DataFrame(np.random.randn(20, 3), columns=['a', 'b', 'c']) + +In [11]: frame.loc[frame.index[:5], 'a'] = np.nan + +In [12]: frame.loc[frame.index[5:10], 'b'] = np.nan + +In [13]: frame.cov() +Out[13]: + a b c +a 1.123670 -0.412851 0.018169 +b -0.412851 1.154141 0.305260 +c 0.018169 0.305260 1.301149 + +In [14]: frame.cov(min_periods=12) +Out[14]: + a b c +a 1.123670 NaN 0.018169 +b NaN 1.154141 0.305260 +c 0.018169 0.305260 1.301149 +``` + +### Correlation + +Correlation may be computed using the [``corr()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.corr.html#pandas.DataFrame.corr) method. 
+Using the ``method`` parameter, several methods for computing correlations are +provided: + +Method name | Description +---|--- +pearson (default) | Standard correlation coefficient +kendall | Kendall Tau correlation coefficient +spearman | Spearman rank correlation coefficient + +All of these are currently computed using pairwise complete observations. +Wikipedia has articles covering the above correlation coefficients: + +- [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient) +- [Kendall rank correlation coefficient](https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient) +- [Spearman’s rank correlation coefficient](https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient) + +::: tip Note + +Please see the [caveats](#computation-covariance-caveats) associated +with this method of calculating correlation matrices in the +[covariance section](#computation-covariance). + +::: + +``` python +In [15]: frame = pd.DataFrame(np.random.randn(1000, 5), + ....: columns=['a', 'b', 'c', 'd', 'e']) + ....: + +In [16]: frame.iloc[::2] = np.nan + +# Series with Series +In [17]: frame['a'].corr(frame['b']) +Out[17]: 0.013479040400098794 + +In [18]: frame['a'].corr(frame['b'], method='spearman') +Out[18]: -0.007289885159540637 + +# Pairwise correlation of DataFrame columns +In [19]: frame.corr() +Out[19]: + a b c d e +a 1.000000 0.013479 -0.049269 -0.042239 -0.028525 +b 0.013479 1.000000 -0.020433 -0.011139 0.005654 +c -0.049269 -0.020433 1.000000 0.018587 -0.054269 +d -0.042239 -0.011139 0.018587 1.000000 -0.017060 +e -0.028525 0.005654 -0.054269 -0.017060 1.000000 +``` + +Note that non-numeric columns will be automatically excluded from the +correlation calculation. 
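A quick sketch of that exclusion (the frame and column names here are invented; also note that on recent pandas versions you must opt in with ``numeric_only=True``, whereas in the pandas version this guide describes the non-numeric columns were dropped silently):

``` python
import numpy as np
import pandas as pd

# hypothetical frame: two perfectly correlated numeric columns plus a string column
mixed = pd.DataFrame({'x': np.arange(5.0),
                      'y': 2.0 * np.arange(5.0),
                      'label': list('abcde')})

# 'label' does not appear in the result; only the numeric columns are correlated
corr = mixed.corr(numeric_only=True)
print(corr)
```

The result is a 2x2 matrix over ``x`` and ``y`` only, with an off-diagonal value of 1.0 since ``y`` is an exact linear function of ``x``.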
+ +Like ``cov``, ``corr`` also supports the optional ``min_periods`` keyword: + +``` python +In [20]: frame = pd.DataFrame(np.random.randn(20, 3), columns=['a', 'b', 'c']) + +In [21]: frame.loc[frame.index[:5], 'a'] = np.nan + +In [22]: frame.loc[frame.index[5:10], 'b'] = np.nan + +In [23]: frame.corr() +Out[23]: + a b c +a 1.000000 -0.121111 0.069544 +b -0.121111 1.000000 0.051742 +c 0.069544 0.051742 1.000000 + +In [24]: frame.corr(min_periods=12) +Out[24]: + a b c +a 1.000000 NaN 0.069544 +b NaN 1.000000 0.051742 +c 0.069544 0.051742 1.000000 +``` + +*New in version 0.24.0.* + +The ``method`` argument can also be a callable for a generic correlation +calculation. In this case, it should be a single function +that produces a single value from two ndarray inputs. Suppose we wanted to +compute the correlation based on histogram intersection: + +``` python +# histogram intersection +In [25]: def histogram_intersection(a, b): + ....: return np.minimum(np.true_divide(a, a.sum()), + ....: np.true_divide(b, b.sum())).sum() + ....: + +In [26]: frame.corr(method=histogram_intersection) +Out[26]: + a b c +a 1.000000 -6.404882 -2.058431 +b -6.404882 1.000000 -19.255743 +c -2.058431 -19.255743 1.000000 +``` + +A related method [``corrwith()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.corrwith.html#pandas.DataFrame.corrwith) is implemented on DataFrame to +compute the correlation between like-labeled Series contained in different +DataFrame objects. 
``` python
In [27]: index = ['a', 'b', 'c', 'd', 'e']

In [28]: columns = ['one', 'two', 'three', 'four']

In [29]: df1 = pd.DataFrame(np.random.randn(5, 4), index=index, columns=columns)

In [30]: df2 = pd.DataFrame(np.random.randn(4, 4), index=index[:4], columns=columns)

In [31]: df1.corrwith(df2)
Out[31]:
one     -0.125501
two     -0.493244
three    0.344056
four     0.004183
dtype: float64

In [32]: df2.corrwith(df1, axis=1)
Out[32]:
a   -0.675817
b    0.458296
c    0.190809
d   -0.186275
e         NaN
dtype: float64
```

### Data ranking

The [``rank()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.rank.html#pandas.Series.rank) method produces a data ranking with ties being
assigned the mean of the ranks (by default) for the group:

``` python
In [33]: s = pd.Series(np.random.randn(5), index=list('abcde'))

In [34]: s['d'] = s['b']  # so there's a tie

In [35]: s.rank()
Out[35]:
a    5.0
b    2.5
c    1.0
d    2.5
e    4.0
dtype: float64
```

[``rank()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rank.html#pandas.DataFrame.rank) is also a DataFrame method and can rank either the rows
(``axis=0``) or the columns (``axis=1``). ``NaN`` values are excluded from the
ranking.
``` python
In [36]: df = pd.DataFrame(np.random.randn(10, 6))

In [37]: df[4] = df[2][:5]  # some ties

In [38]: df
Out[38]:
          0         1         2         3         4         5
0 -0.904948 -1.163537 -1.457187  0.135463 -1.457187  0.294650
1 -0.976288 -0.244652 -0.748406 -0.999601 -0.748406 -0.800809
2  0.401965  1.460840  1.256057  1.308127  1.256057  0.876004
3  0.205954  0.369552 -0.669304  0.038378 -0.669304  1.140296
4 -0.477586 -0.730705 -1.129149 -0.601463 -1.129149 -0.211196
5 -1.092970 -0.689246  0.908114  0.204848       NaN  0.463347
6  0.376892  0.959292  0.095572 -0.593740       NaN -0.069180
7 -1.002601  1.957794 -0.120708  0.094214       NaN -1.467422
8 -0.547231  0.664402 -0.519424 -0.073254       NaN -1.263544
9 -0.250277 -0.237428 -1.056443  0.419477       NaN  1.375064

In [39]: df.rank(1)
Out[39]:
     0    1    2    3    4    5
0  4.0  3.0  1.5  5.0  1.5  6.0
1  2.0  6.0  4.5  1.0  4.5  3.0
2  1.0  6.0  3.5  5.0  3.5  2.0
3  4.0  5.0  1.5  3.0  1.5  6.0
4  5.0  3.0  1.5  4.0  1.5  6.0
5  1.0  2.0  5.0  3.0  NaN  4.0
6  4.0  5.0  3.0  1.0  NaN  2.0
7  2.0  5.0  3.0  4.0  NaN  1.0
8  2.0  5.0  3.0  4.0  NaN  1.0
9  2.0  3.0  1.0  4.0  NaN  5.0
```

``rank`` optionally takes a parameter ``ascending``, which by default is true;
when false, data is reverse-ranked, with larger values assigned a smaller rank.

``rank`` supports different tie-breaking methods, specified with the ``method``
parameter:

- ``average`` : average rank of tied group
- ``min`` : lowest rank in the group
- ``max`` : highest rank in the group
- ``first`` : ranks assigned in the order they appear in the array

## Window Functions

For working with data, a number of window functions are provided for
computing common *window* or *rolling* statistics. Among these are count, sum,
mean, median, correlation, variance, covariance, standard deviation, skewness,
and kurtosis.

The ``rolling()`` and ``expanding()`` functions can be used directly from
DataFrameGroupBy objects, see the [groupby docs](groupby.html#groupby-transform-window-resample).
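A minimal sketch of that groupby usage (data invented): calling ``.rolling()`` on a ``DataFrameGroupBy`` object rolls within each group, so windows never cross group boundaries.

``` python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'val': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]})

# each group gets its own window: the first row of group 'b'
# is NaN rather than a mean that mixes values from group 'a'
out = df.groupby('key')['val'].rolling(window=2).mean()
print(out)
```

The result carries a MultiIndex of (group key, original row label), so it can be aligned back to the source frame with ``reset_index`` if needed.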
::: tip Note

The API for window statistics is quite similar to the way one works with ``GroupBy`` objects, see the documentation [here](groupby.html#groupby).

:::

We work with ``rolling``, ``expanding`` and ``exponentially weighted`` data through the corresponding
objects, ``Rolling``, ``Expanding`` and ``EWM``.

``` python
In [40]: s = pd.Series(np.random.randn(1000),
   ....:               index=pd.date_range('1/1/2000', periods=1000))
   ....:

In [41]: s = s.cumsum()

In [42]: s
Out[42]:
2000-01-01    -0.268824
2000-01-02    -1.771855
2000-01-03    -0.818003
2000-01-04    -0.659244
2000-01-05    -1.942133
                ...
2002-09-22   -67.457323
2002-09-23   -69.253182
2002-09-24   -70.296818
2002-09-25   -70.844674
2002-09-26   -72.475016
Freq: D, Length: 1000, dtype: float64
```

These are created from methods on ``Series`` and ``DataFrame``.

``` python
In [43]: r = s.rolling(window=60)

In [44]: r
Out[44]: Rolling [window=60,center=False,axis=0]
```

These objects provide tab-completion of the available methods and properties.

``` python
In [14]: r.  # noqa: E225, E999
r.agg        r.apply      r.count      r.exclusions r.max        r.median     r.name       r.skew       r.sum
r.aggregate  r.corr       r.cov        r.kurt       r.mean       r.min        r.quantile   r.std        r.var
```

Generally these methods all have the same interface. They all
accept the following arguments:

- ``window``: size of moving window
- ``min_periods``: threshold of non-null data points to require (otherwise result is NA)
- ``center``: boolean, whether to set the labels at the center (default is False)

We can then call methods on these ``rolling`` objects. These return like-indexed objects:

``` python
In [45]: r.mean()
Out[45]:
2000-01-01          NaN
2000-01-02          NaN
2000-01-03          NaN
2000-01-04          NaN
2000-01-05          NaN
                ...
2002-09-22   -62.914971
2002-09-23   -63.061867
2002-09-24   -63.213876
2002-09-25   -63.375074
2002-09-26   -63.539734
Freq: D, Length: 1000, dtype: float64
```

``` python
In [46]: s.plot(style='k--')
Out[46]: <matplotlib.axes._subplots.AxesSubplot at 0x...>

In [47]: r.mean().plot(style='k')
Out[47]: <matplotlib.axes._subplots.AxesSubplot at 0x...>
```

![rolling_mean_ex](https://static.pypandas.cn/public/static/images/rolling_mean_ex.png)

They can also be applied to DataFrame objects. This is really just syntactic
sugar for applying the moving window operator to all of the DataFrame’s columns:

``` python
In [48]: df = pd.DataFrame(np.random.randn(1000, 4),
   ....:                   index=pd.date_range('1/1/2000', periods=1000),
   ....:                   columns=['A', 'B', 'C', 'D'])
   ....:

In [49]: df = df.cumsum()

In [50]: df.rolling(window=60).sum().plot(subplots=True)
Out[50]:
array([<matplotlib.axes._subplots.AxesSubplot object at 0x...>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x...>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x...>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x...>],
      dtype=object)
```

![rolling_mean_frame](https://static.pypandas.cn/public/static/images/rolling_mean_frame.png)

### Method summary

We provide a number of common statistical functions:

Method | Description
---|---
[count()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.count.html#pandas.core.window.Rolling.count) | Number of non-null observations
[sum()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.sum.html#pandas.core.window.Rolling.sum) | Sum of values
[mean()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.mean.html#pandas.core.window.Rolling.mean) | Mean of values
[median()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.median.html#pandas.core.window.Rolling.median) | Arithmetic median of values
[min()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.min.html#pandas.core.window.Rolling.min) | Minimum
[max()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.max.html#pandas.core.window.Rolling.max) | Maximum
+[std()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.std.html#pandas.core.window.Rolling.std) | Bessel-corrected sample standard deviation +[var()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.var.html#pandas.core.window.Rolling.var) | Unbiased variance +[skew()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.skew.html#pandas.core.window.Rolling.skew) | Sample skewness (3rd moment) +[kurt()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.kurt.html#pandas.core.window.Rolling.kurt) | Sample kurtosis (4th moment) +[quantile()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.quantile.html#pandas.core.window.Rolling.quantile) | Sample quantile (value at %) +[apply()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.apply.html#pandas.core.window.Rolling.apply) | Generic apply +[cov()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.cov.html#pandas.core.window.Rolling.cov) | Unbiased covariance (binary) +[corr()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.corr.html#pandas.core.window.Rolling.corr) | Correlation (binary) + +The [``apply()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.apply.html#pandas.core.window.Rolling.apply) function takes an extra ``func`` argument and performs +generic rolling computations. The ``func`` argument should be a single function +that produces a single value from an ndarray input. 
Suppose we wanted to
compute the mean absolute deviation on a rolling basis:

``` python
In [51]: def mad(x):
   ....:     return np.fabs(x - x.mean()).mean()
   ....:

In [52]: s.rolling(window=60).apply(mad, raw=True).plot(style='k')
Out[52]: <matplotlib.axes._subplots.AxesSubplot at 0x...>
```

![rolling_apply_ex](https://static.pypandas.cn/public/static/images/rolling_apply_ex.png)

### Rolling windows

Passing ``win_type`` to ``.rolling`` generates a generic rolling window computation that is weighted according to the ``win_type``.
The following methods are available:

Method | Description
---|---
[sum()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Window.sum.html#pandas.core.window.Window.sum) | Sum of values
[mean()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Window.mean.html#pandas.core.window.Window.mean) | Mean of values

The weights used in the window are specified by the ``win_type`` keyword.
The list of recognized types are the [scipy.signal window functions](https://docs.scipy.org/doc/scipy/reference/signal.html#window-functions):

- ``boxcar``
- ``triang``
- ``blackman``
- ``hamming``
- ``bartlett``
- ``parzen``
- ``bohman``
- ``blackmanharris``
- ``nuttall``
- ``barthann``
- ``kaiser`` (needs beta)
- ``gaussian`` (needs std)
- ``general_gaussian`` (needs power, width)
- ``slepian`` (needs width)
- ``exponential`` (needs tau).
+ +``` python +In [53]: ser = pd.Series(np.random.randn(10), + ....: index=pd.date_range('1/1/2000', periods=10)) + ....: + +In [54]: ser.rolling(window=5, win_type='triang').mean() +Out[54]: +2000-01-01 NaN +2000-01-02 NaN +2000-01-03 NaN +2000-01-04 NaN +2000-01-05 -1.037870 +2000-01-06 -0.767705 +2000-01-07 -0.383197 +2000-01-08 -0.395513 +2000-01-09 -0.558440 +2000-01-10 -0.672416 +Freq: D, dtype: float64 +``` + +Note that the ``boxcar`` window is equivalent to [``mean()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.mean.html#pandas.core.window.Rolling.mean). + +``` python +In [55]: ser.rolling(window=5, win_type='boxcar').mean() +Out[55]: +2000-01-01 NaN +2000-01-02 NaN +2000-01-03 NaN +2000-01-04 NaN +2000-01-05 -0.841164 +2000-01-06 -0.779948 +2000-01-07 -0.565487 +2000-01-08 -0.502815 +2000-01-09 -0.553755 +2000-01-10 -0.472211 +Freq: D, dtype: float64 + +In [56]: ser.rolling(window=5).mean() +Out[56]: +2000-01-01 NaN +2000-01-02 NaN +2000-01-03 NaN +2000-01-04 NaN +2000-01-05 -0.841164 +2000-01-06 -0.779948 +2000-01-07 -0.565487 +2000-01-08 -0.502815 +2000-01-09 -0.553755 +2000-01-10 -0.472211 +Freq: D, dtype: float64 +``` + +For some windowing functions, additional parameters must be specified: + +``` python +In [57]: ser.rolling(window=5, win_type='gaussian').mean(std=0.1) +Out[57]: +2000-01-01 NaN +2000-01-02 NaN +2000-01-03 NaN +2000-01-04 NaN +2000-01-05 -1.309989 +2000-01-06 -1.153000 +2000-01-07 0.606382 +2000-01-08 -0.681101 +2000-01-09 -0.289724 +2000-01-10 -0.996632 +Freq: D, dtype: float64 +``` + +::: tip Note + +For ``.sum()`` with a ``win_type``, there is no normalization done to the +weights for the window. Passing custom weights of ``[1, 1, 1]`` will yield a different +result than passing weights of ``[2, 2, 2]``, for example. When passing a +``win_type`` instead of explicitly specifying the weights, the weights are +already normalized so that the largest weight is 1. 
+ +In contrast, the nature of the ``.mean()`` calculation is +such that the weights are normalized with respect to each other. Weights +of ``[1, 1, 1]`` and ``[2, 2, 2]`` yield the same result. + +::: + +### Time-aware rolling + +*New in version 0.19.0.* + +New in version 0.19.0 are the ability to pass an offset (or convertible) to a ``.rolling()`` method and have it produce +variable sized windows based on the passed time window. For each time point, this includes all preceding values occurring +within the indicated time delta. + +This can be particularly useful for a non-regular time frequency index. + +``` python +In [58]: dft = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]}, + ....: index=pd.date_range('20130101 09:00:00', + ....: periods=5, + ....: freq='s')) + ....: + +In [59]: dft +Out[59]: + B +2013-01-01 09:00:00 0.0 +2013-01-01 09:00:01 1.0 +2013-01-01 09:00:02 2.0 +2013-01-01 09:00:03 NaN +2013-01-01 09:00:04 4.0 +``` + +This is a regular frequency index. Using an integer window parameter works to roll along the window frequency. + +``` python +In [60]: dft.rolling(2).sum() +Out[60]: + B +2013-01-01 09:00:00 NaN +2013-01-01 09:00:01 1.0 +2013-01-01 09:00:02 3.0 +2013-01-01 09:00:03 NaN +2013-01-01 09:00:04 NaN + +In [61]: dft.rolling(2, min_periods=1).sum() +Out[61]: + B +2013-01-01 09:00:00 0.0 +2013-01-01 09:00:01 1.0 +2013-01-01 09:00:02 3.0 +2013-01-01 09:00:03 2.0 +2013-01-01 09:00:04 4.0 +``` + +Specifying an offset allows a more intuitive specification of the rolling frequency. + +``` python +In [62]: dft.rolling('2s').sum() +Out[62]: + B +2013-01-01 09:00:00 0.0 +2013-01-01 09:00:01 1.0 +2013-01-01 09:00:02 3.0 +2013-01-01 09:00:03 2.0 +2013-01-01 09:00:04 4.0 +``` + +Using a non-regular, but still monotonic index, rolling with an integer window does not impart any special calculation. 
+ +``` python +In [63]: dft = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]}, + ....: index=pd.Index([pd.Timestamp('20130101 09:00:00'), + ....: pd.Timestamp('20130101 09:00:02'), + ....: pd.Timestamp('20130101 09:00:03'), + ....: pd.Timestamp('20130101 09:00:05'), + ....: pd.Timestamp('20130101 09:00:06')], + ....: name='foo')) + ....: + +In [64]: dft +Out[64]: + B +foo +2013-01-01 09:00:00 0.0 +2013-01-01 09:00:02 1.0 +2013-01-01 09:00:03 2.0 +2013-01-01 09:00:05 NaN +2013-01-01 09:00:06 4.0 + +In [65]: dft.rolling(2).sum() +Out[65]: + B +foo +2013-01-01 09:00:00 NaN +2013-01-01 09:00:02 1.0 +2013-01-01 09:00:03 3.0 +2013-01-01 09:00:05 NaN +2013-01-01 09:00:06 NaN +``` + +Using the time-specification generates variable windows for this sparse data. + +``` python +In [66]: dft.rolling('2s').sum() +Out[66]: + B +foo +2013-01-01 09:00:00 0.0 +2013-01-01 09:00:02 1.0 +2013-01-01 09:00:03 3.0 +2013-01-01 09:00:05 NaN +2013-01-01 09:00:06 4.0 +``` + +Furthermore, we now allow an optional ``on`` parameter to specify a column (rather than the +default of the index) in a DataFrame. 
+ +``` python +In [67]: dft = dft.reset_index() + +In [68]: dft +Out[68]: + foo B +0 2013-01-01 09:00:00 0.0 +1 2013-01-01 09:00:02 1.0 +2 2013-01-01 09:00:03 2.0 +3 2013-01-01 09:00:05 NaN +4 2013-01-01 09:00:06 4.0 + +In [69]: dft.rolling('2s', on='foo').sum() +Out[69]: + foo B +0 2013-01-01 09:00:00 0.0 +1 2013-01-01 09:00:02 1.0 +2 2013-01-01 09:00:03 3.0 +3 2013-01-01 09:00:05 NaN +4 2013-01-01 09:00:06 4.0 +``` + +### Rolling window endpoints + +*New in version 0.20.0.* + +The inclusion of the interval endpoints in rolling window calculations can be specified with the ``closed`` +parameter: + +closed | Description | Default for +---|---|--- +right | close right endpoint | time-based windows +left | close left endpoint |   +both | close both endpoints | fixed windows +neither | open endpoints |   + +For example, having the right endpoint open is useful in many problems that require that there is no contamination +from present information back to past information. This allows the rolling window to compute statistics +“up to that point in time”, but not including that point in time. + +``` python +In [70]: df = pd.DataFrame({'x': 1}, + ....: index=[pd.Timestamp('20130101 09:00:01'), + ....: pd.Timestamp('20130101 09:00:02'), + ....: pd.Timestamp('20130101 09:00:03'), + ....: pd.Timestamp('20130101 09:00:04'), + ....: pd.Timestamp('20130101 09:00:06')]) + ....: + +In [71]: df["right"] = df.rolling('2s', closed='right').x.sum() # default + +In [72]: df["both"] = df.rolling('2s', closed='both').x.sum() + +In [73]: df["left"] = df.rolling('2s', closed='left').x.sum() + +In [74]: df["neither"] = df.rolling('2s', closed='neither').x.sum() + +In [75]: df +Out[75]: + x right both left neither +2013-01-01 09:00:01 1 1.0 1.0 NaN NaN +2013-01-01 09:00:02 1 2.0 2.0 1.0 1.0 +2013-01-01 09:00:03 1 2.0 3.0 2.0 1.0 +2013-01-01 09:00:04 1 2.0 3.0 2.0 1.0 +2013-01-01 09:00:06 1 1.0 2.0 1.0 NaN +``` + +Currently, this feature is only implemented for time-based windows. 
+
+For fixed windows, the ``closed`` parameter cannot be set and the rolling window will always have both endpoints closed.
+
+### Time-aware rolling vs. resampling
+
+Using ``.rolling()`` with a time-based index is quite similar to [resampling](timeseries.html#timeseries-resampling). They
+both operate on, and perform reductive operations over, time-indexed pandas objects.
+
+When using ``.rolling()`` with an offset, the offset is a time-delta. Take a backwards-in-time looking window, and
+aggregate all of the values in that window (including the end-point, but not the start-point). This is the new value
+at that point in the result. These are variable sized windows in time-space for each point of the input, so you will get
+a result the same size as the input.
+
+When using ``.resample()`` with an offset, construct a new index that is the frequency of the offset. For each frequency
+bin, aggregate points from the input within a backwards-in-time looking window that fall in that bin. The result of this
+aggregation is the output for that frequency point. The windows are fixed size in the frequency space, and your result
+will have the shape of a regular frequency between the min and the max of the original input object.
+
+To summarize, ``.rolling()`` is a time-based window operation, while ``.resample()`` is a frequency-based window operation.
+
+### Centering windows
+
+By default the labels are set to the right edge of the window, but a
+``center`` keyword is available so the labels can be set at the center.
+ +``` python +In [76]: ser.rolling(window=5).mean() +Out[76]: +2000-01-01 NaN +2000-01-02 NaN +2000-01-03 NaN +2000-01-04 NaN +2000-01-05 -0.841164 +2000-01-06 -0.779948 +2000-01-07 -0.565487 +2000-01-08 -0.502815 +2000-01-09 -0.553755 +2000-01-10 -0.472211 +Freq: D, dtype: float64 + +In [77]: ser.rolling(window=5, center=True).mean() +Out[77]: +2000-01-01 NaN +2000-01-02 NaN +2000-01-03 -0.841164 +2000-01-04 -0.779948 +2000-01-05 -0.565487 +2000-01-06 -0.502815 +2000-01-07 -0.553755 +2000-01-08 -0.472211 +2000-01-09 NaN +2000-01-10 NaN +Freq: D, dtype: float64 +``` + +### Binary window functions + +[``cov()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.cov.html#pandas.core.window.Rolling.cov) and [``corr()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.corr.html#pandas.core.window.Rolling.corr) can compute moving window statistics about +two ``Series`` or any combination of ``DataFrame/Series`` or +``DataFrame/DataFrame``. Here is the behavior in each case: + +- two ``Series``: compute the statistic for the pairing. +- ``DataFrame/Series``: compute the statistics for each column of the DataFrame +with the passed Series, thus returning a DataFrame. +- ``DataFrame/DataFrame``: by default compute the statistic for matching column +names, returning a DataFrame. If the keyword argument ``pairwise=True`` is +passed then computes the statistic for each pair of columns, returning a +``MultiIndexed DataFrame`` whose ``index`` are the dates in question (see [the next section](#stats-moments-corr-pairwise)). 
+ +For example: + +``` python +In [78]: df = pd.DataFrame(np.random.randn(1000, 4), + ....: index=pd.date_range('1/1/2000', periods=1000), + ....: columns=['A', 'B', 'C', 'D']) + ....: + +In [79]: df = df.cumsum() + +In [80]: df2 = df[:20] + +In [81]: df2.rolling(window=5).corr(df2['B']) +Out[81]: + A B C D +2000-01-01 NaN NaN NaN NaN +2000-01-02 NaN NaN NaN NaN +2000-01-03 NaN NaN NaN NaN +2000-01-04 NaN NaN NaN NaN +2000-01-05 0.768775 1.0 -0.977990 0.800252 +... ... ... ... ... +2000-01-16 0.691078 1.0 0.807450 -0.939302 +2000-01-17 0.274506 1.0 0.582601 -0.902954 +2000-01-18 0.330459 1.0 0.515707 -0.545268 +2000-01-19 0.046756 1.0 -0.104334 -0.419799 +2000-01-20 -0.328241 1.0 -0.650974 -0.777777 + +[20 rows x 4 columns] +``` + +### Computing rolling pairwise covariances and correlations + +In financial data analysis and other fields it’s common to compute covariance +and correlation matrices for a collection of time series. Often one is also +interested in moving-window covariance and correlation matrices. This can be +done by passing the ``pairwise`` keyword argument, which in the case of +``DataFrame`` inputs will yield a MultiIndexed ``DataFrame`` whose ``index`` are the dates in +question. In the case of a single DataFrame argument the ``pairwise`` argument +can even be omitted: + +::: tip Note + +Missing values are ignored and each entry is computed using the pairwise +complete observations. Please see the [covariance section](#computation-covariance) for [caveats](#computation-covariance-caveats) associated with this method of +calculating covariance and correlation matrices. 
+ +::: + +``` python +In [82]: covs = (df[['B', 'C', 'D']].rolling(window=50) + ....: .cov(df[['A', 'B', 'C']], pairwise=True)) + ....: + +In [83]: covs.loc['2002-09-22':] +Out[83]: + B C D +2002-09-22 A 1.367467 8.676734 -8.047366 + B 3.067315 0.865946 -1.052533 + C 0.865946 7.739761 -4.943924 +2002-09-23 A 0.910343 8.669065 -8.443062 + B 2.625456 0.565152 -0.907654 + C 0.565152 7.825521 -5.367526 +2002-09-24 A 0.463332 8.514509 -8.776514 + B 2.306695 0.267746 -0.732186 + C 0.267746 7.771425 -5.696962 +2002-09-25 A 0.467976 8.198236 -9.162599 + B 2.307129 0.267287 -0.754080 + C 0.267287 7.466559 -5.822650 +2002-09-26 A 0.545781 7.899084 -9.326238 + B 2.311058 0.322295 -0.844451 + C 0.322295 7.038237 -5.684445 +``` + +``` python +In [84]: correls = df.rolling(window=50).corr() + +In [85]: correls.loc['2002-09-22':] +Out[85]: + A B C D +2002-09-22 A 1.000000 0.186397 0.744551 -0.769767 + B 0.186397 1.000000 0.177725 -0.240802 + C 0.744551 0.177725 1.000000 -0.712051 + D -0.769767 -0.240802 -0.712051 1.000000 +2002-09-23 A 1.000000 0.134723 0.743113 -0.758758 +... ... ... ... ... +2002-09-25 D -0.739160 -0.164179 -0.704686 1.000000 +2002-09-26 A 1.000000 0.087756 0.727792 -0.736562 + B 0.087756 1.000000 0.079913 -0.179477 + C 0.727792 0.079913 1.000000 -0.692303 + D -0.736562 -0.179477 -0.692303 1.000000 + +[20 rows x 4 columns] +``` + +You can efficiently retrieve the time series of correlations between two +columns by reshaping and indexing: + +``` python +In [86]: correls.unstack(1)[('A', 'C')].plot() +Out[86]: +``` + +![rolling_corr_pairwise_ex](https://static.pypandas.cn/public/static/images/rolling_corr_pairwise_ex.png) + +## Aggregation + +Once the ``Rolling``, ``Expanding`` or ``EWM`` objects have been created, several methods are available to +perform multiple computations on the data. 
These operations are similar to the [aggregating API](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-aggregate), +[groupby API](groupby.html#groupby-aggregate), and [resample API](timeseries.html#timeseries-aggregate). + +``` python +In [87]: dfa = pd.DataFrame(np.random.randn(1000, 3), + ....: index=pd.date_range('1/1/2000', periods=1000), + ....: columns=['A', 'B', 'C']) + ....: + +In [88]: r = dfa.rolling(window=60, min_periods=1) + +In [89]: r +Out[89]: Rolling [window=60,min_periods=1,center=False,axis=0] +``` + +We can aggregate by passing a function to the entire DataFrame, or select a +Series (or multiple Series) via standard ``__getitem__``. + +``` python +In [90]: r.aggregate(np.sum) +Out[90]: + A B C +2000-01-01 -0.289838 -0.370545 -1.284206 +2000-01-02 -0.216612 -1.675528 -1.169415 +2000-01-03 1.154661 -1.634017 -1.566620 +2000-01-04 2.969393 -4.003274 -1.816179 +2000-01-05 4.690630 -4.682017 -2.717209 +... ... ... ... +2002-09-22 2.860036 -9.270337 6.415245 +2002-09-23 3.510163 -8.151439 5.177219 +2002-09-24 6.524983 -10.168078 5.792639 +2002-09-25 6.409626 -9.956226 5.704050 +2002-09-26 5.093787 -7.074515 6.905823 + +[1000 rows x 3 columns] + +In [91]: r['A'].aggregate(np.sum) +Out[91]: +2000-01-01 -0.289838 +2000-01-02 -0.216612 +2000-01-03 1.154661 +2000-01-04 2.969393 +2000-01-05 4.690630 + ... +2002-09-22 2.860036 +2002-09-23 3.510163 +2002-09-24 6.524983 +2002-09-25 6.409626 +2002-09-26 5.093787 +Freq: D, Name: A, Length: 1000, dtype: float64 + +In [92]: r[['A', 'B']].aggregate(np.sum) +Out[92]: + A B +2000-01-01 -0.289838 -0.370545 +2000-01-02 -0.216612 -1.675528 +2000-01-03 1.154661 -1.634017 +2000-01-04 2.969393 -4.003274 +2000-01-05 4.690630 -4.682017 +... ... ... 
+2002-09-22 2.860036 -9.270337 +2002-09-23 3.510163 -8.151439 +2002-09-24 6.524983 -10.168078 +2002-09-25 6.409626 -9.956226 +2002-09-26 5.093787 -7.074515 + +[1000 rows x 2 columns] +``` + +As you can see, the result of the aggregation will have the selected columns, or all +columns if none are selected. + +### Applying multiple functions + +With windowed ``Series`` you can also pass a list of functions to do +aggregation with, outputting a DataFrame: + +``` python +In [93]: r['A'].agg([np.sum, np.mean, np.std]) +Out[93]: + sum mean std +2000-01-01 -0.289838 -0.289838 NaN +2000-01-02 -0.216612 -0.108306 0.256725 +2000-01-03 1.154661 0.384887 0.873311 +2000-01-04 2.969393 0.742348 1.009734 +2000-01-05 4.690630 0.938126 0.977914 +... ... ... ... +2002-09-22 2.860036 0.047667 1.132051 +2002-09-23 3.510163 0.058503 1.134296 +2002-09-24 6.524983 0.108750 1.144204 +2002-09-25 6.409626 0.106827 1.142913 +2002-09-26 5.093787 0.084896 1.151416 + +[1000 rows x 3 columns] +``` + +On a windowed DataFrame, you can pass a list of functions to apply to each +column, which produces an aggregated result with a hierarchical index: + +``` python +In [94]: r.agg([np.sum, np.mean]) +Out[94]: + A B C + sum mean sum mean sum mean +2000-01-01 -0.289838 -0.289838 -0.370545 -0.370545 -1.284206 -1.284206 +2000-01-02 -0.216612 -0.108306 -1.675528 -0.837764 -1.169415 -0.584708 +2000-01-03 1.154661 0.384887 -1.634017 -0.544672 -1.566620 -0.522207 +2000-01-04 2.969393 0.742348 -4.003274 -1.000819 -1.816179 -0.454045 +2000-01-05 4.690630 0.938126 -4.682017 -0.936403 -2.717209 -0.543442 +... ... ... ... ... ... ... 
+2002-09-22 2.860036 0.047667 -9.270337 -0.154506 6.415245 0.106921 +2002-09-23 3.510163 0.058503 -8.151439 -0.135857 5.177219 0.086287 +2002-09-24 6.524983 0.108750 -10.168078 -0.169468 5.792639 0.096544 +2002-09-25 6.409626 0.106827 -9.956226 -0.165937 5.704050 0.095068 +2002-09-26 5.093787 0.084896 -7.074515 -0.117909 6.905823 0.115097 + +[1000 rows x 6 columns] +``` + +Passing a dict of functions has different behavior by default, see the next +section. + +### Applying different functions to DataFrame columns + +By passing a dict to ``aggregate`` you can apply a different aggregation to the +columns of a ``DataFrame``: + +``` python +In [95]: r.agg({'A': np.sum, 'B': lambda x: np.std(x, ddof=1)}) +Out[95]: + A B +2000-01-01 -0.289838 NaN +2000-01-02 -0.216612 0.660747 +2000-01-03 1.154661 0.689929 +2000-01-04 2.969393 1.072199 +2000-01-05 4.690630 0.939657 +... ... ... +2002-09-22 2.860036 1.113208 +2002-09-23 3.510163 1.132381 +2002-09-24 6.524983 1.080963 +2002-09-25 6.409626 1.082911 +2002-09-26 5.093787 1.136199 + +[1000 rows x 2 columns] +``` + +The function names can also be strings. In order for a string to be valid it +must be implemented on the windowed object + +``` python +In [96]: r.agg({'A': 'sum', 'B': 'std'}) +Out[96]: + A B +2000-01-01 -0.289838 NaN +2000-01-02 -0.216612 0.660747 +2000-01-03 1.154661 0.689929 +2000-01-04 2.969393 1.072199 +2000-01-05 4.690630 0.939657 +... ... ... +2002-09-22 2.860036 1.113208 +2002-09-23 3.510163 1.132381 +2002-09-24 6.524983 1.080963 +2002-09-25 6.409626 1.082911 +2002-09-26 5.093787 1.136199 + +[1000 rows x 2 columns] +``` + +Furthermore you can pass a nested dict to indicate different aggregations on different columns. 
+ +``` python +In [97]: r.agg({'A': ['sum', 'std'], 'B': ['mean', 'std']}) +Out[97]: + A B + sum std mean std +2000-01-01 -0.289838 NaN -0.370545 NaN +2000-01-02 -0.216612 0.256725 -0.837764 0.660747 +2000-01-03 1.154661 0.873311 -0.544672 0.689929 +2000-01-04 2.969393 1.009734 -1.000819 1.072199 +2000-01-05 4.690630 0.977914 -0.936403 0.939657 +... ... ... ... ... +2002-09-22 2.860036 1.132051 -0.154506 1.113208 +2002-09-23 3.510163 1.134296 -0.135857 1.132381 +2002-09-24 6.524983 1.144204 -0.169468 1.080963 +2002-09-25 6.409626 1.142913 -0.165937 1.082911 +2002-09-26 5.093787 1.151416 -0.117909 1.136199 + +[1000 rows x 4 columns] +``` + +## Expanding windows + +A common alternative to rolling statistics is to use an *expanding* window, +which yields the value of the statistic with all the data available up to that +point in time. + +These follow a similar interface to ``.rolling``, with the ``.expanding`` method +returning an ``Expanding`` object. + +As these calculations are a special case of rolling statistics, +they are implemented in pandas such that the following two calls are equivalent: + +``` python +In [98]: df.rolling(window=len(df), min_periods=1).mean()[:5] +Out[98]: + A B C D +2000-01-01 0.314226 -0.001675 0.071823 0.892566 +2000-01-02 0.654522 -0.171495 0.179278 0.853361 +2000-01-03 0.708733 -0.064489 -0.238271 1.371111 +2000-01-04 0.987613 0.163472 -0.919693 1.566485 +2000-01-05 1.426971 0.288267 -1.358877 1.808650 + +In [99]: df.expanding(min_periods=1).mean()[:5] +Out[99]: + A B C D +2000-01-01 0.314226 -0.001675 0.071823 0.892566 +2000-01-02 0.654522 -0.171495 0.179278 0.853361 +2000-01-03 0.708733 -0.064489 -0.238271 1.371111 +2000-01-04 0.987613 0.163472 -0.919693 1.566485 +2000-01-05 1.426971 0.288267 -1.358877 1.808650 +``` + +These have a similar set of methods to ``.rolling`` methods. 
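For instance (a quick sketch, not from the original docs), the expanding mean at each label is simply the mean of everything observed up to and including that point:

```python
import pandas as pd

s2 = pd.Series([1.0, 3.0, 5.0, 7.0])

# each entry is the mean of all values seen so far:
# mean(1), mean(1, 3), mean(1, 3, 5), mean(1, 3, 5, 7)
exp_mean = s2.expanding().mean()

print(exp_mean.tolist())  # [1.0, 2.0, 3.0, 4.0]
```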
+ +### Method summary + +Function | Description +---|--- +[count()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Expanding.count.html#pandas.core.window.Expanding.count) | Number of non-null observations +[sum()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Expanding.sum.html#pandas.core.window.Expanding.sum) | Sum of values +[mean()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Expanding.mean.html#pandas.core.window.Expanding.mean) | Mean of values +[median()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Expanding.median.html#pandas.core.window.Expanding.median) | Arithmetic median of values +[min()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Expanding.min.html#pandas.core.window.Expanding.min) | Minimum +[max()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Expanding.max.html#pandas.core.window.Expanding.max) | Maximum +[std()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Expanding.std.html#pandas.core.window.Expanding.std) | Unbiased standard deviation +[var()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Expanding.var.html#pandas.core.window.Expanding.var) | Unbiased variance +[skew()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Expanding.skew.html#pandas.core.window.Expanding.skew) | Unbiased skewness (3rd moment) +[kurt()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Expanding.kurt.html#pandas.core.window.Expanding.kurt) | Unbiased kurtosis (4th moment) +[quantile()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Expanding.quantile.html#pandas.core.window.Expanding.quantile) | Sample quantile (value at %) 
+[apply()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Expanding.apply.html#pandas.core.window.Expanding.apply) | Generic apply +[cov()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Expanding.cov.html#pandas.core.window.Expanding.cov) | Unbiased covariance (binary) +[corr()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Expanding.corr.html#pandas.core.window.Expanding.corr) | Correlation (binary) + +Aside from not having a ``window`` parameter, these functions have the same +interfaces as their ``.rolling`` counterparts. Like above, the parameters they +all accept are: + +- ``min_periods``: threshold of non-null data points to require. Defaults to +minimum needed to compute statistic. No ``NaNs`` will be output once +``min_periods`` non-null data points have been seen. +- ``center``: boolean, whether to set the labels at the center (default is False). + +::: tip Note + +The output of the ``.rolling`` and ``.expanding`` methods do not return a +``NaN`` if there are at least ``min_periods`` non-null values in the current +window. 
For example: + +``` python +In [100]: sn = pd.Series([1, 2, np.nan, 3, np.nan, 4]) + +In [101]: sn +Out[101]: +0 1.0 +1 2.0 +2 NaN +3 3.0 +4 NaN +5 4.0 +dtype: float64 + +In [102]: sn.rolling(2).max() +Out[102]: +0 NaN +1 2.0 +2 NaN +3 NaN +4 NaN +5 NaN +dtype: float64 + +In [103]: sn.rolling(2, min_periods=1).max() +Out[103]: +0 1.0 +1 2.0 +2 2.0 +3 3.0 +4 3.0 +5 4.0 +dtype: float64 +``` + +In case of expanding functions, this differs from [``cumsum()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.cumsum.html#pandas.DataFrame.cumsum), +[``cumprod()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.cumprod.html#pandas.DataFrame.cumprod), [``cummax()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.cummax.html#pandas.DataFrame.cummax), +and [``cummin()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.cummin.html#pandas.DataFrame.cummin), which return ``NaN`` in the output wherever +a ``NaN`` is encountered in the input. In order to match the output of ``cumsum`` +with ``expanding``, use [``fillna()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html#pandas.DataFrame.fillna): + +``` python +In [104]: sn.expanding().sum() +Out[104]: +0 1.0 +1 3.0 +2 3.0 +3 6.0 +4 6.0 +5 10.0 +dtype: float64 + +In [105]: sn.cumsum() +Out[105]: +0 1.0 +1 3.0 +2 NaN +3 6.0 +4 NaN +5 10.0 +dtype: float64 + +In [106]: sn.cumsum().fillna(method='ffill') +Out[106]: +0 1.0 +1 3.0 +2 3.0 +3 6.0 +4 6.0 +5 10.0 +dtype: float64 +``` + +::: + +An expanding window statistic will be more stable (and less responsive) than +its rolling window counterpart as the increasing window size decreases the +relative impact of an individual data point. 
As an example, here is the +[``mean()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Expanding.mean.html#pandas.core.window.Expanding.mean) output for the previous time series dataset: + +``` python +In [107]: s.plot(style='k--') +Out[107]: + +In [108]: s.expanding().mean().plot(style='k') +Out[108]: +``` + +![expanding_mean_frame](https://static.pypandas.cn/public/static/images/expanding_mean_frame.png) + +## Exponentially weighted windows + +A related set of functions are exponentially weighted versions of several of +the above statistics. A similar interface to ``.rolling`` and ``.expanding`` is accessed +through the ``.ewm`` method to receive an ``EWM`` object. +A number of expanding EW (exponentially weighted) +methods are provided: + +Function | Description +---|--- +[mean()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.EWM.mean.html#pandas.core.window.EWM.mean) | EW moving average +[var()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.EWM.var.html#pandas.core.window.EWM.var) | EW moving variance +[std()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.EWM.std.html#pandas.core.window.EWM.std) | EW moving standard deviation +[corr()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.EWM.corr.html#pandas.core.window.EWM.corr) | EW moving correlation +[cov()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.EWM.cov.html#pandas.core.window.EWM.cov) | EW moving covariance + +In general, a weighted moving average is calculated as + +
+\[y_t = \frac{\sum_{i=0}^t w_i x_{t-i}}{\sum_{i=0}^t w_i},\] +
+
+where \\(x_t\\) is the input, \\(y_t\\) is the result and the \\(w_i\\)
+are the weights.
+
+The EW functions support two variants of exponential weights.
+The default, ``adjust=True``, uses the weights \\(w_i = (1 - \alpha)^i\\)
+which gives
+
+\[y_t = \frac{x_t + (1 - \alpha)x_{t-1} + (1 - \alpha)^2 x_{t-2} + ... ++ (1 - \alpha)^t x_{0}}{1 + (1 - \alpha) + (1 - \alpha)^2 + ... ++ (1 - \alpha)^t}\] +
+ +When ``adjust=False`` is specified, moving averages are calculated as + +
+\[\begin{split}y_0 &= x_0 \\ +y_t &= (1 - \alpha) y_{t-1} + \alpha x_t,\end{split}\] +
+ +which is equivalent to using weights + +
+\[\begin{split}w_i = \begin{cases} + \alpha (1 - \alpha)^i & \text{if } i < t \\ + (1 - \alpha)^i & \text{if } i = t. +\end{cases}\end{split}\] +
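To make the recursive variant concrete, here is a small numerical sketch (not from the original docs) checking pandas' ``adjust=False`` output against the recurrence above:

```python
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0])
alpha = 0.5

# pandas' recursive (adjust=False) exponentially weighted mean
ewm_rec = x.ewm(alpha=alpha, adjust=False).mean()

# the same recurrence by hand: y_0 = x_0, y_t = (1 - alpha) * y_{t-1} + alpha * x_t
y = [x.iloc[0]]
for xt in x.iloc[1:]:
    y.append((1 - alpha) * y[-1] + alpha * xt)

print(ewm_rec.tolist())  # [1.0, 1.5, 2.25]
print(y)                 # [1.0, 1.5, 2.25]
```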
+ +::: tip Note + +These equations are sometimes written in terms of \\(\alpha' = 1 - \alpha\\), e.g. + +
+\[y_t = \alpha' y_{t-1} + (1 - \alpha') x_t.\] +
+ +::: + +The difference between the above two variants arises because we are +dealing with series which have finite history. Consider a series of infinite +history, with ``adjust=True``: + +
+\[y_t = \frac{x_t + (1 - \alpha)x_{t-1} + (1 - \alpha)^2 x_{t-2} + ...}
+{1 + (1 - \alpha) + (1 - \alpha)^2 + ...}\]
+
+ +Noting that the denominator is a geometric series with initial term equal to 1 +and a ratio of \\(1 - \alpha\\) we have + +
+\[\begin{split}y_t &= \frac{x_t + (1 - \alpha)x_{t-1} + (1 - \alpha)^2 x_{t-2} + ...} +{\frac{1}{1 - (1 - \alpha)}}\\ +&= [x_t + (1 - \alpha)x_{t-1} + (1 - \alpha)^2 x_{t-2} + ...] \alpha \\ +&= \alpha x_t + [(1-\alpha)x_{t-1} + (1 - \alpha)^2 x_{t-2} + ...]\alpha \\ +&= \alpha x_t + (1 - \alpha)[x_{t-1} + (1 - \alpha) x_{t-2} + ...]\alpha\\ +&= \alpha x_t + (1 - \alpha) y_{t-1}\end{split}\] +
+ +which is the same expression as ``adjust=False`` above and therefore +shows the equivalence of the two variants for infinite series. +When ``adjust=False``, we have \\(y_0 = x_0\\) and +\\(y_t = \alpha x_t + (1 - \alpha) y_{t-1}\\). +Therefore, there is an assumption that \\(x_0\\) is not an ordinary value +but rather an exponentially weighted moment of the infinite series up to that +point. + +One must have \\(0 < \alpha \leq 1\\), and while since version 0.18.0 +it has been possible to pass \\(\alpha\\) directly, it’s often easier +to think about either the **span**, **center of mass (com)** or **half-life** +of an EW moment: + +
+\[\begin{split}\alpha =
+    \begin{cases}
+        \frac{2}{s + 1},            & \text{for span}\ s \geq 1\\
+        \frac{1}{1 + c},            & \text{for center of mass}\ c \geq 0\\
+        1 - e^{\frac{\log 0.5}{h}}, & \text{for half-life}\ h > 0
+    \end{cases}\end{split}\]
+
+ +One must specify precisely one of **span**, **center of mass**, **half-life** +and **alpha** to the EW functions: + +- **Span** corresponds to what is commonly called an “N-day EW moving average”. +- **Center of mass** has a more physical interpretation and can be thought of +in terms of span: \\(c = (s - 1) / 2\\). +- **Half-life** is the period of time for the exponential weight to reduce to +one half. +- **Alpha** specifies the smoothing factor directly. + +Here is an example for a univariate time series: + +``` python +In [109]: s.plot(style='k--') +Out[109]: + +In [110]: s.ewm(span=20).mean().plot(style='k') +Out[110]: +``` + +![ewma_ex](https://static.pypandas.cn/public/static/images/ewma_ex.png) + +EWM has a ``min_periods`` argument, which has the same +meaning it does for all the ``.expanding`` and ``.rolling`` methods: +no output values will be set until at least ``min_periods`` non-null values +are encountered in the (expanding) window. + +EWM also has an ``ignore_na`` argument, which determines how +intermediate null values affect the calculation of the weights. +When ``ignore_na=False`` (the default), weights are calculated based on absolute +positions, so that intermediate null values affect the result. +When ``ignore_na=True``, +weights are calculated by ignoring intermediate null values. +For example, assuming ``adjust=True``, if ``ignore_na=False``, the weighted +average of ``3, NaN, 5`` would be calculated as + +
+\[\frac{(1-\alpha)^2 \cdot 3 + 1 \cdot 5}{(1-\alpha)^2 + 1}.\] +
+ +Whereas if ``ignore_na=True``, the weighted average would be calculated as + +
+\[\frac{(1-\alpha) \cdot 3 + 1 \cdot 5}{(1-\alpha) + 1}.\] +
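These two weightings can be checked numerically; a sketch (not from the original docs) using alpha = 0.5 on the ``3, NaN, 5`` example above, with the default ``adjust=True``:

```python
import numpy as np
import pandas as pd

s3 = pd.Series([3.0, np.nan, 5.0])
alpha = 0.5

# ignore_na=False (default): the NaN still occupies a position,
# so the surviving value 3 is weighted by (1 - alpha)**2
default = s3.ewm(alpha=alpha, ignore_na=False).mean().iloc[-1]
by_hand_default = ((1 - alpha) ** 2 * 3 + 1 * 5) / ((1 - alpha) ** 2 + 1)

# ignore_na=True: the NaN is skipped when assigning weights,
# so 3 is weighted by just (1 - alpha)
skipped = s3.ewm(alpha=alpha, ignore_na=True).mean().iloc[-1]
by_hand_skipped = ((1 - alpha) * 3 + 1 * 5) / ((1 - alpha) + 1)

print(default, by_hand_default)   # both 4.6
print(skipped, by_hand_skipped)   # both approximately 4.33
```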
+ +The ``var()``, ``std()``, and ``cov()`` functions have a ``bias`` argument, +specifying whether the result should contain biased or unbiased statistics. +For example, if ``bias=True``, ``ewmvar(x)`` is calculated as +``ewmvar(x) = ewma(x**2) - ewma(x)**2``; +whereas if ``bias=False`` (the default), the biased variance statistics +are scaled by debiasing factors + +
+\[\frac{\left(\sum_{i=0}^t w_i\right)^2}{\left(\sum_{i=0}^t w_i\right)^2 - \sum_{i=0}^t w_i^2}.\] +
+
+(For \\(w_i = 1\\), this reduces to the usual \\(N / (N - 1)\\) factor,
+with \\(N = t + 1\\).)
+See [Weighted Sample Variance](http://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Weighted_sample_variance)
+on Wikipedia for further details.
diff --git a/Python/pandas/user_guide/cookbook.md b/Python/pandas/user_guide/cookbook.md
new file mode 100644
index 00000000..077a8902
--- /dev/null
+++ b/Python/pandas/user_guide/cookbook.md
@@ -0,0 +1,2050 @@
+# Cookbook
+
+This section lists some **short and sweet** pandas examples and links.
+
+We encourage pandas users to add to this documentation. Adding links or code for useful examples to this section is a great first **Pull Request** for a pandas user.
+
+This section lists simple, condensed, new-user-friendly example code, together with links to Stack Overflow or GitHub that contain more details on the examples.
+
+`pd` and `np` are the abbreviations for pandas and NumPy. The other modules are imported explicitly so that newer users can follow along.
+
+These examples are written for Python 3; minor tweaks should make them work on earlier Python versions.
+
+## Idioms
+
+These are some neat pandas `idioms`.
+
+[if-then/if-then-else on one column, with assignment to another one or more columns:](https://stackoverflow.com/questions/17128302/python-pandas-idiom-for-if-then-else)
+
+```python
+In [1]: df = pd.DataFrame({'AAA': [4, 5, 6, 7],
+   ...:                    'BBB': [10, 20, 30, 40],
+   ...:                    'CCC': [100, 50, -30, -50]})
+   ...: 
+
+In [2]: df
+Out[2]: 
+   AAA  BBB  CCC
+0    4   10  100
+1    5   20   50
+2    6   30  -30
+3    7   40  -50
+```
+
+### if-then…
+
+An if-then on one column:
+
+```python
+In [3]: df.loc[df.AAA >= 5, 'BBB'] = -1
+
+In [4]: df
+Out[4]: 
+   AAA  BBB  CCC
+0    4   10  100
+1    5   -1   50
+2    6   -1  -30
+3    7   -1  -50
+```
+
+An if-then with assignment to 2 columns:
+
+```python
+In [5]: df.loc[df.AAA >= 5, ['BBB', 'CCC']] = 555
+
+In [6]: df
+Out[6]: 
+   AAA  BBB  CCC
+0    4   10  100
+1    5  555  555
+2    6  555  555
+3    7  555  555
+```
+
+Add another line with different logic, to do the -else:
+
+```python
+In [7]: df.loc[df.AAA < 5, ['BBB', 'CCC']] = 2000
+
+In [8]: df
+Out[8]: 
+   AAA   BBB   CCC
+0    4  2000  2000
+1    5   555   555
+2    6   555   555
+3    7   555   555
+```
+
+Or use pandas `where` with a pre-built mask:
+
+```python
+In [9]: df_mask = pd.DataFrame({'AAA': [True] * 4,
+   ...:                         'BBB': [False] * 4,
+   ...:                         'CCC': [True, False] * 2})
+   ...: 
+
+In [10]: df.where(df_mask, -1000)
+Out[10]: 
+   AAA   BBB   CCC
+0    4 -1000  2000
+1    5 -1000 -1000
+2    6 -1000   555
+3    7 -1000 -1000
+```
+
+[Or use NumPy's where() for an if-then-else](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column)
+
+```python
+In [11]: df = pd.DataFrame({'AAA': [4, 5, 6, 7],
+   ....:                    'BBB': [10, 20, 30, 40],
+   ....:                    'CCC': [100, 50, -30, -50]})
+   ....: 
+
+In [12]: df
+Out[12]: 
+   AAA  BBB  CCC
+0    4   10  100
+1    5   20   50
+2    6   30  -30
+3    7   40  -50
+
+In [13]: df['logic'] = np.where(df['AAA'] > 5, 'high', 'low')
+
+In [14]: df
+Out[14]: 
+   AAA  BBB  CCC logic
+0    4   10  100   low
+1    5   20   50   low
+2    6   30  -30  high
+3    7   40  -50  high
+```
+
+### Splitting
+
+[Split a frame with a boolean criterion](https://stackoverflow.com/questions/14957116/how-to-split-a-dataframe-according-to-a-boolean-criterion)
+
+```python
+In [15]: df = pd.DataFrame({'AAA': [4, 5, 6, 7],
+   ....:                    'BBB': [10, 20, 30, 40],
+   ....:                    'CCC': [100, 50, -30, -50]})
+   ....: 
+
+In [16]: df
+Out[16]: 
+   AAA  BBB  CCC
+0    4   10  100
+1    5   20   50
+2    6   30  -30
+3    7   40  -50
+
+In [17]: df[df.AAA <= 5]
+Out[17]: 
+   AAA  BBB  CCC
+0    4   10  100
+1    5   20   50
+
+In [18]: df[df.AAA > 5]
+Out[18]: 
+   AAA  BBB  CCC
+2    6   30  -30
+3    7   40  -50
+```
+
+### Building criteria
+
+[Select with multi-column criteria](https://stackoverflow.com/questions/15315452/selecting-with-complex-criteria-from-pandas-dataframe)
+
+```python
+In [19]: df = pd.DataFrame({'AAA': [4, 5, 6, 7],
+   ....:                    'BBB': [10, 20, 30, 40],
+   ....:                    'CCC': [100, 50, -30, -50]})
+   ....: 
+
+In [20]: df
+Out[20]: 
+   AAA  BBB  CCC
+0    4   10  100
+1    5   20   50
+2    6   30  -30
+3    7   40  -50
+```
+
+And (`&`), without assignment (returns a Series):
+
+```python
+In [21]: df.loc[(df['BBB'] < 25) & (df['CCC'] >= -40), 'AAA']
+Out[21]: 
+0    4
+1    5
+Name: AAA, dtype: int64
+```
+
+Or (`|`), without assignment (returns a Series):
+
+```python
+In [22]: df.loc[(df['BBB'] > 25) | (df['CCC'] >= -40), 'AAA']
+Out[22]: 
+0    4
+1    5
+2    6
+3    7
+Name: AAA, dtype: int64
+```
+
+Or (`|`), with assignment (modifies the DataFrame):
+
+```python
+In [23]: df.loc[(df['BBB'] > 25) | (df['CCC'] >= 75), 'AAA'] = 0.1
+
+In [24]: df
+Out[24]: 
+   AAA  BBB  CCC
+0  0.1   10  100
+1  5.0   20   50
+2  0.1   30  -30
+3  0.1   40  -50
+```
+
+[Select rows closest to a user-defined number using argsort](https://stackoverflow.com/questions/17758023/return-rows-in-a-dataframe-closest-to-a-user-defined-number)
+
+```python
+In [25]: df = pd.DataFrame({'AAA': [4, 5, 6, 7],
+   ....:                    'BBB': [10, 20, 30, 40],
+   ....:                    'CCC': [100, 50, -30, -50]})
+   ....: 
+
+In [26]: df
+Out[26]: 
+   AAA  BBB  CCC
+0    4   10  100
+1    5   20   50
+2    6   30  -30
+3    7   40  -50
+
+In [27]: aValue = 43.0
+
+In [28]: df.loc[(df.CCC - aValue).abs().argsort()]
+Out[28]: 
+   AAA  BBB  CCC
+1    5   20   50
+0    4   10  100
+2    6   30  -30
+3    7   40  -50
+```
+
+[Dynamically reduce a list of criteria using binary operators](https://stackoverflow.com/questions/21058254/pandas-boolean-operation-in-a-python-list/21058331)
+
+```python
+In [29]: df = pd.DataFrame({'AAA': [4, 5, 6, 7],
+   ....:                    'BBB': [10, 20, 30, 40],
+   ....:                    'CCC': [100, 50, -30, -50]})
+   ....: 
+
+In [30]: df
+Out[30]: 
+   AAA  BBB  CCC
+0    4   10  100
+1    5   20   50
+2    6   30  -30
+3    7   40  -50
+
+In [31]: Crit1 = df.AAA <= 5.5
+
+In [32]: Crit2 = df.BBB == 10.0
+
+In [33]: Crit3 = df.CCC > -40.0
+```
+
+One could hard code the conjunction:
+
+```python
+In [34]: AllCrit = Crit1 & Crit2 & Crit3
+```
+
+Or build the list of criteria dynamically:
+
+```python
+In [35]: import functools
+
+In [36]: CritList = [Crit1, Crit2, Crit3]
+
+In [37]: AllCrit = functools.reduce(lambda x, y: x & y, CritList)
+
+In [38]: df[AllCrit]
+Out[38]: 
+   AAA  BBB  CCC
+0    4   10  100
+```
+
+## Selection
+
+### DataFrames
+
+See the [indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing) docs for more information.
+
+[Using both row labels and value conditionals](https://stackoverflow.com/questions/14725068/pandas-using-row-labels-in-boolean-indexing)
+
+```python
+In [39]: df = pd.DataFrame({'AAA': [4, 5, 6, 7],
+   ....:                    'BBB': [10, 20, 30, 40],
+   ....:                    'CCC': [100, 50, -30, -50]})
+   ....: 
+
+In [40]: df
+Out[40]: 
+   AAA  BBB  CCC
+0    4   10  100
+1    5   20   50
+2    6   30  -30
+3    7   40  -50
+
+In [41]: df[(df.AAA <= 6) & (df.index.isin([0, 2, 4]))]
+Out[41]: 
+   AAA  BBB  CCC
+0    4   10  100
+2    6   30  -30
+```
+
+[Use loc for label-oriented slicing and iloc for positional slicing](https://github.com/pandas-dev/pandas/issues/2904)
+
+```python
+In [42]: df = pd.DataFrame({'AAA': [4, 5, 6, 7],
+   ....:                    'BBB': [10, 20, 30, 40],
+ ....: 'CCC': [100, 50, -30, -50]}, + ....: index=['foo', 'bar', 'boo', 'kar']) + ....: +``` + +前 2 个是显式切片方法,第 3 个是通用方法: + +1. 位置切片,Python 切片风格,不包括结尾数据; +2. 标签切片,非 Python 切片风格,包括结尾数据; +3. 通用切片,支持两种切片风格,取决于切片用的是标签还是位置。 + + +```python +In [43]: df.loc['bar':'kar'] # Label +Out[43]: + AAA BBB CCC +bar 5 20 50 +boo 6 30 -30 +kar 7 40 -50 + +# Generic +In [44]: df.iloc[0:3] +Out[44]: + AAA BBB CCC +foo 4 10 100 +bar 5 20 50 +boo 6 30 -30 + +In [45]: df.loc['bar':'kar'] +Out[45]: + AAA BBB CCC +bar 5 20 50 +boo 6 30 -30 +kar 7 40 -50 +``` + +包含整数,且不从 0 开始的索引,或不是逐步递增的索引会引发歧义。 + +```python +In [46]: data = {'AAA': [4, 5, 6, 7], + ....: 'BBB': [10, 20, 30, 40], + ....: 'CCC': [100, 50, -30, -50]} + ....: + +In [47]: df2 = pd.DataFrame(data=data, index=[1, 2, 3, 4]) # Note index starts at 1. + +In [48]: df2.iloc[1:3] # Position-oriented +Out[48]: + AAA BBB CCC +2 5 20 50 +3 6 30 -30 + +In [49]: df2.loc[1:3] # Label-oriented +Out[49]: + AAA BBB CCC +1 4 10 100 +2 5 20 50 +3 6 30 -30 +``` + +[用逆运算符 (~)提取掩码的反向内容](https://stackoverflow.com/questions/14986510/picking-out-elements-based-on-complement-of-indices-in-python-pandas) + +```python +In [50]: df = pd.DataFrame({'AAA': [4, 5, 6, 7], + ....: 'BBB': [10, 20, 30, 40], + ....: 'CCC': [100, 50, -30, -50]}) + ....: + +In [51]: df +Out[51]: + AAA BBB CCC +0 4 10 100 +1 5 20 50 +2 6 30 -30 +3 7 40 -50 + +In [52]: df[~((df.AAA <= 6) & (df.index.isin([0, 2, 4])))] +Out[52]: + AAA BBB CCC +1 5 20 50 +3 7 40 -50 +``` + +### 生成新列 + +[用 applymap 高效动态生成新列](https://stackoverflow.com/questions/16575868/efficiently-creating-additional-columns-in-a-pandas-dataframe-using-map) + +```python +In [53]: df = pd.DataFrame({'AAA': [1, 2, 1, 3], + ....: 'BBB': [1, 1, 2, 2], + ....: 'CCC': [2, 1, 3, 1]}) + ....: + +In [54]: df +Out[54]: + AAA BBB CCC +0 1 1 2 +1 2 1 1 +2 1 2 3 +3 3 2 1 + +In [55]: source_cols = df.columns # Or some subset would work too + +In [56]: new_cols = [str(x) + "_cat" for x in source_cols] + +In [57]: categories = {1: 
'Alpha', 2: 'Beta', 3: 'Charlie'} + +In [58]: df[new_cols] = df[source_cols].applymap(categories.get) + +In [59]: df +Out[59]: + AAA BBB CCC AAA_cat BBB_cat CCC_cat +0 1 1 2 Alpha Alpha Beta +1 2 1 1 Beta Alpha Alpha +2 1 2 3 Alpha Beta Charlie +3 3 2 1 Charlie Beta Alpha +``` + +[分组时用 min()](https://stackoverflow.com/questions/23394476/keep-other-columns-when-using-min-with-groupby) + +```python +In [60]: df = pd.DataFrame({'AAA': [1, 1, 1, 2, 2, 2, 3, 3], + ....: 'BBB': [2, 1, 3, 4, 5, 1, 2, 3]}) + ....: + +In [61]: df +Out[61]: + AAA BBB +0 1 2 +1 1 1 +2 1 3 +3 2 4 +4 2 5 +5 2 1 +6 3 2 +7 3 3 +``` + +方法1:用 idxmin() 提取每组最小值的索引 + +```python +In [62]: df.loc[df.groupby("AAA")["BBB"].idxmin()] +Out[62]: + AAA BBB +1 1 1 +5 2 1 +6 3 2 +``` + +方法 2:先排序,再提取每组的第一个值 + +```python +In [63]: df.sort_values(by="BBB").groupby("AAA", as_index=False).first() +Out[63]: + AAA BBB +0 1 1 +1 2 1 +2 3 2 +``` + +注意,提取的数据一样,但索引不一样。 + + +## 多层索引 + +更多信息,请参阅[多层索引](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#advanced-hierarchical)文档。 + +[用带标签的字典创建多层索引](https://stackoverflow.com/questions/14916358/reshaping-dataframes-in-pandas-based-on-column-labels) + +```python +In [64]: df = pd.DataFrame({'row': [0, 1, 2], + ....: 'One_X': [1.1, 1.1, 1.1], + ....: 'One_Y': [1.2, 1.2, 1.2], + ....: 'Two_X': [1.11, 1.11, 1.11], + ....: 'Two_Y': [1.22, 1.22, 1.22]}) + ....: + +In [65]: df +Out[65]: + row One_X One_Y Two_X Two_Y +0 0 1.1 1.2 1.11 1.22 +1 1 1.1 1.2 1.11 1.22 +2 2 1.1 1.2 1.11 1.22 + +# 设置索引标签 +In [66]: df = df.set_index('row') + +In [67]: df +Out[67]: + One_X One_Y Two_X Two_Y +row +0 1.1 1.2 1.11 1.22 +1 1.1 1.2 1.11 1.22 +2 1.1 1.2 1.11 1.22 + +# 多层索引的列 +In [68]: df.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) + ....: for c in df.columns]) + ....: + +In [69]: df +Out[69]: + One Two + X Y X Y +row +0 1.1 1.2 1.11 1.22 +1 1.1 1.2 1.11 1.22 +2 1.1 1.2 1.11 1.22 + +# 先 stack,然后 Reset 索引 + +In [70]: df = df.stack(0).reset_index(1) + +In [71]: df +Out[71]: 
+ level_1 X Y +row +0 One 1.10 1.20 +0 Two 1.11 1.22 +1 One 1.10 1.20 +1 Two 1.11 1.22 +2 One 1.10 1.20 +2 Two 1.11 1.22 + +# 修整标签,注意自动添加了标签 `level_1` +In [72]: df.columns = ['Sample', 'All_X', 'All_Y'] + +In [73]: df +Out[73]: + Sample All_X All_Y +row +0 One 1.10 1.20 +0 Two 1.11 1.22 +1 One 1.10 1.20 +1 Two 1.11 1.22 +2 One 1.10 1.20 +2 Two 1.11 1.22 +``` + +### 运算 + +[多层索引运算要用广播机制](https://stackoverflow.com/questions/19501510/divide-entire-pandas-multiindex-dataframe-by-dataframe-variable/19502176#19502176) + +```python +In [74]: cols = pd.MultiIndex.from_tuples([(x, y) for x in ['A', 'B', 'C'] + ....: for y in ['O', 'I']]) + ....: + +In [75]: df = pd.DataFrame(np.random.randn(2, 6), index=['n', 'm'], columns=cols) + +In [76]: df +Out[76]: + A B C + O I O I O I +n 0.469112 -0.282863 -1.509059 -1.135632 1.212112 -0.173215 +m 0.119209 -1.044236 -0.861849 -2.104569 -0.494929 1.071804 + +In [77]: df = df.div(df['C'], level=1) + +In [78]: df +Out[78]: + A B C + O I O I O I +n 0.387021 1.633022 -1.244983 6.556214 1.0 1.0 +m -0.240860 -0.974279 1.741358 -1.963577 1.0 1.0 +``` + +### 切片 + +[用 xs 切片多层索引](https://stackoverflow.com/questions/12590131/how-to-slice-multindex-columns-in-pandas-dataframes) + +```python +In [79]: coords = [('AA', 'one'), ('AA', 'six'), ('BB', 'one'), ('BB', 'two'), + ....: ('BB', 'six')] + ....: + +In [80]: index = pd.MultiIndex.from_tuples(coords) + +In [81]: df = pd.DataFrame([11, 22, 33, 44, 55], index, ['MyData']) + +In [82]: df +Out[82]: + MyData +AA one 11 + six 22 +BB one 33 + two 44 + six 55 +``` + +提取第一层与索引第一个轴的交叉数据: + +```python +# 注意:level 与 axis 是可选项,默认为 0 +In [83]: df.xs('BB', level=0, axis=0) +Out[83]: + MyData +one 33 +two 44 +six 55 +``` + +……现在是第 1 个轴的第 2 层 + +```python +In [84]: df.xs('six', level=1, axis=0) +Out[84]: + MyData +AA 22 +BB 55 +``` + +[用 xs 切片多层索引,方法 #2](https://stackoverflow.com/questions/14964493/multiindex-based-indexing-in-pandas) + +```python +In [85]: import itertools + +In [86]: index = 
list(itertools.product(['Ada', 'Quinn', 'Violet'], + ....: ['Comp', 'Math', 'Sci'])) + ....: + +In [87]: headr = list(itertools.product(['Exams', 'Labs'], ['I', 'II'])) + +In [88]: indx = pd.MultiIndex.from_tuples(index, names=['Student', 'Course']) + +In [89]: cols = pd.MultiIndex.from_tuples(headr) # Notice these are un-named + +In [90]: data = [[70 + x + y + (x * y) % 3 for x in range(4)] for y in range(9)] + +In [91]: df = pd.DataFrame(data, indx, cols) + +In [92]: df +Out[92]: + Exams Labs + I II I II +Student Course +Ada Comp 70 71 72 73 + Math 71 73 75 74 + Sci 72 75 75 75 +Quinn Comp 73 74 75 76 + Math 74 76 78 77 + Sci 75 78 78 78 +Violet Comp 76 77 78 79 + Math 77 79 81 80 + Sci 78 81 81 81 + +In [93]: All = slice(None) + +In [94]: df.loc['Violet'] +Out[94]: + Exams Labs + I II I II +Course +Comp 76 77 78 79 +Math 77 79 81 80 +Sci 78 81 81 81 + +In [95]: df.loc[(All, 'Math'), All] +Out[95]: + Exams Labs + I II I II +Student Course +Ada Math 71 73 75 74 +Quinn Math 74 76 78 77 +Violet Math 77 79 81 80 + +In [96]: df.loc[(slice('Ada', 'Quinn'), 'Math'), All] +Out[96]: + Exams Labs + I II I II +Student Course +Ada Math 71 73 75 74 +Quinn Math 74 76 78 77 + +In [97]: df.loc[(All, 'Math'), ('Exams')] +Out[97]: + I II +Student Course +Ada Math 71 73 +Quinn Math 74 76 +Violet Math 77 79 + +In [98]: df.loc[(All, 'Math'), (All, 'II')] +Out[98]: + Exams Labs + II II +Student Course +Ada Math 73 74 +Quinn Math 76 77 +Violet Math 79 80 +``` + +[用 xs 设置多层索引比例](https://stackoverflow.com/questions/19319432/pandas-selecting-a-lower-level-in-a-dataframe-to-do-a-ffill) + +### 排序 + +[用多层索引按指定列或列序列表排序x](https://stackoverflow.com/questions/14733871/mutli-index-sorting-in-pandas) + +```python +In [99]: df.sort_values(by=('Labs', 'II'), ascending=False) +Out[99]: + Exams Labs + I II I II +Student Course +Violet Sci 78 81 81 81 + Math 77 79 81 80 + Comp 76 77 78 79 +Quinn Sci 75 78 78 78 + Math 74 76 78 77 + Comp 73 74 75 76 +Ada Sci 72 75 75 75 + Math 71 73 75 74 + Comp 70 71 
72 73 +``` + +[部分选择,需要排序](https://github.com/pandas-dev/pandas/issues/2995) + +### 层级 + +[为多层索引添加一层](http://stackoverflow.com/questions/14744068/prepend-a-level-to-a-pandas-multiindex) + +[平铺结构化列](http://stackoverflow.com/questions/14507794/python-pandas-how-to-flatten-a-hierarchical-index-in-columns) + + + +## 缺失数据 + +[缺失数据](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#missing-data) 文档。 + +向前填充逆序时间序列。 + +```python +In [100]: df = pd.DataFrame(np.random.randn(6, 1), + .....: index=pd.date_range('2013-08-01', periods=6, freq='B'), + .....: columns=list('A')) + .....: + +In [101]: df.loc[df.index[3], 'A'] = np.nan + +In [102]: df +Out[102]: + A +2013-08-01 0.721555 +2013-08-02 -0.706771 +2013-08-05 -1.039575 +2013-08-06 NaN +2013-08-07 -0.424972 +2013-08-08 0.567020 + +In [103]: df.reindex(df.index[::-1]).ffill() +Out[103]: + A +2013-08-08 0.567020 +2013-08-07 -0.424972 +2013-08-06 -0.424972 +2013-08-05 -1.039575 +2013-08-02 -0.706771 +2013-08-01 0.721555 +``` + +[空值时重置为 0,有值时累加](http://stackoverflow.com/questions/18196811/cumsum-reset-at-nan) + +### 替换 + +[用反引用替换](http://stackoverflow.com/questions/16818871/extracting-value-and-creating-new-column-out-of-it) + +## 分组 + +[分组](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#groupby) 文档。 + +[用 apply 执行分组基础操作](http://stackoverflow.com/questions/15322632/python-pandas-df-groupy-agg-column-reference-in-agg) + +与聚合不同,传递给 DataFrame 子集的 apply 可回调,可以访问所有列。 + +```python +In [104]: df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(), + .....: 'size': list('SSMMMLL'), + .....: 'weight': [8, 10, 11, 1, 20, 12, 12], + .....: 'adult': [False] * 5 + [True] * 2}) + .....: + +In [105]: df +Out[105]: + animal size weight adult +0 cat S 8 False +1 dog S 10 False +2 cat M 11 False +3 fish M 1 False +4 dog M 20 False +5 cat L 12 True +6 cat L 12 True + +# 提取 size 列最重的动物列表 +In [106]: df.groupby('animal').apply(lambda subf: subf['size'][subf['weight'].idxmax()]) +Out[106]: 
+animal +cat L +dog M +fish M +dtype: object +``` + +[使用 get_group](http://stackoverflow.com/questions/14734533/how-to-access-pandas-groupby-dataframe-by-key) + +```python +In [107]: gb = df.groupby(['animal']) + +In [108]: gb.get_group('cat') +Out[108]: + animal size weight adult +0 cat S 8 False +2 cat M 11 False +5 cat L 12 True +6 cat L 12 True +``` + +[为同一分组的不同内容使用 Apply 函数](http://stackoverflow.com/questions/15262134/apply-different-functions-to-different-items-in-group-object-python-pandas) + +```python +In [109]: def GrowUp(x): + .....: avg_weight = sum(x[x['size'] == 'S'].weight * 1.5) + .....: avg_weight += sum(x[x['size'] == 'M'].weight * 1.25) + .....: avg_weight += sum(x[x['size'] == 'L'].weight) + .....: avg_weight /= len(x) + .....: return pd.Series(['L', avg_weight, True], + .....: index=['size', 'weight', 'adult']) + .....: + +In [110]: expected_df = gb.apply(GrowUp) + +In [111]: expected_df +Out[111]: + size weight adult +animal +cat L 12.4375 True +dog L 20.0000 True +fish L 1.2500 True +``` + +[Apply 函数扩展](http://stackoverflow.com/questions/14542145/reductions-down-a-column-in-pandas) + +```python +In [112]: S = pd.Series([i / 100.0 for i in range(1, 11)]) + +In [113]: def cum_ret(x, y): + .....: return x * (1 + y) + .....: + +In [114]: def red(x): + .....: return functools.reduce(cum_ret, x, 1.0) + .....: + +In [115]: S.expanding().apply(red, raw=True) +Out[115]: +0 1.010000 +1 1.030200 +2 1.061106 +3 1.103550 +4 1.158728 +5 1.228251 +6 1.314229 +7 1.419367 +8 1.547110 +9 1.701821 +dtype: float64 +``` + +[用分组里的剩余值的平均值进行替换](http://stackoverflow.com/questions/14760757/replacing-values-with-groupby-means) + +```python +In [116]: df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, -1, 1, 2]}) + +In [117]: gb = df.groupby('A') + +In [118]: def replace(g): + .....: mask = g < 0 + .....: return g.where(mask, g[~mask].mean()) + .....: + +In [119]: gb.transform(replace) +Out[119]: + B +0 1.0 +1 -1.0 +2 1.5 +3 1.5 +``` + 
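上面的例子借助 `transform` 返回与原 DataFrame 对齐的结果来按组替换数值;同样的对齐机制也常用于按组均值填充缺失值。下面是一个最小示意(数据为虚构):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1.0, np.nan, 3.0, 5.0]})

# transform('mean') 返回与原索引对齐的组均值(计算均值时跳过 NaN),
# 因此可以直接传给 fillna,按所在分组的均值填充缺失值
df['B'] = df['B'].fillna(df.groupby('A')['B'].transform('mean'))

df
#    A    B
# 0  1  1.0
# 1  1  1.0
# 2  2  3.0
# 3  2  5.0
```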
+[按聚合数据排序](http://stackoverflow.com/questions/14941366/pandas-sort-by-group-aggregate-and-column) + +```python +In [120]: df = pd.DataFrame({'code': ['foo', 'bar', 'baz'] * 2, + .....: 'data': [0.16, -0.21, 0.33, 0.45, -0.59, 0.62], + .....: 'flag': [False, True] * 3}) + .....: + +In [121]: code_groups = df.groupby('code') + +In [122]: agg_n_sort_order = code_groups[['data']].transform(sum).sort_values(by='data') + +In [123]: sorted_df = df.loc[agg_n_sort_order.index] + +In [124]: sorted_df +Out[124]: + code data flag +1 bar -0.21 True +4 bar -0.59 False +0 foo 0.16 False +3 foo 0.45 True +2 baz 0.33 False +5 baz 0.62 True +``` + +[创建多个聚合列](http://stackoverflow.com/questions/14897100/create-multiple-columns-in-pandas-aggregation-function) + +```python +In [125]: rng = pd.date_range(start="2014-10-07", periods=10, freq='2min') + +In [126]: ts = pd.Series(data=list(range(10)), index=rng) + +In [127]: def MyCust(x): + .....: if len(x) > 2: + .....: return x[1] * 1.234 + .....: return pd.NaT + .....: + +In [128]: mhc = {'Mean': np.mean, 'Max': np.max, 'Custom': MyCust} + +In [129]: ts.resample("5min").apply(mhc) +Out[129]: +Mean 2014-10-07 00:00:00 1 + 2014-10-07 00:05:00 3.5 + 2014-10-07 00:10:00 6 + 2014-10-07 00:15:00 8.5 +Max 2014-10-07 00:00:00 2 + 2014-10-07 00:05:00 4 + 2014-10-07 00:10:00 7 + 2014-10-07 00:15:00 9 +Custom 2014-10-07 00:00:00 1.234 + 2014-10-07 00:05:00 NaT + 2014-10-07 00:10:00 7.404 + 2014-10-07 00:15:00 NaT +dtype: object + +In [130]: ts +Out[130]: +2014-10-07 00:00:00 0 +2014-10-07 00:02:00 1 +2014-10-07 00:04:00 2 +2014-10-07 00:06:00 3 +2014-10-07 00:08:00 4 +2014-10-07 00:10:00 5 +2014-10-07 00:12:00 6 +2014-10-07 00:14:00 7 +2014-10-07 00:16:00 8 +2014-10-07 00:18:00 9 +Freq: 2T, dtype: int64 +``` + +[为 DataFrame 创建值计数列](http://stackoverflow.com/questions/17709270/i-want-to-create-a-column-of-value-counts-in-my-pandas-dataframe) + +```python +In [131]: df = pd.DataFrame({'Color': 'Red Red Red Blue'.split(), + .....: 'Value': [100, 150, 
50, 50]}) + .....: + +In [132]: df +Out[132]: + Color Value +0 Red 100 +1 Red 150 +2 Red 50 +3 Blue 50 + +In [133]: df['Counts'] = df.groupby(['Color']).transform(len) + +In [134]: df +Out[134]: + Color Value Counts +0 Red 100 3 +1 Red 150 3 +2 Red 50 3 +3 Blue 50 1 +``` + +[基于索引唯一某列不同分组的值](http://stackoverflow.com/q/23198053/190597) + +```python +In [135]: df = pd.DataFrame({'line_race': [10, 10, 8, 10, 10, 8], + .....: 'beyer': [99, 102, 103, 103, 88, 100]}, + .....: index=['Last Gunfighter', 'Last Gunfighter', + .....: 'Last Gunfighter', 'Paynter', 'Paynter', + .....: 'Paynter']) + .....: + +In [136]: df +Out[136]: + line_race beyer +Last Gunfighter 10 99 +Last Gunfighter 10 102 +Last Gunfighter 8 103 +Paynter 10 103 +Paynter 10 88 +Paynter 8 100 + +In [137]: df['beyer_shifted'] = df.groupby(level=0)['beyer'].shift(1) + +In [138]: df +Out[138]: + line_race beyer beyer_shifted +Last Gunfighter 10 99 NaN +Last Gunfighter 10 102 99.0 +Last Gunfighter 8 103 102.0 +Paynter 10 103 NaN +Paynter 10 88 103.0 +Paynter 8 100 88.0 +``` + +[选择每组最大值的行](http://stackoverflow.com/q/26701849/190597) + +```python +In [139]: df = pd.DataFrame({'host': ['other', 'other', 'that', 'this', 'this'], + .....: 'service': ['mail', 'web', 'mail', 'mail', 'web'], + .....: 'no': [1, 2, 1, 2, 1]}).set_index(['host', 'service']) + .....: + +In [140]: mask = df.groupby(level=0).agg('idxmax') + +In [141]: df_count = df.loc[mask['no']].reset_index() + +In [142]: df_count +Out[142]: + host service no +0 other web 2 +1 that mail 1 +2 this mail 2 +``` + +[Python itertools.groupby 式分组](http://stackoverflow.com/q/29142487/846892) + +```python +In [143]: df = pd.DataFrame([0, 1, 0, 1, 1, 1, 0, 1, 1], columns=['A']) + +In [144]: df.A.groupby((df.A != df.A.shift()).cumsum()).groups +Out[144]: +{1: Int64Index([0], dtype='int64'), + 2: Int64Index([1], dtype='int64'), + 3: Int64Index([2], dtype='int64'), + 4: Int64Index([3, 4, 5], dtype='int64'), + 5: Int64Index([6], dtype='int64'), + 6: Int64Index([7, 8], 
dtype='int64')} + +In [145]: df.A.groupby((df.A != df.A.shift()).cumsum()).cumsum() +Out[145]: +0 0 +1 1 +2 0 +3 1 +4 2 +5 3 +6 0 +7 1 +8 2 +Name: A, dtype: int64 +``` + +### 扩展数据 + +[Alignment and to-date](http://stackoverflow.com/questions/15489011/python-time-series-alignment-and-to-date-functions) + +[基于计数值进行移动窗口计算](http://stackoverflow.com/questions/14300768/pandas-rolling-computation-with-window-based-on-values-instead-of-counts) + +[按时间间隔计算滚动平均](http://stackoverflow.com/questions/15771472/pandas-rolling-mean-by-time-interval) + +### 分割 + +[分割 DataFrame](http://stackoverflow.com/questions/13353233/best-way-to-split-a-dataframe-given-an-edge/15449992#15449992) + +按指定逻辑,将不同的行,分割成 DataFrame 列表。 + +```python +In [146]: df = pd.DataFrame(data={'Case': ['A', 'A', 'A', 'B', 'A', 'A', 'B', 'A', + .....: 'A'], + .....: 'Data': np.random.randn(9)}) + .....: + +In [147]: dfs = list(zip(*df.groupby((1 * (df['Case'] == 'B')).cumsum() + .....: .rolling(window=3, min_periods=1).median())))[-1] + .....: + +In [148]: dfs[0] +Out[148]: + Case Data +0 A 0.276232 +1 A -1.087401 +2 A -0.673690 +3 B 0.113648 + +In [149]: dfs[1] +Out[149]: + Case Data +4 A -1.478427 +5 A 0.524988 +6 B 0.404705 + +In [150]: dfs[2] +Out[150]: + Case Data +7 A 0.577046 +8 A -1.715002 +``` + + +### 透视表 + +[透视表](https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html#reshaping-pivot) 文档。 + +[部分汇总与小计](http://stackoverflow.com/questions/15570099/pandas-pivot-tables-row-subtotals/15574875#15574875) + +```python +In [151]: df = pd.DataFrame(data={'Province': ['ON', 'QC', 'BC', 'AL', 'AL', 'MN', 'ON'], + .....: 'City': ['Toronto', 'Montreal', 'Vancouver', + .....: 'Calgary', 'Edmonton', 'Winnipeg', + .....: 'Windsor'], + .....: 'Sales': [13, 6, 16, 8, 4, 3, 1]}) + .....: + +In [152]: table = pd.pivot_table(df, values=['Sales'], index=['Province'], + .....: columns=['City'], aggfunc=np.sum, margins=True) + .....: + +In [153]: table.stack('City') +Out[153]: + Sales +Province City +AL All 12.0 + 
Calgary 8.0 + Edmonton 4.0 +BC All 16.0 + Vancouver 16.0 +... ... +All Montreal 6.0 + Toronto 13.0 + Vancouver 16.0 + Windsor 1.0 + Winnipeg 3.0 + +[20 rows x 1 columns] +``` + +[类似 R 的 plyr 频率表](http://stackoverflow.com/questions/15589354/frequency-tables-in-pandas-like-plyr-in-r) + +```python +In [154]: grades = [48, 99, 75, 80, 42, 80, 72, 68, 36, 78] + +In [155]: df = pd.DataFrame({'ID': ["x%d" % r for r in range(10)], + .....: 'Gender': ['F', 'M', 'F', 'M', 'F', + .....: 'M', 'F', 'M', 'M', 'M'], + .....: 'ExamYear': ['2007', '2007', '2007', '2008', '2008', + .....: '2008', '2008', '2009', '2009', '2009'], + .....: 'Class': ['algebra', 'stats', 'bio', 'algebra', + .....: 'algebra', 'stats', 'stats', 'algebra', + .....: 'bio', 'bio'], + .....: 'Participated': ['yes', 'yes', 'yes', 'yes', 'no', + .....: 'yes', 'yes', 'yes', 'yes', 'yes'], + .....: 'Passed': ['yes' if x > 50 else 'no' for x in grades], + .....: 'Employed': [True, True, True, False, + .....: False, False, False, True, True, False], + .....: 'Grade': grades}) + .....: + +In [156]: df.groupby('ExamYear').agg({'Participated': lambda x: x.value_counts()['yes'], + .....: 'Passed': lambda x: sum(x == 'yes'), + .....: 'Employed': lambda x: sum(x), + .....: 'Grade': lambda x: sum(x) / len(x)}) + .....: +Out[156]: + Participated Passed Employed Grade +ExamYear +2007 3 2 3 74.000000 +2008 3 3 0 68.500000 +2009 3 2 2 60.666667 +``` + +[按年生成 DataFrame](http://stackoverflow.com/questions/30379789/plot-pandas-data-frame-with-year-over-year-data) + +跨列表创建年月: + +```python +In [157]: df = pd.DataFrame({'value': np.random.randn(36)}, + .....: index=pd.date_range('2011-01-01', freq='M', periods=36)) + .....: + +In [158]: pd.pivot_table(df, index=df.index.month, columns=df.index.year, + .....: values='value', aggfunc='sum') + .....: +Out[158]: + 2011 2012 2013 +1 -1.039268 -0.968914 2.565646 +2 -0.370647 -1.294524 1.431256 +3 -1.157892 0.413738 1.340309 +4 -1.344312 0.276662 -1.170299 +5 0.844885 -0.472035 -0.226169 
+6 1.075770 -0.013960 0.410835
+7 -0.109050 -0.362543 0.813850
+8 1.643563 -0.006154 0.132003
+9 -1.469388 -0.923061 -0.827317
+10 0.357021 0.895717 -0.076467
+11 -0.674600 0.805244 -1.187678
+12 -1.776904 -1.206412 1.130127
+```
+
+### Apply 函数
+
+[把嵌入列表转换为多层索引 DataFrame](http://stackoverflow.com/questions/17349981/converting-pandas-dataframe-with-categorical-values-into-binary-values)
+
+```python
+In [159]: df = pd.DataFrame(data={'A': [[2, 4, 8, 16], [100, 200], [10, 20, 30]],
+   .....:                         'B': [['a', 'b', 'c'], ['jj', 'kk'], ['ccc']]},
+   .....:                   index=['I', 'II', 'III'])
+   .....:
+
+In [160]: def SeriesFromSubList(aList):
+   .....:     return pd.Series(aList)
+   .....:
+
+In [161]: df_orgz = pd.concat({ind: row.apply(SeriesFromSubList)
+   .....:                      for ind, row in df.iterrows()})
+   .....:
+
+In [162]: df_orgz
+Out[162]:
+         0    1    2     3
+I   A    2    4    8  16.0
+    B    a    b    c   NaN
+II  A  100  200  NaN   NaN
+    B   jj   kk  NaN   NaN
+III A   10   20   30   NaN
+    B  ccc  NaN  NaN   NaN
+```
+
+[返回 Series](http://stackoverflow.com/questions/19121854/using-rolling-apply-on-a-dataframe-object)
+
+Rolling Apply to multiple columns where function calculates a Series before a Scalar from the Series is returned
+对多列执行滚动 Apply,函数先计算出一个 Series,再从该 Series 返回标量值
+
+```python
+In [163]: df = pd.DataFrame(data=np.random.randn(2000, 2) / 10000,
+   .....:                   index=pd.date_range('2001-01-01', periods=2000),
+   .....:                   columns=['A', 'B'])
+   .....:
+
+In [164]: df
+Out[164]:
+                   A         B
+2001-01-01 -0.000144 -0.000141
+2001-01-02  0.000161  0.000102
+2001-01-03  0.000057  0.000088
+2001-01-04 -0.000221  0.000097
+2001-01-05 -0.000201 -0.000041
+...              ...       ...
+2006-06-19 0.000040 -0.000235 +2006-06-20 -0.000123 -0.000021 +2006-06-21 -0.000113 0.000114 +2006-06-22 0.000136 0.000109 +2006-06-23 0.000027 0.000030 + +[2000 rows x 2 columns] + +In [165]: def gm(df, const): + .....: v = ((((df.A + df.B) + 1).cumprod()) - 1) * const + .....: return v.iloc[-1] + .....: + +In [166]: s = pd.Series({df.index[i]: gm(df.iloc[i:min(i + 51, len(df) - 1)], 5) + .....: for i in range(len(df) - 50)}) + .....: + +In [167]: s +Out[167]: +2001-01-01 0.000930 +2001-01-02 0.002615 +2001-01-03 0.001281 +2001-01-04 0.001117 +2001-01-05 0.002772 + ... +2006-04-30 0.003296 +2006-05-01 0.002629 +2006-05-02 0.002081 +2006-05-03 0.004247 +2006-05-04 0.003928 +Length: 1950, dtype: float64 +``` + +[返回标量值](http://stackoverflow.com/questions/21040766/python-pandas-rolling-apply-two-column-input-into-function/21045831#21045831) + +Rolling Apply to multiple columns where function returns a Scalar (Volume Weighted Average Price) +对多列执行滚动 Apply,函数返回标量值(成交价加权平均价) + +```python +In [168]: rng = pd.date_range(start='2014-01-01', periods=100) + +In [169]: df = pd.DataFrame({'Open': np.random.randn(len(rng)), + .....: 'Close': np.random.randn(len(rng)), + .....: 'Volume': np.random.randint(100, 2000, len(rng))}, + .....: index=rng) + .....: + +In [170]: df +Out[170]: + Open Close Volume +2014-01-01 -1.611353 -0.492885 1219 +2014-01-02 -3.000951 0.445794 1054 +2014-01-03 -0.138359 -0.076081 1381 +2014-01-04 0.301568 1.198259 1253 +2014-01-05 0.276381 -0.669831 1728 +... ... ... ... 
+2014-04-06 -0.040338 0.937843 1188 +2014-04-07 0.359661 -0.285908 1864 +2014-04-08 0.060978 1.714814 941 +2014-04-09 1.759055 -0.455942 1065 +2014-04-10 0.138185 -1.147008 1453 + +[100 rows x 3 columns] + +In [171]: def vwap(bars): + .....: return ((bars.Close * bars.Volume).sum() / bars.Volume.sum()) + .....: + +In [172]: window = 5 + +In [173]: s = pd.concat([(pd.Series(vwap(df.iloc[i:i + window]), + .....: index=[df.index[i + window]])) + .....: for i in range(len(df) - window)]) + .....: + +In [174]: s.round(2) +Out[174]: +2014-01-06 0.02 +2014-01-07 0.11 +2014-01-08 0.10 +2014-01-09 0.07 +2014-01-10 -0.29 + ... +2014-04-06 -0.63 +2014-04-07 -0.02 +2014-04-08 -0.03 +2014-04-09 0.34 +2014-04-10 0.29 +Length: 95, dtype: float64 +``` + +## 时间序列 + +[删除指定时间之外的数据](http://stackoverflow.com/questions/14539992/pandas-drop-rows-outside-of-time-range) + +[用 indexer 提取在时间范围内的数据](http://stackoverflow.com/questions/17559885/pandas-dataframe-mask-based-on-index) + +[创建不包括周末,且只包含指定时间的日期时间范围](http://stackoverflow.com/questions/24010830/pandas-generate-sequential-timestamp-with-jump/24014440#24014440?) 
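针对上面"创建不包括周末,且只包含指定时间的日期时间范围"的链接,一种可行思路是先用 `bdate_range` 生成工作日,再组合小时偏移。下面是一个示意(日期与时段均为假设):

```python
import pandas as pd

# bdate_range 默认按工作日(周一至周五)生成;2020-01-06 是周一
days = pd.bdate_range('2020-01-06', '2020-01-10')

# 为每个工作日生成 9:00、10:00、11:00 三个整点时间戳
rng = pd.DatetimeIndex([d + pd.Timedelta(hours=h)
                        for d in days
                        for h in range(9, 12)])

len(rng)  # 5 个工作日 × 3 个时段 = 15
```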
+ +[矢量查询](http://stackoverflow.com/questions/13893227/vectorized-look-up-of-values-in-pandas-dataframe) + +[聚合与绘制时间序列](http://nipunbatra.github.io/2015/06/timeseries/) + +把以小时为列,天为行的矩阵转换为连续的时间序列。 [如何重排 DataFrame?](http://stackoverflow.com/questions/15432659/how-to-rearrange-a-python-pandas-dataframe) + +[重建索引为指定频率时,如何处理重复值](http://stackoverflow.com/questions/22244383/pandas-df-refill-adding-two-columns-of-different-shape) + +为 DatetimeIndex 里每条记录计算当月第一天 + +```python +In [175]: dates = pd.date_range('2000-01-01', periods=5) + +In [176]: dates.to_period(freq='M').to_timestamp() +Out[176]: +DatetimeIndex(['2000-01-01', '2000-01-01', '2000-01-01', '2000-01-01', + '2000-01-01'], + dtype='datetime64[ns]', freq=None) +``` + + + +### 重采样 + +[重采样](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-resampling) 文档。 + +[用 Grouper 代替 TimeGrouper 处理时间分组的值 ](https://stackoverflow.com/questions/15297053/how-can-i-divide-single-values-of-a-dataframe-by-monthly-averages) + +[含缺失值的时间分组](https://stackoverflow.com/questions/33637312/pandas-grouper-by-frequency-with-completeness-requirement) + +[Grouper 的有效时间频率参数](http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases) + +[用多层索引分组](https://stackoverflow.com/questions/41483763/pandas-timegrouper-on-multiindex) + +[用 TimeGrouper 与另一个分组创建子分组,再 Apply 自定义函数](https://github.com/pandas-dev/pandas/issues/3791) + +[按自定义时间段重采样](http://stackoverflow.com/questions/15408156/resampling-with-custom-periods) + +[不添加新日期,重采样某日数据](http://stackoverflow.com/questions/14898574/resample-intrday-pandas-dataframe-without-add-new-days) + +[按分钟重采样数据](http://stackoverflow.com/questions/14861023/resampling-minute-data) + +[分组重采样](http://stackoverflow.com/q/18677271/564538) + + +## 合并 + +[连接](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#merging-concatenation) docs. 
The [Join](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#merging-join)文档。 + +[模拟 R 的 rbind:追加两个重叠索引的 DataFrame](http://stackoverflow.com/questions/14988480/pandas-version-of-rbind) + +```python +In [177]: rng = pd.date_range('2000-01-01', periods=6) + +In [178]: df1 = pd.DataFrame(np.random.randn(6, 3), index=rng, columns=['A', 'B', 'C']) + +In [179]: df2 = df1.copy() +``` + +基于 df 构建器,需要`ignore_index`。 + +```python +In [180]: df = df1.append(df2, ignore_index=True) + +In [181]: df +Out[181]: + A B C +0 -0.870117 -0.479265 -0.790855 +1 0.144817 1.726395 -0.464535 +2 -0.821906 1.597605 0.187307 +3 -0.128342 -1.511638 -0.289858 +4 0.399194 -1.430030 -0.639760 +5 1.115116 -2.012600 1.810662 +6 -0.870117 -0.479265 -0.790855 +7 0.144817 1.726395 -0.464535 +8 -0.821906 1.597605 0.187307 +9 -0.128342 -1.511638 -0.289858 +10 0.399194 -1.430030 -0.639760 +11 1.115116 -2.012600 1.810662 +``` + +[自连接 DataFrame](https://github.com/pandas-dev/pandas/issues/2996) + +```python +In [182]: df = pd.DataFrame(data={'Area': ['A'] * 5 + ['C'] * 2, + .....: 'Bins': [110] * 2 + [160] * 3 + [40] * 2, + .....: 'Test_0': [0, 1, 0, 1, 2, 0, 1], + .....: 'Data': np.random.randn(7)}) + .....: + +In [183]: df +Out[183]: + Area Bins Test_0 Data +0 A 110 0 -0.433937 +1 A 110 1 -0.160552 +2 A 160 0 0.744434 +3 A 160 1 1.754213 +4 A 160 2 0.000850 +5 C 40 0 0.342243 +6 C 40 1 1.070599 + +In [184]: df['Test_1'] = df['Test_0'] - 1 + +In [185]: pd.merge(df, df, left_on=['Bins', 'Area', 'Test_0'], + .....: right_on=['Bins', 'Area', 'Test_1'], + .....: suffixes=('_L', '_R')) + .....: +Out[185]: + Area Bins Test_0_L Data_L Test_1_L Test_0_R Data_R Test_1_R +0 A 110 0 -0.433937 -1 1 -0.160552 0 +1 A 160 0 0.744434 -1 1 1.754213 0 +2 A 160 1 1.754213 0 2 0.000850 1 +3 C 40 0 0.342243 -1 1 1.070599 0 +``` + +[如何设置索引与连接](http://stackoverflow.com/questions/14341805/pandas-merge-pd-merge-how-to-set-the-index-and-join) + +[KDB 式的 asof 
连接](http://stackoverflow.com/questions/12322289/kdb-like-asof-join-for-timeseries-data-in-pandas/12336039#12336039) + +[基于符合条件的值进行连接](http://stackoverflow.com/questions/15581829/how-to-perform-an-inner-or-outer-join-of-dataframes-with-pandas-on-non-simplisti) + +[基于范围里的值,用 searchsorted 合并](http://stackoverflow.com/questions/25125626/pandas-merge-with-logic/2512764) + + + +## 可视化 +[可视化](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html#visualization) 文档。 + +[让 Matplotlib 看上去像 R](http://stackoverflow.com/questions/14349055/making-matplotlib-graphs-look-like-r-by-default) + +[设置 x 轴的主次标签](http://stackoverflow.com/questions/12945971/pandas-timeseries-plot-setting-x-axis-major-and-minor-ticks-and-labels) + +[在 iPython Notebook 里创建多个可视图](http://stackoverflow.com/questions/16392921/make-more-than-one-chart-in-same-ipython-notebook-cell) + +[创建多行可视图](http://stackoverflow.com/questions/16568964/make-a-multiline-plot-from-csv-file-in-matplotlib) + +[绘制热力图](http://stackoverflow.com/questions/17050202/plot-timeseries-of-histograms-in-python) + +[标记时间序列图](http://stackoverflow.com/questions/11067368/annotate-time-series-plot-in-matplotlib) + +[标记时间序列图 #2](http://stackoverflow.com/questions/17891493/annotating-points-from-a-pandas-dataframe-in-matplotlib-plot) + +[用 Pandas、Vincent、xlsxwriter 生成 Excel 文件里的嵌入可视图](https://pandas-xlsxwriter-charts.readthedocs.io/) + +[为分层变量的每个四分位数绘制箱型图](http://stackoverflow.com/questions/23232989/boxplot-stratified-by-column-in-python-pandas) + +```python +In [186]: df = pd.DataFrame( + .....: {'stratifying_var': np.random.uniform(0, 100, 20), + .....: 'price': np.random.normal(100, 5, 20)}) + .....: + +In [187]: df['quartiles'] = pd.qcut( + .....: df['stratifying_var'], + .....: 4, + .....: labels=['0-25%', '25-50%', '50-75%', '75-100%']) + .....: + +In [188]: df.boxplot(column='price', by='quartiles') +Out[188]: +``` + 
+![../_images/quartile_boxplot.png](https://pandas.pydata.org/pandas-docs/stable/_images/quartile_boxplot.png) + +## 数据输入输出 + +[SQL 与 HDF5 性能对比](http://stackoverflow.com/questions/16628329/hdf5-and-sqlite-concurrency-compression-i-o-performance) + + +### CSV + +[CSV](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-read-csv-table)文档 + +[read_csv 函数实战](http://wesmckinney.com/blog/update-on-upcoming-pandas-v0-10-new-file-parser-other-performance-wins/) + +[把 DataFrame 追加到 CSV 文件](http://stackoverflow.com/questions/17134942/pandas-dataframe-output-end-of-csv) + +[分块读取 CSV](http://stackoverflow.com/questions/11622652/large-persistent-dataframe-in-pandas/12193309#12193309) + +[分块读取指定的行](http://stackoverflow.com/questions/19674212/pandas-data-frame-select-rows-and-clear-memory) + +[只读取 DataFrame 的前几列](http://stackoverflow.com/questions/15008970/way-to-read-first-few-lines-for-pandas-dataframe) + +读取不是 `gzip 或 bz2` 压缩(read_csv 可识别的内置压缩格式)的文件。本例在介绍如何读取 `WinZip` 压缩文件的同时,还介绍了在环境管理器里打开文件,并读取内容的通用操作方式。[详见本链接](http://stackoverflow.com/questions/17789907/pandas-convert-winzipped-csv-file-to-data-frame) + +[推断文件数据类型](http://stackoverflow.com/questions/15555005/get-inferred-dataframe-types-iteratively-using-chunksize) + +[处理出错数据](http://github.com/pandas-dev/pandas/issues/2886) + +[处理出错数据 II](http://nipunbatra.github.io/2013/06/reading-unclean-data-csv-using-pandas/) + +[用 Unix 时间戳读取 CSV,并转为本地时区](http://nipunbatra.github.io/2013/06/pandas-reading-csv-with-unix-timestamps-and-converting-to-local-timezone/) + +[写入多行索引 CSV 时,不写入重复值](http://stackoverflow.com/questions/17349574/pandas-write-multiindex-rows-with-to-csv) + + +#### 从多个文件读取数据,创建单个 DataFrame + +最好的方式是先一个个读取单个文件,然后再把每个文件的内容存成列表,再用 `pd.concat()` 组合成一个 DataFrame: + +```python +In [189]: for i in range(3): + .....: data = pd.DataFrame(np.random.randn(10, 4)) + .....: data.to_csv('file_{}.csv'.format(i)) + .....: + +In [190]: files = ['file_0.csv', 'file_1.csv', 'file_2.csv'] + +In [191]: result = 
pd.concat([pd.read_csv(f) for f in files], ignore_index=True) +``` + +还可以用同样的方法读取所有匹配同一模式的文件,下面这个例子使用的是`glob`: + +```python +In [192]: import glob + +In [193]: import os + +In [194]: files = glob.glob('file_*.csv') + +In [195]: result = pd.concat([pd.read_csv(f) for f in files], ignore_index=True) +``` + +最后,这种方式也适用于 [io 文档](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io) 介绍的其它 `pd.read_*` 函数。 + +#### 解析多列里的日期组件 + +用一种格式解析多列的日期组件,速度更快。 + +```python +In [196]: i = pd.date_range('20000101', periods=10000) + +In [197]: df = pd.DataFrame({'year': i.year, 'month': i.month, 'day': i.day}) + +In [198]: df.head() +Out[198]: + year month day +0 2000 1 1 +1 2000 1 2 +2 2000 1 3 +3 2000 1 4 +4 2000 1 5 + +In [199]: %timeit pd.to_datetime(df.year * 10000 + df.month * 100 + df.day, format='%Y%m%d') + .....: ds = df.apply(lambda x: "%04d%02d%02d" % (x['year'], + .....: x['month'], x['day']), axis=1) + .....: ds.head() + .....: %timeit pd.to_datetime(ds) + .....: +10.6 ms +- 698 us per loop (mean +- std. dev. of 7 runs, 100 loops each) +3.21 ms +- 36.4 us per loop (mean +- std. dev. 
of 7 runs, 100 loops each) +``` + +#### 跳过标题与数据之间的行 + +```python +In [200]: data = """;;;; + .....: ;;;; + .....: ;;;; + .....: ;;;; + .....: ;;;; + .....: ;;;; + .....: ;;;; + .....: ;;;; + .....: ;;;; + .....: ;;;; + .....: date;Param1;Param2;Param4;Param5 + .....: ;m²;°C;m²;m + .....: ;;;; + .....: 01.01.1990 00:00;1;1;2;3 + .....: 01.01.1990 01:00;5;3;4;5 + .....: 01.01.1990 02:00;9;5;6;7 + .....: 01.01.1990 03:00;13;7;8;9 + .....: 01.01.1990 04:00;17;9;10;11 + .....: 01.01.1990 05:00;21;11;12;13 + .....: """ + .....: +``` + +##### 选项 1:显式跳过行 + +```python +In [201]: from io import StringIO + +In [202]: pd.read_csv(StringIO(data), sep=';', skiprows=[11, 12], + .....: index_col=0, parse_dates=True, header=10) + .....: +Out[202]: + Param1 Param2 Param4 Param5 +date +1990-01-01 00:00:00 1 1 2 3 +1990-01-01 01:00:00 5 3 4 5 +1990-01-01 02:00:00 9 5 6 7 +1990-01-01 03:00:00 13 7 8 9 +1990-01-01 04:00:00 17 9 10 11 +1990-01-01 05:00:00 21 11 12 13 +``` + +##### 选项 2:读取列名,然后再读取数据 + +```python +In [203]: pd.read_csv(StringIO(data), sep=';', header=10, nrows=10).columns +Out[203]: Index(['date', 'Param1', 'Param2', 'Param4', 'Param5'], dtype='object') + +In [204]: columns = pd.read_csv(StringIO(data), sep=';', header=10, nrows=10).columns + +In [205]: pd.read_csv(StringIO(data), sep=';', index_col=0, + .....: header=12, parse_dates=True, names=columns) + .....: +Out[205]: + Param1 Param2 Param4 Param5 +date +1990-01-01 00:00:00 1 1 2 3 +1990-01-01 01:00:00 5 3 4 5 +1990-01-01 02:00:00 9 5 6 7 +1990-01-01 03:00:00 13 7 8 9 +1990-01-01 04:00:00 17 9 10 11 +1990-01-01 05:00:00 21 11 12 13 +``` + +### SQL + +[SQL](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql) 文档 + +[用 SQL 读取数据库](http://stackoverflow.com/questions/10065051/python-pandas-and-databases-like-mysql) + +### Excel + +[Excel](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-excel) 文档 + 
+[读取文件式句柄](http://stackoverflow.com/questions/15588713/sheets-of-excel-workbook-from-a-url-into-a-pandas-dataframe)
+
+[用 XlsxWriter 修改输出格式](http://pbpython.com/improve-pandas-excel-output.html)
+
+### HTML
+
+[从不能处理默认请求 header 的服务器读取 HTML 表格](http://stackoverflow.com/a/18939272/564538)
+
+### HDFStore
+
+[HDFStores](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-hdf5) 文档
+
+[时间戳索引简单查询](http://stackoverflow.com/questions/13926089/selecting-columns-from-pandas-hdfstore-table)
+
+[用链式多表架构管理异构数据](http://github.com/pandas-dev/pandas/issues/3032)
+
+[在硬盘上合并数百万行的表格](http://stackoverflow.com/questions/14614512/merging-two-tables-with-millions-of-rows-in-python/14617925#14617925)
+
+[避免多进程/线程存储数据出现不一致](http://stackoverflow.com/a/29014295/2858145)
+
+按块对大规模数据存储去重,本质上是一种递归归约操作。[这里](http://stackoverflow.com/questions/16110252/need-to-compare-very-large-files-around-1-5gb-in-python/16110391#16110391)介绍了一个函数:从 CSV 文件里按块提取数据并解析日期,再按块写入存储。
+
+[按块读取 CSV 文件,并保存](http://stackoverflow.com/questions/20428355/appending-column-to-frame-of-hdf-file-in-pandas/20428786#20428786)
+
+[追加到已存储的文件,且确保索引唯一](http://stackoverflow.com/questions/16997048/how-does-one-append-large-amounts-of-data-to-a-pandas-hdfstore-and-get-a-natural/16999397#16999397)
+
+[大规模数据工作流](http://stackoverflow.com/questions/14262433/large-data-work-flows-using-pandas)
+
+[读取一系列文件,追加时采用全局唯一索引](http://stackoverflow.com/questions/16997048/how-does-one-append-large-amounts-of-data-to-a-pandas-hdfstore-and-get-a-natural)
+
+[用低分组密度分组 HDFStore 文件](http://stackoverflow.com/questions/15798209/pandas-group-by-query-on-large-data-in-hdfstore)
+
+[用高分组密度分组 HDFStore 文件](http://stackoverflow.com/questions/25459982/trouble-with-grouby-on-millions-of-keys-on-a-chunked-file-in-python-pandas/25471765#25471765)
+
+[HDFStore 文件结构化查询](http://stackoverflow.com/questions/22777284/improve-query-performance-from-a-large-hdfstore-table-with-pandas/22820780#22820780)
+
+[HDFStore 计数](http://stackoverflow.com/questions/20497897/converting-dict-of-dicts-into-pandas-dataframe-memory-issues)
+
+[HDFStore 异常解答](http://stackoverflow.com/questions/15488809/how-to-trouble-shoot-hdfstore-exception-cannot-find-the-correct-atom-type)
+
+[用字符串设置 min_itemsize](http://stackoverflow.com/questions/15988871/hdfstore-appendstring-dataframe-fails-when-string-column-contents-are-longer)
+
+[用 ptrepack 创建完全排序索引](http://stackoverflow.com/questions/17893370/ptrepack-sortby-needs-full-index)
+
+把属性存至分组节点:
+
+```python
+In [206]: df = pd.DataFrame(np.random.randn(8, 3))
+
+In [207]: store = pd.HDFStore('test.h5')
+
+In [208]: store.put('df', df)
+
+# 用 pickle 存储任意 Python 对象
+In [209]: store.get_storer('df').attrs.my_attribute = {'A': 10}
+
+In [210]: store.get_storer('df').attrs.my_attribute
+Out[210]: {'A': 10}
+```
+
+### 二进制文件
+
+Pandas 可借助 NumPy 记录数组(record array)读取由 C 结构体数组组成的二进制文件。比如说,名为 `main.c` 的文件包含下列 C 代码,并在 64 位机器上用 `gcc main.c -std=gnu99` 编译。
+
+```c
+#include <stdio.h>
+#include <stdint.h>
+
+typedef struct _Data
+{
+    int32_t count;
+    double avg;
+    float scale;
+} Data;
+
+int main(int argc, const char *argv[])
+{
+    size_t n = 10;
+    Data d[n];
+
+    for (int i = 0; i < n; ++i)
+    {
+        d[i].count = i;
+        d[i].avg = i + 1.0;
+        d[i].scale = (float) i + 2.0f;
+    }
+
+    FILE *file = fopen("binary.dat", "wb");
+    fwrite(&d, sizeof(Data), n, file);
+    fclose(file);
+
+    return 0;
+}
+```
+
+下列 Python 代码读取二进制文件 `binary.dat`,并将之存为 pandas `DataFrame`,结构体的每个成员对应 DataFrame 里的一列:
+
+```python
+names = 'count', 'avg', 'scale'
+
+# 注意:因为结构体填充,位移量比类型尺寸大
+offsets = 0, 8, 16
+formats = 'i4', 'f8', 'f4'
+dt = np.dtype({'names': names, 'offsets': offsets, 'formats': formats},
+              align=True)
+df = pd.DataFrame(np.fromfile('binary.dat', dt))
+```
+
+::: tip 注意
+
+不同机器上创建的文件因其架构不同,结构体元素的位移量也可能不同,原生二进制格式文件不能跨平台使用,因此不建议作为通用数据存储格式。建议用 Pandas IO 功能支持的 HDF5 或 msgpack 文件。
+
+:::
+
+## 计算
+
+[基于采样的时间序列数值整合](http://nbviewer.ipython.org/5720498)
+
+### 相关性
+
+用 
[`DataFrame.corr()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.corr.html#pandas.DataFrame.corr) 计算得出的相关矩阵的下(或上)三角形式一般都非常有用。下例通过把布尔掩码传递给 `where` 可以实现这一功能: + +```python +In [211]: df = pd.DataFrame(np.random.random(size=(100, 5))) + +In [212]: corr_mat = df.corr() + +In [213]: mask = np.tril(np.ones_like(corr_mat, dtype=np.bool), k=-1) + +In [214]: corr_mat.where(mask) +Out[214]: + 0 1 2 3 4 +0 NaN NaN NaN NaN NaN +1 -0.018923 NaN NaN NaN NaN +2 -0.076296 -0.012464 NaN NaN NaN +3 -0.169941 -0.289416 0.076462 NaN NaN +4 0.064326 0.018759 -0.084140 -0.079859 NaN +``` + +除了命名相关类型之外,`DataFrame.corr` 还接受回调,此处计算 DataFrame 对象的[距离相关矩阵](https://en.wikipedia.org/wiki/Distance_correlation)。 + +```python +In [215]: def distcorr(x, y): + .....: n = len(x) + .....: a = np.zeros(shape=(n, n)) + .....: b = np.zeros(shape=(n, n)) + .....: for i in range(n): + .....: for j in range(i + 1, n): + .....: a[i, j] = abs(x[i] - x[j]) + .....: b[i, j] = abs(y[i] - y[j]) + .....: a += a.T + .....: b += b.T + .....: a_bar = np.vstack([np.nanmean(a, axis=0)] * n) + .....: b_bar = np.vstack([np.nanmean(b, axis=0)] * n) + .....: A = a - a_bar - a_bar.T + np.full(shape=(n, n), fill_value=a_bar.mean()) + .....: B = b - b_bar - b_bar.T + np.full(shape=(n, n), fill_value=b_bar.mean()) + .....: cov_ab = np.sqrt(np.nansum(A * B)) / n + .....: std_a = np.sqrt(np.sqrt(np.nansum(A**2)) / n) + .....: std_b = np.sqrt(np.sqrt(np.nansum(B**2)) / n) + .....: return cov_ab / std_a / std_b + .....: + +In [216]: df = pd.DataFrame(np.random.normal(size=(100, 3))) + +In [217]: df.corr(method=distcorr) +Out[217]: + 0 1 2 +0 1.000000 0.199653 0.214871 +1 0.199653 1.000000 0.195116 +2 0.214871 0.195116 1.000000 +``` + +## 时间差 + +[时间差](https://pandas.pydata.org/pandas-docs/stable/user_guide/timedeltas.html#timedeltas-timedeltas)文档。 + +[使用时间差](http://github.com/pandas-dev/pandas/pull/2899) + +```python +In [218]: import datetime + +In [219]: s = pd.Series(pd.date_range('2012-1-1', 
periods=3, freq='D')) + +In [220]: s - s.max() +Out[220]: +0 -2 days +1 -1 days +2 0 days +dtype: timedelta64[ns] + +In [221]: s.max() - s +Out[221]: +0 2 days +1 1 days +2 0 days +dtype: timedelta64[ns] + +In [222]: s - datetime.datetime(2011, 1, 1, 3, 5) +Out[222]: +0 364 days 20:55:00 +1 365 days 20:55:00 +2 366 days 20:55:00 +dtype: timedelta64[ns] + +In [223]: s + datetime.timedelta(minutes=5) +Out[223]: +0 2012-01-01 00:05:00 +1 2012-01-02 00:05:00 +2 2012-01-03 00:05:00 +dtype: datetime64[ns] + +In [224]: datetime.datetime(2011, 1, 1, 3, 5) - s +Out[224]: +0 -365 days +03:05:00 +1 -366 days +03:05:00 +2 -367 days +03:05:00 +dtype: timedelta64[ns] + +In [225]: datetime.timedelta(minutes=5) + s +Out[225]: +0 2012-01-01 00:05:00 +1 2012-01-02 00:05:00 +2 2012-01-03 00:05:00 +dtype: datetime64[ns] +``` + +[日期加减](http://stackoverflow.com/questions/16385785/add-days-to-dates-in-dataframe) + +```python +In [226]: deltas = pd.Series([datetime.timedelta(days=i) for i in range(3)]) + +In [227]: df = pd.DataFrame({'A': s, 'B': deltas}) + +In [228]: df +Out[228]: + A B +0 2012-01-01 0 days +1 2012-01-02 1 days +2 2012-01-03 2 days + +In [229]: df['New Dates'] = df['A'] + df['B'] + +In [230]: df['Delta'] = df['A'] - df['New Dates'] + +In [231]: df +Out[231]: + A B New Dates Delta +0 2012-01-01 0 days 2012-01-01 0 days +1 2012-01-02 1 days 2012-01-03 -1 days +2 2012-01-03 2 days 2012-01-05 -2 days + +In [232]: df.dtypes +Out[232]: +A datetime64[ns] +B timedelta64[ns] +New Dates datetime64[ns] +Delta timedelta64[ns] +dtype: object +``` + +[其它示例](http://stackoverflow.com/questions/15683588/iterating-through-a-pandas-dataframe) + +与 datetime 类似,用 `np.nan` 可以把值设为 `NaT`。 + +```python +In [233]: y = s - s.shift() + +In [234]: y +Out[234]: +0 NaT +1 1 days +2 1 days +dtype: timedelta64[ns] + +In [235]: y[1] = np.nan + +In [236]: y +Out[236]: +0 NaT +1 NaT +2 1 days +dtype: timedelta64[ns] +``` + +## 轴别名 + +设置全局轴别名,可以定义以下两个函数: + +```python +In [237]: def set_axis_alias(cls, axis, 
alias): + .....: if axis not in cls._AXIS_NUMBERS: + .....: raise Exception("invalid axis [%s] for alias [%s]" % (axis, alias)) + .....: cls._AXIS_ALIASES[alias] = axis + .....: +In [238]: def clear_axis_alias(cls, axis, alias): + .....: if axis not in cls._AXIS_NUMBERS: + .....: raise Exception("invalid axis [%s] for alias [%s]" % (axis, alias)) + .....: cls._AXIS_ALIASES.pop(alias, None) + .....: +In [239]: set_axis_alias(pd.DataFrame, 'columns', 'myaxis2') + +In [240]: df2 = pd.DataFrame(np.random.randn(3, 2), columns=['c1', 'c2'], + .....: index=['i1', 'i2', 'i3']) + .....: + +In [241]: df2.sum(axis='myaxis2') +Out[241]: +i1 -0.461013 +i2 2.040016 +i3 0.904681 +dtype: float64 + +In [242]: clear_axis_alias(pd.DataFrame, 'columns', 'myaxis2') +``` + +## 创建示例数据 + +类似 R 的 `expand.grid()` 函数,用不同类型的值组生成 DataFrame,需要创建键是列名,值是数据值列表的字典: + +```python +In [243]: def expand_grid(data_dict): + .....: rows = itertools.product(*data_dict.values()) + .....: return pd.DataFrame.from_records(rows, columns=data_dict.keys()) + .....: + +In [244]: df = expand_grid({'height': [60, 70], + .....: 'weight': [100, 140, 180], + .....: 'sex': ['Male', 'Female']}) + .....: + +In [245]: df +Out[245]: + height weight sex +0 60 100 Male +1 60 100 Female +2 60 140 Male +3 60 140 Female +4 60 180 Male +5 60 180 Female +6 70 100 Male +7 70 100 Female +8 70 140 Male +9 70 140 Female +10 70 180 Male +11 70 180 Female +``` \ No newline at end of file diff --git a/Python/pandas/user_guide/enhancingperf.md b/Python/pandas/user_guide/enhancingperf.md new file mode 100644 index 00000000..099eea77 --- /dev/null +++ b/Python/pandas/user_guide/enhancingperf.md @@ -0,0 +1,984 @@ +# Enhancing performance + +In this part of the tutorial, we will investigate how to speed up certain +functions operating on pandas ``DataFrames`` using three different techniques: +Cython, Numba and [``pandas.eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval). 
We will see a speed improvement of ~200 +when we use Cython and Numba on a test function operating row-wise on the +``DataFrame``. Using [``pandas.eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval) we will speed up a sum by an order of +~2. + +## Cython (writing C extensions for pandas) + +For many use cases writing pandas in pure Python and NumPy is sufficient. In some +computationally heavy applications however, it can be possible to achieve sizable +speed-ups by offloading work to [cython](http://cython.org/). + +This tutorial assumes you have refactored as much as possible in Python, for example +by trying to remove for-loops and making use of NumPy vectorization. It’s always worth +optimising in Python first. + +This tutorial walks through a “typical” process of cythonizing a slow computation. +We use an [example from the Cython documentation](http://docs.cython.org/src/quickstart/cythonize.html) +but in the context of pandas. Our final cythonized solution is around 100 times +faster than the pure Python solution. + +### Pure Python + +We have a ``DataFrame`` to which we want to apply a function row-wise. + +``` python +In [1]: df = pd.DataFrame({'a': np.random.randn(1000), + ...: 'b': np.random.randn(1000), + ...: 'N': np.random.randint(100, 1000, (1000)), + ...: 'x': 'x'}) + ...: + +In [2]: df +Out[2]: + a b N x +0 0.469112 -0.218470 585 x +1 -0.282863 -0.061645 841 x +2 -1.509059 -0.723780 251 x +3 -1.135632 0.551225 972 x +4 1.212112 -0.497767 181 x +.. ... ... ... .. 
+995 -1.512743 0.874737 374 x +996 0.933753 1.120790 246 x +997 -0.308013 0.198768 157 x +998 -0.079915 1.757555 977 x +999 -1.010589 -1.115680 770 x + +[1000 rows x 4 columns] +``` + +Here’s the function in pure Python: + +``` python +In [3]: def f(x): + ...: return x * (x - 1) + ...: + +In [4]: def integrate_f(a, b, N): + ...: s = 0 + ...: dx = (b - a) / N + ...: for i in range(N): + ...: s += f(a + i * dx) + ...: return s * dx + ...: +``` + +We achieve our result by using ``apply`` (row-wise): + +``` python +In [7]: %timeit df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1) +10 loops, best of 3: 174 ms per loop +``` + +But clearly this isn’t fast enough for us. Let’s take a look and see where the +time is spent during this operation (limited to the most time consuming +four calls) using the [prun ipython magic function](http://ipython.org/ipython-doc/stable/api/generated/IPython.core.magics.execution.html#IPython.core.magics.execution.ExecutionMagics.prun): + +``` python +In [5]: %prun -l 4 df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1) # noqa E999 + 672332 function calls (667306 primitive calls) in 0.285 seconds + + Ordered by: internal time + List reduced from 221 to 4 due to restriction <4> + + ncalls tottime percall cumtime percall filename:lineno(function) + 1000 0.144 0.000 0.217 0.000 :1(integrate_f) + 552423 0.074 0.000 0.074 0.000 :1(f) + 3000 0.008 0.000 0.045 0.000 base.py:4695(get_value) + 6001 0.005 0.000 0.012 0.000 {pandas._libs.lib.values_from_object} +``` + +By far the majority of time is spend inside either ``integrate_f`` or ``f``, +hence we’ll concentrate our efforts cythonizing these two functions. + +::: tip Note + +In Python 2 replacing the ``range`` with its generator counterpart (``xrange``) +would mean the ``range`` line would vanish. In Python 3 ``range`` is already a generator. 
+ +::: + +### Plain Cython + +First we’re going to need to import the Cython magic function to ipython: + +``` python +In [6]: %load_ext Cython +``` + +Now, let’s simply copy our functions over to Cython as is (the suffix +is here to distinguish between function versions): + +``` python +In [7]: %%cython + ...: def f_plain(x): + ...: return x * (x - 1) + ...: def integrate_f_plain(a, b, N): + ...: s = 0 + ...: dx = (b - a) / N + ...: for i in range(N): + ...: s += f_plain(a + i * dx) + ...: return s * dx + ...: +``` + +::: tip Note + +If you’re having trouble pasting the above into your ipython, you may need +to be using bleeding edge ipython for paste to play well with cell magics. + +::: + +``` python +In [4]: %timeit df.apply(lambda x: integrate_f_plain(x['a'], x['b'], x['N']), axis=1) +10 loops, best of 3: 85.5 ms per loop +``` + +Already this has shaved a third off, not too bad for a simple copy and paste. + +### Adding type + +We get another huge improvement simply by providing type information: + +``` python +In [8]: %%cython + ...: cdef double f_typed(double x) except? -2: + ...: return x * (x - 1) + ...: cpdef double integrate_f_typed(double a, double b, int N): + ...: cdef int i + ...: cdef double s, dx + ...: s = 0 + ...: dx = (b - a) / N + ...: for i in range(N): + ...: s += f_typed(a + i * dx) + ...: return s * dx + ...: +``` + +``` python +In [4]: %timeit df.apply(lambda x: integrate_f_typed(x['a'], x['b'], x['N']), axis=1) +10 loops, best of 3: 20.3 ms per loop +``` + +Now, we’re talking! It’s now over ten times faster than the original python +implementation, and we haven’t *really* modified the code. 
Let’s have another +look at what’s eating up time: + +``` python +In [9]: %prun -l 4 df.apply(lambda x: integrate_f_typed(x['a'], x['b'], x['N']), axis=1) + 119905 function calls (114879 primitive calls) in 0.096 seconds + + Ordered by: internal time + List reduced from 216 to 4 due to restriction <4> + + ncalls tottime percall cumtime percall filename:lineno(function) + 3000 0.012 0.000 0.064 0.000 base.py:4695(get_value) + 6001 0.007 0.000 0.017 0.000 {pandas._libs.lib.values_from_object} + 3000 0.007 0.000 0.073 0.000 series.py:1061(__getitem__) + 3000 0.006 0.000 0.006 0.000 {method 'get_value' of 'pandas._libs.index.IndexEngine' objects} +``` + +### Using ndarray + +It’s calling series… a lot! It’s creating a Series from each row, and get-ting from both +the index and the series (three times for each row). Function calls are expensive +in Python, so maybe we could minimize these by cythonizing the apply part. + +::: tip Note + +We are now passing ndarrays into the Cython function, fortunately Cython plays +very nicely with NumPy. + +::: + +``` python +In [10]: %%cython + ....: cimport numpy as np + ....: import numpy as np + ....: cdef double f_typed(double x) except? 
-2: + ....: return x * (x - 1) + ....: cpdef double integrate_f_typed(double a, double b, int N): + ....: cdef int i + ....: cdef double s, dx + ....: s = 0 + ....: dx = (b - a) / N + ....: for i in range(N): + ....: s += f_typed(a + i * dx) + ....: return s * dx + ....: cpdef np.ndarray[double] apply_integrate_f(np.ndarray col_a, np.ndarray col_b, + ....: np.ndarray col_N): + ....: assert (col_a.dtype == np.float + ....: and col_b.dtype == np.float and col_N.dtype == np.int) + ....: cdef Py_ssize_t i, n = len(col_N) + ....: assert (len(col_a) == len(col_b) == n) + ....: cdef np.ndarray[double] res = np.empty(n) + ....: for i in range(len(col_a)): + ....: res[i] = integrate_f_typed(col_a[i], col_b[i], col_N[i]) + ....: return res + ....: +``` + +The implementation is simple, it creates an array of zeros and loops over +the rows, applying our ``integrate_f_typed``, and putting this in the zeros array. + +::: danger Warning + +You can **not pass** a ``Series`` directly as a ``ndarray`` typed parameter +to a Cython function. Instead pass the actual ``ndarray`` using the +[``Series.to_numpy()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.to_numpy.html#pandas.Series.to_numpy). The reason is that the Cython +definition is specific to an ndarray and not the passed ``Series``. + +So, do not do this: + +``` python +apply_integrate_f(df['a'], df['b'], df['N']) +``` + +But rather, use [``Series.to_numpy()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.to_numpy.html#pandas.Series.to_numpy) to get the underlying ``ndarray``: + +``` python +apply_integrate_f(df['a'].to_numpy(), + df['b'].to_numpy(), + df['N'].to_numpy()) +``` + +::: + +::: tip Note + +Loops like this would be *extremely* slow in Python, but in Cython looping +over NumPy arrays is *fast*. 
+
+:::
+
+``` python
+In [4]: %timeit apply_integrate_f(df['a'].to_numpy(),
+                                  df['b'].to_numpy(),
+                                  df['N'].to_numpy())
+1000 loops, best of 3: 1.25 ms per loop
+```
+
+We’ve gotten another big improvement. Let’s check again where the time is spent
+(``%prun`` is a line magic, so the whole call must stay on one line):
+
+``` python
+In [11]: %prun -l 4 apply_integrate_f(df['a'].to_numpy(), df['b'].to_numpy(), df['N'].to_numpy())
+```
+
+As one might expect, the majority of the time is now spent in ``apply_integrate_f``,
+so if we wanted to make any more efficiencies we must continue to concentrate our
+efforts here.
+
+### More advanced techniques
+
+There is still hope for improvement. Here’s an example of using some more
+advanced Cython techniques:
+
+``` python
+In [12]: %%cython
+   ....: cimport cython
+   ....: cimport numpy as np
+   ....: import numpy as np
+   ....: cdef double f_typed(double x) except? -2:
+   ....:     return x * (x - 1)
+   ....: cpdef double integrate_f_typed(double a, double b, int N):
+   ....:     cdef int i
+   ....:     cdef double s, dx
+   ....:     s = 0
+   ....:     dx = (b - a) / N
+   ....:     for i in range(N):
+   ....:         s += f_typed(a + i * dx)
+   ....:     return s * dx
+   ....: @cython.boundscheck(False)
+   ....: @cython.wraparound(False)
+   ....: cpdef np.ndarray[double] apply_integrate_f_wrap(np.ndarray[double] col_a,
+   ....:                                                 np.ndarray[double] col_b,
+   ....:                                                 np.ndarray[int] col_N):
+   ....:     cdef int i, n = len(col_N)
+   ....:     assert len(col_a) == len(col_b) == n
+   ....:     cdef np.ndarray[double] res = np.empty(n)
+   ....:     for i in range(n):
+   ....:         res[i] = integrate_f_typed(col_a[i], col_b[i], col_N[i])
+   ....:     return res
+   ....:
+```
+
+``` python
+In [4]: %timeit apply_integrate_f_wrap(df['a'].to_numpy(),
+                                       df['b'].to_numpy(),
+                                       df['N'].to_numpy())
+1000 loops, best of 3: 987 us per loop
+```
+
+Even faster, with the caveat that a bug in our Cython code (an off-by-one error,
+for example) might cause a segfault because memory access isn’t 
checked. +For more about ``boundscheck`` and ``wraparound``, see the Cython docs on +[compiler directives](http://cython.readthedocs.io/en/latest/src/reference/compilation.html?highlight=wraparound#compiler-directives). + +## Using Numba + +A recent alternative to statically compiling Cython code, is to use a *dynamic jit-compiler*, Numba. + +Numba gives you the power to speed up your applications with high performance functions written directly in Python. With a few annotations, array-oriented and math-heavy Python code can be just-in-time compiled to native machine instructions, similar in performance to C, C++ and Fortran, without having to switch languages or Python interpreters. + +Numba works by generating optimized machine code using the LLVM compiler infrastructure at import time, runtime, or statically (using the included pycc tool). Numba supports compilation of Python to run on either CPU or GPU hardware, and is designed to integrate with the Python scientific software stack. + +::: tip Note + +You will need to install Numba. This is easy with ``conda``, by using: ``conda install numba``, see [installing using miniconda](https://pandas.pydata.org/pandas-docs/stable/install.html#install-miniconda). + +::: + +::: tip Note + +As of Numba version 0.20, pandas objects cannot be passed directly to Numba-compiled functions. Instead, one must pass the NumPy array underlying the pandas object to the Numba-compiled function as demonstrated below. + +::: + +### Jit + +We demonstrate how to use Numba to just-in-time compile our code. We simply +take the plain Python code from above and annotate with the ``@jit`` decorator. 
+ +``` python +import numba + + +@numba.jit +def f_plain(x): + return x * (x - 1) + + +@numba.jit +def integrate_f_numba(a, b, N): + s = 0 + dx = (b - a) / N + for i in range(N): + s += f_plain(a + i * dx) + return s * dx + + +@numba.jit +def apply_integrate_f_numba(col_a, col_b, col_N): + n = len(col_N) + result = np.empty(n, dtype='float64') + assert len(col_a) == len(col_b) == n + for i in range(n): + result[i] = integrate_f_numba(col_a[i], col_b[i], col_N[i]) + return result + + +def compute_numba(df): + result = apply_integrate_f_numba(df['a'].to_numpy(), + df['b'].to_numpy(), + df['N'].to_numpy()) + return pd.Series(result, index=df.index, name='result') +``` + +Note that we directly pass NumPy arrays to the Numba function. ``compute_numba`` is just a wrapper that provides a +nicer interface by passing/returning pandas objects. + +``` python +In [4]: %timeit compute_numba(df) +1000 loops, best of 3: 798 us per loop +``` + +In this example, using Numba was faster than Cython. + +### Vectorize + +Numba can also be used to write vectorized functions that do not require the user to explicitly +loop over the observations of a vector; a vectorized function will be applied to each row automatically. 
+Consider the following toy example of doubling each observation:
+
+``` python
+import numba
+
+
+def double_every_value_nonumba(x):
+    return x * 2
+
+
+@numba.vectorize
+def double_every_value_withnumba(x):  # noqa E501
+    return x * 2
+```
+
+``` python
+# Custom function without numba
+In [5]: %timeit df['col1_doubled'] = df.a.apply(double_every_value_nonumba)  # noqa E501
+1000 loops, best of 3: 797 us per loop
+
+# Standard implementation (faster than a custom function)
+In [6]: %timeit df['col1_doubled'] = df.a * 2
+1000 loops, best of 3: 233 us per loop
+
+# Custom function with numba
+In [7]: %timeit df['col1_doubled'] = double_every_value_withnumba(df.a.to_numpy())
+1000 loops, best of 3: 145 us per loop
+```
+
+### Caveats
+
+::: tip Note
+
+Numba will execute on any function, but can only accelerate certain classes of functions.
+
+:::
+
+Numba is best at accelerating functions that apply numerical functions to NumPy
+arrays. When passed a function that only uses operations it knows how to
+accelerate, it will execute in ``nopython`` mode.
+
+If Numba is passed a function that includes something it doesn’t know how to
+work with – a category that currently includes sets, lists, dictionaries, or
+string functions – it will revert to ``object mode``. In ``object mode``,
+Numba will execute but your code will not speed up significantly. If you would
+prefer that Numba throw an error if it cannot compile a function in a way that
+speeds up your code, pass Numba the argument
+``nopython=True`` (e.g. ``@numba.jit(nopython=True)``). For more on
+troubleshooting Numba modes, see the [Numba troubleshooting page](http://numba.pydata.org/numba-doc/latest/user/troubleshoot.html#the-compiled-code-is-too-slow).
+
+Read more in the [Numba docs](http://numba.pydata.org/). 
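The pass-NumPy-arrays-then-rewrap pattern described above can be sketched without Numba itself. Here `integrate_f_kernel` and `compute` are hypothetical names, and the kernel is a plain-Python stand-in for the `@numba.jit`-compiled function, so the snippet runs even where Numba is not installed; decorating the kernel with `@numba.jit` would be the only change Numba requires:

```python
import numpy as np
import pandas as pd


def integrate_f_kernel(col_a, col_b, col_N):
    # Stand-in for the @numba.jit-compiled function above: it only touches
    # plain NumPy arrays and scalars, which is exactly the kind of code
    # Numba can compile in nopython mode.
    n = len(col_N)
    result = np.empty(n, dtype='float64')
    for i in range(n):
        a, b, N = col_a[i], col_b[i], col_N[i]
        dx = (b - a) / N
        s = 0.0
        for j in range(N):
            x = a + j * dx
            s += x * (x - 1)
        result[i] = s * dx
    return result


def compute(df):
    # Pass the underlying ndarrays, not the pandas objects, then re-wrap
    # the raw result with the original index on the way out.
    result = integrate_f_kernel(df['a'].to_numpy(),
                                df['b'].to_numpy(),
                                df['N'].to_numpy())
    return pd.Series(result, index=df.index, name='result')


df = pd.DataFrame({'a': [0.0, 1.0], 'b': [1.0, 2.0], 'N': [100, 100]})
res = compute(df)  # approximately [-1/6, 5/6], the analytic integrals of x*(x-1)
```

Keeping the pandas-aware wrapper separate from the array-only kernel is what lets the same code path work with or without JIT compilation.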
+ +## Expression evaluation via ``eval()`` + +The top-level function [``pandas.eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval) implements expression evaluation of +[``Series``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series) and [``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) objects. + +::: tip Note + +To benefit from using [``eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval) you need to +install ``numexpr``. See the [recommended dependencies section](https://pandas.pydata.org/pandas-docs/stable/install.html#install-recommended-dependencies) for more details. + +::: + +The point of using [``eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval) for expression evaluation rather than +plain Python is two-fold: 1) large [``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) objects are +evaluated more efficiently and 2) large arithmetic and boolean expressions are +evaluated all at once by the underlying engine (by default ``numexpr`` is used +for evaluation). + +::: tip Note + +You should not use [``eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval) for simple +expressions or for expressions involving small DataFrames. In fact, +[``eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval) is many orders of magnitude slower for +smaller expressions/objects than plain ol’ Python. A good rule of thumb is +to only use [``eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval) when you have a +``DataFrame`` with more than 10,000 rows. 
+ +::: + +[``eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval) supports all arithmetic expressions supported by the +engine in addition to some extensions available only in pandas. + +::: tip Note + +The larger the frame and the larger the expression the more speedup you will +see from using [``eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval). + +::: + +### Supported syntax + +These operations are supported by [``pandas.eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval): + +- Arithmetic operations except for the left shift (``<<``) and right shift +(``>>``) operators, e.g., ``df + 2 * pi / s ** 4 % 42 - the_golden_ratio`` +- Comparison operations, including chained comparisons, e.g., ``2 < df < df2`` +- Boolean operations, e.g., ``df < df2 and df3 < df4 or not df_bool`` +- ``list`` and ``tuple`` literals, e.g., ``[1, 2]`` or ``(1, 2)`` +- Attribute access, e.g., ``df.a`` +- Subscript expressions, e.g., ``df[0]`` +- Simple variable evaluation, e.g., ``pd.eval('df')`` (this is not very useful) +- Math functions: *sin*, *cos*, *exp*, *log*, *expm1*, *log1p*, +*sqrt*, *sinh*, *cosh*, *tanh*, *arcsin*, *arccos*, *arctan*, *arccosh*, +*arcsinh*, *arctanh*, *abs*, *arctan2* and *log10*. + +This Python syntax is **not** allowed: + +- Expressions + - Function calls other than math functions. + - ``is``/``is not`` operations + - ``if`` expressions + - ``lambda`` expressions + - ``list``/``set``/``dict`` comprehensions + - Literal ``dict`` and ``set`` expressions + - ``yield`` expressions + - Generator expressions + - Boolean expressions consisting of only scalar values + +- Statements + + - Neither [simple](https://docs.python.org/3/reference/simple_stmts.html) + nor [compound](https://docs.python.org/3/reference/compound_stmts.html) + statements are allowed. This includes things like ``for``, ``while``, and + ``if``. 
+ +### ``eval()`` examples + +[``pandas.eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval) works well with expressions containing large arrays. + +First let’s create a few decent-sized arrays to play with: + +``` python +In [13]: nrows, ncols = 20000, 100 + +In [14]: df1, df2, df3, df4 = [pd.DataFrame(np.random.randn(nrows, ncols)) for _ in range(4)] +``` + +Now let’s compare adding them together using plain ol’ Python versus +[``eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval): + +``` python +In [15]: %timeit df1 + df2 + df3 + df4 +21 ms +- 787 us per loop (mean +- std. dev. of 7 runs, 10 loops each) +``` + +``` python +In [16]: %timeit pd.eval('df1 + df2 + df3 + df4') +8.12 ms +- 249 us per loop (mean +- std. dev. of 7 runs, 100 loops each) +``` + +Now let’s do the same thing but with comparisons: + +``` python +In [17]: %timeit (df1 > 0) & (df2 > 0) & (df3 > 0) & (df4 > 0) +272 ms +- 6.92 ms per loop (mean +- std. dev. of 7 runs, 1 loop each) +``` + +``` python +In [18]: %timeit pd.eval('(df1 > 0) & (df2 > 0) & (df3 > 0) & (df4 > 0)') +19.2 ms +- 1.87 ms per loop (mean +- std. dev. of 7 runs, 10 loops each) +``` + +[``eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval) also works with unaligned pandas objects: + +``` python +In [19]: s = pd.Series(np.random.randn(50)) + +In [20]: %timeit df1 + df2 + df3 + df4 + s +103 ms +- 12.7 ms per loop (mean +- std. dev. of 7 runs, 10 loops each) +``` + +``` python +In [21]: %timeit pd.eval('df1 + df2 + df3 + df4 + s') +10.2 ms +- 215 us per loop (mean +- std. dev. of 7 runs, 100 loops each) +``` + +::: tip Note + +Operations such as + +``` python +1 and 2 # would parse to 1 & 2, but should evaluate to 2 +3 or 4 # would parse to 3 | 4, but should evaluate to 3 +~1 # this is okay, but slower when using eval +``` + +should be performed in Python. 
An exception will be raised if you try to +perform any boolean/bitwise operations with scalar operands that are not +of type ``bool`` or ``np.bool_``. Again, you should perform these kinds of +operations in plain Python. + +::: + +### The ``DataFrame.eval`` method + +In addition to the top level [``pandas.eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval) function you can also +evaluate an expression in the “context” of a [``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame). + +``` python +In [22]: df = pd.DataFrame(np.random.randn(5, 2), columns=['a', 'b']) + +In [23]: df.eval('a + b') +Out[23]: +0 -0.246747 +1 0.867786 +2 -1.626063 +3 -1.134978 +4 -1.027798 +dtype: float64 +``` + +Any expression that is a valid [``pandas.eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval) expression is also a valid +[``DataFrame.eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.eval.html#pandas.DataFrame.eval) expression, with the added benefit that you don’t have to +prefix the name of the [``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) to the column(s) you’re +interested in evaluating. + +In addition, you can perform assignment of columns within an expression. +This allows for *formulaic evaluation*. The assignment target can be a +new column name or an existing column name, and it must be a valid Python +identifier. + +*New in version 0.18.0.* + +The ``inplace`` keyword determines whether this assignment will performed +on the original ``DataFrame`` or return a copy with the new column. + +::: danger Warning + +For backwards compatibility, ``inplace`` defaults to ``True`` if not +specified. 
This will change in a future version of pandas - if your +code depends on an inplace assignment you should update to explicitly +set ``inplace=True``. + +::: + +``` python +In [24]: df = pd.DataFrame(dict(a=range(5), b=range(5, 10))) + +In [25]: df.eval('c = a + b', inplace=True) + +In [26]: df.eval('d = a + b + c', inplace=True) + +In [27]: df.eval('a = 1', inplace=True) + +In [28]: df +Out[28]: + a b c d +0 1 5 5 10 +1 1 6 7 14 +2 1 7 9 18 +3 1 8 11 22 +4 1 9 13 26 +``` + +When ``inplace`` is set to ``False``, a copy of the ``DataFrame`` with the +new or modified columns is returned and the original frame is unchanged. + +``` python +In [29]: df +Out[29]: + a b c d +0 1 5 5 10 +1 1 6 7 14 +2 1 7 9 18 +3 1 8 11 22 +4 1 9 13 26 + +In [30]: df.eval('e = a - c', inplace=False) +Out[30]: + a b c d e +0 1 5 5 10 -4 +1 1 6 7 14 -6 +2 1 7 9 18 -8 +3 1 8 11 22 -10 +4 1 9 13 26 -12 + +In [31]: df +Out[31]: + a b c d +0 1 5 5 10 +1 1 6 7 14 +2 1 7 9 18 +3 1 8 11 22 +4 1 9 13 26 +``` + +*New in version 0.18.0.* + +As a convenience, multiple assignments can be performed by using a +multi-line string. + +``` python +In [32]: df.eval(""" + ....: c = a + b + ....: d = a + b + c + ....: a = 1""", inplace=False) + ....: +Out[32]: + a b c d +0 1 5 6 12 +1 1 6 7 14 +2 1 7 8 16 +3 1 8 9 18 +4 1 9 10 20 +``` + +The equivalent in standard Python would be + +``` python +In [33]: df = pd.DataFrame(dict(a=range(5), b=range(5, 10))) + +In [34]: df['c'] = df.a + df.b + +In [35]: df['d'] = df.a + df.b + df.c + +In [36]: df['a'] = 1 + +In [37]: df +Out[37]: + a b c d +0 1 5 5 10 +1 1 6 7 14 +2 1 7 9 18 +3 1 8 11 22 +4 1 9 13 26 +``` + +*New in version 0.18.0.* + +The ``query`` method gained the ``inplace`` keyword which determines +whether the query modifies the original frame. 
+ +``` python +In [38]: df = pd.DataFrame(dict(a=range(5), b=range(5, 10))) + +In [39]: df.query('a > 2') +Out[39]: + a b +3 3 8 +4 4 9 + +In [40]: df.query('a > 2', inplace=True) + +In [41]: df +Out[41]: + a b +3 3 8 +4 4 9 +``` + +::: danger Warning + +Unlike with ``eval``, the default value for ``inplace`` for ``query`` +is ``False``. This is consistent with prior versions of pandas. + +::: + +### Local variables + +You must *explicitly reference* any local variable that you want to use in an +expression by placing the ``@`` character in front of the name. For example, + +``` python +In [42]: df = pd.DataFrame(np.random.randn(5, 2), columns=list('ab')) + +In [43]: newcol = np.random.randn(len(df)) + +In [44]: df.eval('b + @newcol') +Out[44]: +0 -0.173926 +1 2.493083 +2 -0.881831 +3 -0.691045 +4 1.334703 +dtype: float64 + +In [45]: df.query('b < @newcol') +Out[45]: + a b +0 0.863987 -0.115998 +2 -2.621419 -1.297879 +``` + +If you don’t prefix the local variable with ``@``, pandas will raise an +exception telling you the variable is undefined. + +When using [``DataFrame.eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.eval.html#pandas.DataFrame.eval) and [``DataFrame.query()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html#pandas.DataFrame.query), this allows you +to have a local variable and a [``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) column with the same +name in an expression. + +``` python +In [46]: a = np.random.randn() + +In [47]: df.query('@a < a') +Out[47]: + a b +0 0.863987 -0.115998 + +In [48]: df.loc[a < df.a] # same as the previous expression +Out[48]: + a b +0 0.863987 -0.115998 +``` + +With [``pandas.eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval) you cannot use the ``@`` prefix *at all*, because it +isn’t defined in that context. 
``pandas`` will let you know this if you try to +use ``@`` in a top-level call to [``pandas.eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval). For example, + +``` python +In [49]: a, b = 1, 2 + +In [50]: pd.eval('@a + b') +Traceback (most recent call last): + + File "/opt/conda/envs/pandas/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3325, in run_code + exec(code_obj, self.user_global_ns, self.user_ns) + + File "", line 1, in + pd.eval('@a + b') + + File "/pandas/pandas/core/computation/eval.py", line 311, in eval + _check_for_locals(expr, level, parser) + + File "/pandas/pandas/core/computation/eval.py", line 166, in _check_for_locals + raise SyntaxError(msg) + + File "", line unknown +SyntaxError: The '@' prefix is not allowed in top-level eval calls, +please refer to your variables by name without the '@' prefix +``` + +In this case, you should simply refer to the variables like you would in +standard Python. + +``` python +In [51]: pd.eval('a + b') +Out[51]: 3 +``` + +### ``pandas.eval()`` parsers + +There are two different parsers and two different engines you can use as +the backend. + +The default ``'pandas'`` parser allows a more intuitive syntax for expressing +query-like operations (comparisons, conjunctions and disjunctions). In +particular, the precedence of the ``&`` and ``|`` operators is made equal to +the precedence of the corresponding boolean operations ``and`` and ``or``. + +For example, the above conjunction can be written without parentheses. +Alternatively, you can use the ``'python'`` parser to enforce strict Python +semantics. 
+ +``` python +In [52]: expr = '(df1 > 0) & (df2 > 0) & (df3 > 0) & (df4 > 0)' + +In [53]: x = pd.eval(expr, parser='python') + +In [54]: expr_no_parens = 'df1 > 0 & df2 > 0 & df3 > 0 & df4 > 0' + +In [55]: y = pd.eval(expr_no_parens, parser='pandas') + +In [56]: np.all(x == y) +Out[56]: True +``` + +The same expression can be “anded” together with the word [``and``](https://docs.python.org/3/reference/expressions.html#and) as +well: + +``` python +In [57]: expr = '(df1 > 0) & (df2 > 0) & (df3 > 0) & (df4 > 0)' + +In [58]: x = pd.eval(expr, parser='python') + +In [59]: expr_with_ands = 'df1 > 0 and df2 > 0 and df3 > 0 and df4 > 0' + +In [60]: y = pd.eval(expr_with_ands, parser='pandas') + +In [61]: np.all(x == y) +Out[61]: True +``` + +The ``and`` and ``or`` operators here have the same precedence that they would +in vanilla Python. + +### ``pandas.eval()`` backends + +There’s also the option to make [``eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval) operate identical to plain +ol’ Python. + +::: tip Note + +Using the ``'python'`` engine is generally *not* useful, except for testing +other evaluation engines against it. You will achieve **no** performance +benefits using [``eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval) with ``engine='python'`` and in fact may +incur a performance hit. + +::: + +You can see this by using [``pandas.eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval) with the ``'python'`` engine. It +is a bit slower (not by much) than evaluating the same expression in Python + +``` python +In [62]: %timeit df1 + df2 + df3 + df4 +9.5 ms +- 241 us per loop (mean +- std. dev. of 7 runs, 100 loops each) +``` + +``` python +In [63]: %timeit pd.eval('df1 + df2 + df3 + df4', engine='python') +10.8 ms +- 898 us per loop (mean +- std. dev. 
of 7 runs, 100 loops each) +``` + +### ``pandas.eval()`` performance + +[``eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval) is intended to speed up certain kinds of operations. In +particular, those operations involving complex expressions with large +[``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame)/[``Series``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series) objects should see a +significant performance benefit. Here is a plot showing the running time of +[``pandas.eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval) as function of the size of the frame involved in the +computation. The two lines are two different engines. + +![eval-perf](https://static.pypandas.cn/public/static/images/eval-perf.png) + +::: tip Note + +Operations with smallish objects (around 15k-20k rows) are faster using +plain Python: + +![eval-perf-small](https://static.pypandas.cn/public/static/images/eval-perf-small.png) + +::: + +This plot was created using a ``DataFrame`` with 3 columns each containing +floating point values generated using ``numpy.random.randn()``. + +### Technical minutia regarding expression evaluation + +Expressions that would result in an object dtype or involve datetime operations +(because of ``NaT``) must be evaluated in Python space. The main reason for +this behavior is to maintain backwards compatibility with versions of NumPy < +1.7. In those versions of NumPy a call to ``ndarray.astype(str)`` will +truncate any strings that are more than 60 characters in length. Second, we +can’t pass ``object`` arrays to ``numexpr`` thus string comparisons must be +evaluated in Python space. + +The upshot is that this *only* applies to object-dtype expressions. 
So, if +you have an expression–for example + +``` python +In [64]: df = pd.DataFrame({'strings': np.repeat(list('cba'), 3), + ....: 'nums': np.repeat(range(3), 3)}) + ....: + +In [65]: df +Out[65]: + strings nums +0 c 0 +1 c 0 +2 c 0 +3 b 1 +4 b 1 +5 b 1 +6 a 2 +7 a 2 +8 a 2 + +In [66]: df.query('strings == "a" and nums == 1') +Out[66]: +Empty DataFrame +Columns: [strings, nums] +Index: [] +``` + +the numeric part of the comparison (``nums == 1``) will be evaluated by +``numexpr``. + +In general, [``DataFrame.query()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html#pandas.DataFrame.query)/[``pandas.eval()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.eval.html#pandas.eval) will +evaluate the subexpressions that *can* be evaluated by ``numexpr`` and those +that must be evaluated in Python space transparently to the user. This is done +by inferring the result type of an expression from its arguments and operators. + \ No newline at end of file diff --git a/Python/pandas/user_guide/gotchas.md b/Python/pandas/user_guide/gotchas.md new file mode 100644 index 00000000..86f5f018 --- /dev/null +++ b/Python/pandas/user_guide/gotchas.md @@ -0,0 +1,429 @@ +# Frequently Asked Questions (FAQ) + +## DataFrame memory usage + +The memory usage of a ``DataFrame`` (including the index) is shown when calling +the [``info()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html#pandas.DataFrame.info). A configuration option, ``display.memory_usage`` +(see [the list of options](options.html#options-available)), specifies if the +``DataFrame``’s memory usage will be displayed when invoking the ``df.info()`` +method. 
+ +For example, the memory usage of the ``DataFrame`` below is shown +when calling [``info()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html#pandas.DataFrame.info): + +``` python +In [1]: dtypes = ['int64', 'float64', 'datetime64[ns]', 'timedelta64[ns]', + ...: 'complex128', 'object', 'bool'] + ...: + +In [2]: n = 5000 + +In [3]: data = {t: np.random.randint(100, size=n).astype(t) for t in dtypes} + +In [4]: df = pd.DataFrame(data) + +In [5]: df['categorical'] = df['object'].astype('category') + +In [6]: df.info() + +RangeIndex: 5000 entries, 0 to 4999 +Data columns (total 8 columns): +int64 5000 non-null int64 +float64 5000 non-null float64 +datetime64[ns] 5000 non-null datetime64[ns] +timedelta64[ns] 5000 non-null timedelta64[ns] +complex128 5000 non-null complex128 +object 5000 non-null object +bool 5000 non-null bool +categorical 5000 non-null category +dtypes: bool(1), category(1), complex128(1), datetime64[ns](1), float64(1), int64(1), object(1), timedelta64[ns](1) +memory usage: 289.1+ KB +``` + +The ``+`` symbol indicates that the true memory usage could be higher, because +pandas does not count the memory used by values in columns with +``dtype=object``. + +Passing ``memory_usage='deep'`` will enable a more accurate memory usage report, +accounting for the full usage of the contained objects. This is optional +as it can be expensive to do this deeper introspection. 
+ +``` python +In [7]: df.info(memory_usage='deep') + +RangeIndex: 5000 entries, 0 to 4999 +Data columns (total 8 columns): +int64 5000 non-null int64 +float64 5000 non-null float64 +datetime64[ns] 5000 non-null datetime64[ns] +timedelta64[ns] 5000 non-null timedelta64[ns] +complex128 5000 non-null complex128 +object 5000 non-null object +bool 5000 non-null bool +categorical 5000 non-null category +dtypes: bool(1), category(1), complex128(1), datetime64[ns](1), float64(1), int64(1), object(1), timedelta64[ns](1) +memory usage: 425.6 KB +``` + +By default the display option is set to ``True`` but can be explicitly +overridden by passing the ``memory_usage`` argument when invoking ``df.info()``. + +The memory usage of each column can be found by calling the +[``memory_usage()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.memory_usage.html#pandas.DataFrame.memory_usage) method. This returns a ``Series`` with an index +represented by column names and memory usage of each column shown in bytes. 
For +the ``DataFrame`` above, the memory usage of each column and the total memory +usage can be found with the ``memory_usage`` method: + +``` python +In [8]: df.memory_usage() +Out[8]: +Index 128 +int64 40000 +float64 40000 +datetime64[ns] 40000 +timedelta64[ns] 40000 +complex128 80000 +object 40000 +bool 5000 +categorical 10920 +dtype: int64 + +# total memory usage of dataframe +In [9]: df.memory_usage().sum() +Out[9]: 296048 +``` + +By default the memory usage of the ``DataFrame``’s index is shown in the +returned ``Series``, the memory usage of the index can be suppressed by passing +the ``index=False`` argument: + +``` python +In [10]: df.memory_usage(index=False) +Out[10]: +int64 40000 +float64 40000 +datetime64[ns] 40000 +timedelta64[ns] 40000 +complex128 80000 +object 40000 +bool 5000 +categorical 10920 +dtype: int64 +``` + +The memory usage displayed by the [``info()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html#pandas.DataFrame.info) method utilizes the +[``memory_usage()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.memory_usage.html#pandas.DataFrame.memory_usage) method to determine the memory usage of a +``DataFrame`` while also formatting the output in human-readable units (base-2 +representation; i.e. 1KB = 1024 bytes). + +See also [Categorical Memory Usage](categorical.html#categorical-memory). + +## Using if/truth statements with pandas + +pandas follows the NumPy convention of raising an error when you try to convert +something to a ``bool``. This happens in an ``if``-statement or when using the +boolean operations: ``and``, ``or``, and ``not``. It is not clear what the result +of the following code should be: + +``` python +>>> if pd.Series([False, True, False]): +... pass +``` + +Should it be ``True`` because it’s not zero-length, or ``False`` because there +are ``False`` values? 
It is unclear, so instead, pandas raises a ``ValueError``: + +``` python +>>> if pd.Series([False, True, False]): +... print("I was true") +Traceback + ... +ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all(). +``` + +You need to explicitly choose what you want to do with the ``DataFrame``, e.g. +use [``any()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.any.html#pandas.DataFrame.any), [``all()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.all.html#pandas.DataFrame.all) or [``empty()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.empty.html#pandas.DataFrame.empty). +Alternatively, you might want to compare if the pandas object is ``None``: + +``` python +>>> if pd.Series([False, True, False]) is not None: +... print("I was not None") +I was not None +``` + +Below is how to check if any of the values are ``True``: + +``` python +>>> if pd.Series([False, True, False]).any(): +... print("I am any") +I am any +``` + +To evaluate single-element pandas objects in a boolean context, use the method +[``bool()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.bool.html#pandas.DataFrame.bool): + +``` python +In [11]: pd.Series([True]).bool() +Out[11]: True + +In [12]: pd.Series([False]).bool() +Out[12]: False + +In [13]: pd.DataFrame([[True]]).bool() +Out[13]: True + +In [14]: pd.DataFrame([[False]]).bool() +Out[14]: False +``` + +### Bitwise boolean + +Bitwise boolean operators like ``==`` and ``!=`` return a boolean ``Series``, +which is almost always what you want anyways. + +``` python +>>> s = pd.Series(range(5)) +>>> s == 4 +0 False +1 False +2 False +3 False +4 True +dtype: bool +``` + +See [boolean comparisons](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-compare) for more examples. 
+ +### Using the ``in`` operator + +Using the Python ``in`` operator on a ``Series`` tests for membership in the +index, not membership among the values. + +``` python +In [15]: s = pd.Series(range(5), index=list('abcde')) + +In [16]: 2 in s +Out[16]: False + +In [17]: 'b' in s +Out[17]: True +``` + +If this behavior is surprising, keep in mind that using ``in`` on a Python +dictionary tests keys, not values, and ``Series`` are dict-like. +To test for membership in the values, use the method [``isin()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.isin.html#pandas.Series.isin): + +``` python +In [18]: s.isin([2]) +Out[18]: +a False +b False +c True +d False +e False +dtype: bool + +In [19]: s.isin([2]).any() +Out[19]: True +``` + +For ``DataFrames``, likewise, ``in`` applies to the column axis, +testing for membership in the list of column names. + +## ``NaN``, Integer ``NA`` values and ``NA`` type promotions + +### Choice of ``NA`` representation + +For lack of ``NA`` (missing) support from the ground up in NumPy and Python in +general, we were given the difficult choice between either: + +- A *masked array* solution: an array of data and an array of boolean values +indicating whether a value is there or is missing. +- Using a special sentinel value, bit pattern, or set of sentinel values to +denote ``NA`` across the dtypes. + +For many reasons we chose the latter. After years of production use it has +proven, at least in my opinion, to be the best decision given the state of +affairs in NumPy and Python in general. The special value ``NaN`` +(Not-A-Number) is used everywhere as the ``NA`` value, and there are API +functions ``isna`` and ``notna`` which can be used across the dtypes to +detect NA values. + +However, it comes with it a couple of trade-offs which I most certainly have +not ignored. 
+ +### Support for integer ``NA`` + +In the absence of high performance ``NA`` support being built into NumPy from +the ground up, the primary casualty is the ability to represent NAs in integer +arrays. For example: + +``` python +In [20]: s = pd.Series([1, 2, 3, 4, 5], index=list('abcde')) + +In [21]: s +Out[21]: +a 1 +b 2 +c 3 +d 4 +e 5 +dtype: int64 + +In [22]: s.dtype +Out[22]: dtype('int64') + +In [23]: s2 = s.reindex(['a', 'b', 'c', 'f', 'u']) + +In [24]: s2 +Out[24]: +a 1.0 +b 2.0 +c 3.0 +f NaN +u NaN +dtype: float64 + +In [25]: s2.dtype +Out[25]: dtype('float64') +``` + +This trade-off is made largely for memory and performance reasons, and also so +that the resulting ``Series`` continues to be “numeric”. + +If you need to represent integers with possibly missing values, use one of +the nullable-integer extension dtypes provided by pandas + +- [``Int8Dtype``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Int8Dtype.html#pandas.Int8Dtype) +- [``Int16Dtype``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Int16Dtype.html#pandas.Int16Dtype) +- [``Int32Dtype``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Int32Dtype.html#pandas.Int32Dtype) +- [``Int64Dtype``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Int64Dtype.html#pandas.Int64Dtype) + +``` python +In [26]: s_int = pd.Series([1, 2, 3, 4, 5], index=list('abcde'), + ....: dtype=pd.Int64Dtype()) + ....: + +In [27]: s_int +Out[27]: +a 1 +b 2 +c 3 +d 4 +e 5 +dtype: Int64 + +In [28]: s_int.dtype +Out[28]: Int64Dtype() + +In [29]: s2_int = s_int.reindex(['a', 'b', 'c', 'f', 'u']) + +In [30]: s2_int +Out[30]: +a 1 +b 2 +c 3 +f NaN +u NaN +dtype: Int64 + +In [31]: s2_int.dtype +Out[31]: Int64Dtype() +``` + +See [Nullable integer data type](integer_na.html#integer-na) for more. 
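The nullable integer dtypes can also be requested via their string aliases (note the capital ``I``, which distinguishes them from NumPy's plain ``int64``) — a small sketch:

``` python
import pandas as pd

# "Int64" (capital I) is the nullable extension dtype and can hold NA;
# "int64" (lowercase) is the plain NumPy dtype and would be promoted
# to float64 if a missing value were introduced
s = pd.Series([1, 2, None], dtype="Int64")
```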
+ +### ``NA`` type promotions + +When introducing NAs into an existing ``Series`` or ``DataFrame`` via +[``reindex()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.reindex.html#pandas.Series.reindex) or some other means, boolean and integer types will be +promoted to a different dtype in order to store the NAs. The promotions are +summarized in this table: + +Typeclass | Promotion dtype for storing NAs +---|--- +floating | no change +object | no change +integer | cast to float64 +boolean | cast to object + +While this may seem like a heavy trade-off, I have found very few cases where +this is an issue in practice i.e. storing values greater than 2**53. Some +explanation for the motivation is in the next section. + +### Why not make NumPy like R? + +Many people have suggested that NumPy should simply emulate the ``NA`` support +present in the more domain-specific statistical programming language [R](https://r-project.org). Part of the reason is the NumPy type hierarchy: + +Typeclass | Dtypes +---|--- +numpy.floating | float16, float32, float64, float128 +numpy.integer | int8, int16, int32, int64 +numpy.unsignedinteger | uint8, uint16, uint32, uint64 +numpy.object_ | object_ +numpy.bool_ | bool_ +numpy.character | string_, unicode_ + +The R language, by contrast, only has a handful of built-in data types: +``integer``, ``numeric`` (floating-point), ``character``, and +``boolean``. ``NA`` types are implemented by reserving special bit patterns for +each type to be used as the missing value. While doing this with the full NumPy +type hierarchy would be possible, it would be a more substantial trade-off +(especially for the 8- and 16-bit data types) and implementation undertaking. + +An alternate approach is that of using masked arrays. A masked array is an +array of data with an associated boolean *mask* denoting whether each value +should be considered ``NA`` or not. 
I am personally not in love with this +approach as I feel that overall it places a fairly heavy burden on the user and +the library implementer. Additionally, it exacts a fairly high performance cost +when working with numerical data compared with the simple approach of using +``NaN``. Thus, I have chosen the Pythonic “practicality beats purity” approach +and traded integer ``NA`` capability for a much simpler approach of using a +special value in float and object arrays to denote ``NA``, and promoting +integer arrays to floating when NAs must be introduced. + +## Differences with NumPy + +For ``Series`` and ``DataFrame`` objects, [``var()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.var.html#pandas.DataFrame.var) normalizes by +``N-1`` to produce unbiased estimates of the sample variance, while NumPy’s +``var`` normalizes by N, which measures the variance of the sample. Note that +[``cov()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.cov.html#pandas.DataFrame.cov) normalizes by ``N-1`` in both pandas and NumPy. + +## Thread-safety + +As of pandas 0.11, pandas is not 100% thread safe. The known issues relate to +the [``copy()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.copy.html#pandas.DataFrame.copy) method. If you are doing a lot of copying of +``DataFrame`` objects shared among threads, we recommend holding locks inside +the threads where the data copying occurs. + +See [this link](https://stackoverflow.com/questions/13592618/python-pandas-dataframe-thread-safe) +for more information. + +## Byte-Ordering issues + +Occasionally you may have to deal with data that were created on a machine with +a different byte order than the one on which you are running Python. A common +symptom of this issue is an error like:: + +``` python +Traceback + ... 
+ValueError: Big-endian buffer not supported on little-endian compiler +``` + +To deal +with this issue you should convert the underlying NumPy array to the native +system byte order *before* passing it to ``Series`` or ``DataFrame`` +constructors using something similar to the following: + +``` python +In [32]: x = np.array(list(range(10)), '>i4') # big endian + +In [33]: newx = x.byteswap().newbyteorder() # force native byteorder + +In [34]: s = pd.Series(newx) +``` + +See [the NumPy documentation on byte order](https://docs.scipy.org/doc/numpy/user/basics.byteswapping.html) for more +details. diff --git a/Python/pandas/user_guide/groupby.md b/Python/pandas/user_guide/groupby.md new file mode 100644 index 00000000..172adab0 --- /dev/null +++ b/Python/pandas/user_guide/groupby.md @@ -0,0 +1,2417 @@ +# Group By: split-apply-combine + +By “group by” we are referring to a process involving one or more of the following +steps: + +- **Splitting** the data into groups based on some criteria. +- **Applying** a function to each group independently. +- **Combining** the results into a data structure. + +Out of these, the split step is the most straightforward. In fact, in many +situations we may wish to split the data set into groups and do something with +those groups. In the apply step, we might wish to do one of the +following: + +- **Aggregation**: compute a summary statistic (or statistics) for each +group. Some examples: + + - Compute group sums or means. + - Compute group sizes / counts. + +- **Transformation**: perform some group-specific computations and return a +like-indexed object. Some examples: + + - Standardize data (zscore) within a group. + - Filling NAs within groups with a value derived from each group. + +- **Filtration**: discard some groups, according to a group-wise computation +that evaluates True or False. Some examples: + + - Discard data that belongs to groups with only a few members. + - Filter out data based on the group sum or mean. 
+ +- Some combination of the above: GroupBy will examine the results of the apply +step and try to return a sensibly combined result if it doesn’t fit into +either of the above two categories. + +Since the set of object instance methods on pandas data structures are generally +rich and expressive, we often simply want to invoke, say, a DataFrame function +on each group. The name GroupBy should be quite familiar to those who have used +a SQL-based tool (or ``itertools``), in which you can write code like: + +``` sql +SELECT Column1, Column2, mean(Column3), sum(Column4) +FROM SomeTable +GROUP BY Column1, Column2 +``` + +We aim to make operations like this natural and easy to express using +pandas. We’ll address each area of GroupBy functionality then provide some +non-trivial examples / use cases. + +See the [cookbook](cookbook.html#cookbook-grouping) for some advanced strategies. + +## Splitting an object into groups + +pandas objects can be split on any of their axes. The abstract definition of +grouping is to provide a mapping of labels to group names. 
To create a GroupBy +object (more on what the GroupBy object is later), you may do the following: + +``` python +In [1]: df = pd.DataFrame([('bird', 'Falconiformes', 389.0), + ...: ('bird', 'Psittaciformes', 24.0), + ...: ('mammal', 'Carnivora', 80.2), + ...: ('mammal', 'Primates', np.nan), + ...: ('mammal', 'Carnivora', 58)], + ...: index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'], + ...: columns=('class', 'order', 'max_speed')) + ...: + +In [2]: df +Out[2]: + class order max_speed +falcon bird Falconiformes 389.0 +parrot bird Psittaciformes 24.0 +lion mammal Carnivora 80.2 +monkey mammal Primates NaN +leopard mammal Carnivora 58.0 + +# default is axis=0 +In [3]: grouped = df.groupby('class') + +In [4]: grouped = df.groupby('order', axis='columns') + +In [5]: grouped = df.groupby(['class', 'order']) +``` + +The mapping can be specified many different ways: + +- A Python function, to be called on each of the axis labels. +- A list or NumPy array of the same length as the selected axis. +- A dict or ``Series``, providing a ``label -> group name`` mapping. +- For ``DataFrame`` objects, a string indicating a column to be used to group. +Of course ``df.groupby('A')`` is just syntactic sugar for +``df.groupby(df['A'])``, but it makes life simpler. +- For ``DataFrame`` objects, a string indicating an index level to be used to +group. +- A list of any of the above things. + +Collectively we refer to the grouping objects as the **keys**. For example, +consider the following ``DataFrame``: + +::: tip Note + +A string passed to ``groupby`` may refer to either a column or an index level. +If a string matches both a column name and an index level name, a +``ValueError`` will be raised. 
+ +::: + +``` python +In [6]: df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', + ...: 'foo', 'bar', 'foo', 'foo'], + ...: 'B': ['one', 'one', 'two', 'three', + ...: 'two', 'two', 'one', 'three'], + ...: 'C': np.random.randn(8), + ...: 'D': np.random.randn(8)}) + ...: + +In [7]: df +Out[7]: + A B C D +0 foo one 0.469112 -0.861849 +1 bar one -0.282863 -2.104569 +2 foo two -1.509059 -0.494929 +3 bar three -1.135632 1.071804 +4 foo two 1.212112 0.721555 +5 bar two -0.173215 -0.706771 +6 foo one 0.119209 -1.039575 +7 foo three -1.044236 0.271860 +``` + +On a DataFrame, we obtain a GroupBy object by calling [``groupby()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html#pandas.DataFrame.groupby). +We could naturally group by either the ``A`` or ``B`` columns, or both: + +``` python +In [8]: grouped = df.groupby('A') + +In [9]: grouped = df.groupby(['A', 'B']) +``` + +*New in version 0.24.* + +If we also have a MultiIndex on columns ``A`` and ``B``, we can group by all +but the specified columns + +``` python +In [10]: df2 = df.set_index(['A', 'B']) + +In [11]: grouped = df2.groupby(level=df2.index.names.difference(['B'])) + +In [12]: grouped.sum() +Out[12]: + C D +A +bar -1.591710 -1.739537 +foo -0.752861 -1.402938 +``` + +These will split the DataFrame on its index (rows). We could also split by the +columns: + +``` python +In [13]: def get_letter_type(letter): + ....: if letter.lower() in 'aeiou': + ....: return 'vowel' + ....: else: + ....: return 'consonant' + ....: + +In [14]: grouped = df.groupby(get_letter_type, axis=1) +``` + +pandas [``Index``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.html#pandas.Index) objects support duplicate values. 
If a +non-unique index is used as the group key in a groupby operation, all values +for the same index value will be considered to be in one group and thus the +output of aggregation functions will only contain unique index values: + +``` python +In [15]: lst = [1, 2, 3, 1, 2, 3] + +In [16]: s = pd.Series([1, 2, 3, 10, 20, 30], lst) + +In [17]: grouped = s.groupby(level=0) + +In [18]: grouped.first() +Out[18]: +1 1 +2 2 +3 3 +dtype: int64 + +In [19]: grouped.last() +Out[19]: +1 10 +2 20 +3 30 +dtype: int64 + +In [20]: grouped.sum() +Out[20]: +1 11 +2 22 +3 33 +dtype: int64 +``` + +Note that **no splitting occurs** until it’s needed. Creating the GroupBy object +only verifies that you’ve passed a valid mapping. + +::: tip Note + +Many kinds of complicated data manipulations can be expressed in terms of +GroupBy operations (though can’t be guaranteed to be the most +efficient). You can get quite creative with the label mapping functions. + +::: + +### GroupBy sorting + +By default the group keys are sorted during the ``groupby`` operation. You may however pass ``sort=False`` for potential speedups: + +``` python +In [21]: df2 = pd.DataFrame({'X': ['B', 'B', 'A', 'A'], 'Y': [1, 2, 3, 4]}) + +In [22]: df2.groupby(['X']).sum() +Out[22]: + Y +X +A 7 +B 3 + +In [23]: df2.groupby(['X'], sort=False).sum() +Out[23]: + Y +X +B 3 +A 7 +``` + +Note that ``groupby`` will preserve the order in which *observations* are sorted *within* each group. 
+For example, the groups created by ``groupby()`` below are in the order they appeared in the original ``DataFrame``: + +``` python +In [24]: df3 = pd.DataFrame({'X': ['A', 'B', 'A', 'B'], 'Y': [1, 4, 3, 2]}) + +In [25]: df3.groupby(['X']).get_group('A') +Out[25]: + X Y +0 A 1 +2 A 3 + +In [26]: df3.groupby(['X']).get_group('B') +Out[26]: + X Y +1 B 4 +3 B 2 +``` + +### GroupBy object attributes + +The ``groups`` attribute is a dict whose keys are the computed unique groups +and corresponding values being the axis labels belonging to each group. In the +above example we have: + +``` python +In [27]: df.groupby('A').groups +Out[27]: +{'bar': Int64Index([1, 3, 5], dtype='int64'), + 'foo': Int64Index([0, 2, 4, 6, 7], dtype='int64')} + +In [28]: df.groupby(get_letter_type, axis=1).groups +Out[28]: +{'consonant': Index(['B', 'C', 'D'], dtype='object'), + 'vowel': Index(['A'], dtype='object')} +``` + +Calling the standard Python ``len`` function on the GroupBy object just returns +the length of the ``groups`` dict, so it is largely just a convenience: + +``` python +In [29]: grouped = df.groupby(['A', 'B']) + +In [30]: grouped.groups +Out[30]: +{('bar', 'one'): Int64Index([1], dtype='int64'), + ('bar', 'three'): Int64Index([3], dtype='int64'), + ('bar', 'two'): Int64Index([5], dtype='int64'), + ('foo', 'one'): Int64Index([0, 6], dtype='int64'), + ('foo', 'three'): Int64Index([7], dtype='int64'), + ('foo', 'two'): Int64Index([2, 4], dtype='int64')} + +In [31]: len(grouped) +Out[31]: 6 +``` + +``GroupBy`` will tab complete column names (and other attributes): + +``` python +In [32]: df +Out[32]: + height weight gender +2000-01-01 42.849980 157.500553 male +2000-01-02 49.607315 177.340407 male +2000-01-03 56.293531 171.524640 male +2000-01-04 48.421077 144.251986 female +2000-01-05 46.556882 152.526206 male +2000-01-06 68.448851 168.272968 female +2000-01-07 70.757698 136.431469 male +2000-01-08 58.909500 176.499753 female +2000-01-09 76.435631 174.094104 female +2000-01-10 
45.306120 177.540920 male + +In [33]: gb = df.groupby('gender') +``` + +``` python +In [34]: gb. # noqa: E225, E999 +gb.agg gb.boxplot gb.cummin gb.describe gb.filter gb.get_group gb.height gb.last gb.median gb.ngroups gb.plot gb.rank gb.std gb.transform +gb.aggregate gb.count gb.cumprod gb.dtype gb.first gb.groups gb.hist gb.max gb.min gb.nth gb.prod gb.resample gb.sum gb.var +gb.apply gb.cummax gb.cumsum gb.fillna gb.gender gb.head gb.indices gb.mean gb.name gb.ohlc gb.quantile gb.size gb.tail gb.weight +``` + +### GroupBy with MultiIndex + +With [hierarchically-indexed data](advanced.html#advanced-hierarchical), it’s quite +natural to group by one of the levels of the hierarchy. + +Let’s create a Series with a two-level ``MultiIndex``. + +``` python +In [35]: arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], + ....: ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']] + ....: + +In [36]: index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second']) + +In [37]: s = pd.Series(np.random.randn(8), index=index) + +In [38]: s +Out[38]: +first second +bar one -0.919854 + two -0.042379 +baz one 1.247642 + two -0.009920 +foo one 0.290213 + two 0.495767 +qux one 0.362949 + two 1.548106 +dtype: float64 +``` + +We can then group by one of the levels in ``s``. + +``` python +In [39]: grouped = s.groupby(level=0) + +In [40]: grouped.sum() +Out[40]: +first +bar -0.962232 +baz 1.237723 +foo 0.785980 +qux 1.911055 +dtype: float64 +``` + +If the MultiIndex has names specified, these can be passed instead of the level +number: + +``` python +In [41]: s.groupby(level='second').sum() +Out[41]: +second +one 0.980950 +two 1.991575 +dtype: float64 +``` + +The aggregation functions such as ``sum`` will take the level parameter +directly. 
Additionally, the resulting index will be named according to the +chosen level: + +``` python +In [42]: s.sum(level='second') +Out[42]: +second +one 0.980950 +two 1.991575 +dtype: float64 +``` + +Grouping with multiple levels is supported. + +``` python +In [43]: s +Out[43]: +first second third +bar doo one -1.131345 + two -0.089329 +baz bee one 0.337863 + two -0.945867 +foo bop one -0.932132 + two 1.956030 +qux bop one 0.017587 + two -0.016692 +dtype: float64 + +In [44]: s.groupby(level=['first', 'second']).sum() +Out[44]: +first second +bar doo -1.220674 +baz bee -0.608004 +foo bop 1.023898 +qux bop 0.000895 +dtype: float64 +``` + +*New in version 0.20.* + +Index level names may be supplied as keys. + +``` python +In [45]: s.groupby(['first', 'second']).sum() +Out[45]: +first second +bar doo -1.220674 +baz bee -0.608004 +foo bop 1.023898 +qux bop 0.000895 +dtype: float64 +``` + +More on the ``sum`` function and aggregation later. + +### Grouping DataFrame with Index levels and columns + +A DataFrame may be grouped by a combination of columns and index levels by +specifying the column names as strings and the index levels as ``pd.Grouper`` +objects. + +``` python +In [46]: arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], + ....: ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']] + ....: + +In [47]: index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second']) + +In [48]: df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3], + ....: 'B': np.arange(8)}, + ....: index=index) + ....: + +In [49]: df +Out[49]: + A B +first second +bar one 1 0 + two 1 1 +baz one 1 2 + two 1 3 +foo one 2 4 + two 2 5 +qux one 3 6 + two 3 7 +``` + +The following example groups ``df`` by the ``second`` index level and +the ``A`` column. + +``` python +In [50]: df.groupby([pd.Grouper(level=1), 'A']).sum() +Out[50]: + B +second A +one 1 2 + 2 4 + 3 6 +two 1 4 + 2 5 + 3 7 +``` + +Index levels may also be specified by name. 
+
+``` python
+In [51]: df.groupby([pd.Grouper(level='second'), 'A']).sum()
+Out[51]:
+          B
+second A
+one    1  2
+       2  4
+       3  6
+two    1  4
+       2  5
+       3  7
+```
+
+*New in version 0.20.*
+
+Index level names may be specified as keys directly to ``groupby``.
+
+``` python
+In [52]: df.groupby(['second', 'A']).sum()
+Out[52]:
+          B
+second A
+one    1  2
+       2  4
+       3  6
+two    1  4
+       2  5
+       3  7
+```
+
+### DataFrame column selection in GroupBy
+
+Once you have created the GroupBy object from a DataFrame, you might want to do
+something different for each of the columns. Thus, using ``[]`` similar to
+getting a column from a DataFrame, you can do:
+
+``` python
+In [53]: grouped = df.groupby(['A'])
+
+In [54]: grouped_C = grouped['C']
+
+In [55]: grouped_D = grouped['D']
+```
+
+This is mainly syntactic sugar for the alternative and much more verbose:
+
+``` python
+In [56]: df['C'].groupby(df['A'])
+Out[56]: <pandas.core.groupby.generic.SeriesGroupBy object at 0x...>
+```
+
+Additionally this method avoids recomputing the internal grouping information
+derived from the passed key. 
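As a small, self-contained illustration of the equivalence (the frame below is invented for the sketch, not the ``df`` from the examples above), aggregating a selected column gives the same result as the verbose form while reusing the grouping already computed:

``` python
import pandas as pd

# Hypothetical stand-in frame (not the df used elsewhere on this page)
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'C': [1.0, 2.0, 3.0, 4.0]})

grouped = df.groupby('A')

# Select the column from the GroupBy object, then aggregate
by_selection = grouped['C'].sum()

# The verbose equivalent: group the column Series directly
by_verbose = df['C'].groupby(df['A']).sum()

print(by_selection.equals(by_verbose))  # True
```

Both forms produce a Series indexed by the group labels; the selected-column form simply avoids rebuilding the grouping.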
+ +## Iterating through groups + +With the GroupBy object in hand, iterating through the grouped data is very +natural and functions similarly to [``itertools.groupby()``](https://docs.python.org/3/library/itertools.html#itertools.groupby): + +``` python +In [57]: grouped = df.groupby('A') + +In [58]: for name, group in grouped: + ....: print(name) + ....: print(group) + ....: +bar + A B C D +1 bar one 0.254161 1.511763 +3 bar three 0.215897 -0.990582 +5 bar two -0.077118 1.211526 +foo + A B C D +0 foo one -0.575247 1.346061 +2 foo two -1.143704 1.627081 +4 foo two 1.193555 -0.441652 +6 foo one -0.408530 0.268520 +7 foo three -0.862495 0.024580 +``` + +In the case of grouping by multiple keys, the group name will be a tuple: + +``` python +In [59]: for name, group in df.groupby(['A', 'B']): + ....: print(name) + ....: print(group) + ....: +('bar', 'one') + A B C D +1 bar one 0.254161 1.511763 +('bar', 'three') + A B C D +3 bar three 0.215897 -0.990582 +('bar', 'two') + A B C D +5 bar two -0.077118 1.211526 +('foo', 'one') + A B C D +0 foo one -0.575247 1.346061 +6 foo one -0.408530 0.268520 +('foo', 'three') + A B C D +7 foo three -0.862495 0.02458 +('foo', 'two') + A B C D +2 foo two -1.143704 1.627081 +4 foo two 1.193555 -0.441652 +``` + +See [Iterating through groups](timeseries.html#timeseries-iterating-label). + +## Selecting a group + +A single group can be selected using +``get_group()``: + +``` python +In [60]: grouped.get_group('bar') +Out[60]: + A B C D +1 bar one 0.254161 1.511763 +3 bar three 0.215897 -0.990582 +5 bar two -0.077118 1.211526 +``` + +Or for an object grouped on multiple columns: + +``` python +In [61]: df.groupby(['A', 'B']).get_group(('bar', 'one')) +Out[61]: + A B C D +1 bar one 0.254161 1.511763 +``` + +## Aggregation + +Once the GroupBy object has been created, several methods are available to +perform a computation on the grouped data. 
These operations are similar to the +[aggregating API](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-aggregate), [window functions API](computation.html#stats-aggregate), +and [resample API](timeseries.html#timeseries-aggregate). + +An obvious one is aggregation via the +``aggregate()`` or equivalently +``agg()`` method: + +``` python +In [62]: grouped = df.groupby('A') + +In [63]: grouped.aggregate(np.sum) +Out[63]: + C D +A +bar 0.392940 1.732707 +foo -1.796421 2.824590 + +In [64]: grouped = df.groupby(['A', 'B']) + +In [65]: grouped.aggregate(np.sum) +Out[65]: + C D +A B +bar one 0.254161 1.511763 + three 0.215897 -0.990582 + two -0.077118 1.211526 +foo one -0.983776 1.614581 + three -0.862495 0.024580 + two 0.049851 1.185429 +``` + +As you can see, the result of the aggregation will have the group names as the +new index along the grouped axis. In the case of multiple keys, the result is a +[MultiIndex](advanced.html#advanced-hierarchical) by default, though this can be +changed by using the ``as_index`` option: + +``` python +In [66]: grouped = df.groupby(['A', 'B'], as_index=False) + +In [67]: grouped.aggregate(np.sum) +Out[67]: + A B C D +0 bar one 0.254161 1.511763 +1 bar three 0.215897 -0.990582 +2 bar two -0.077118 1.211526 +3 foo one -0.983776 1.614581 +4 foo three -0.862495 0.024580 +5 foo two 0.049851 1.185429 + +In [68]: df.groupby('A', as_index=False).sum() +Out[68]: + A C D +0 bar 0.392940 1.732707 +1 foo -1.796421 2.824590 +``` + +Note that you could use the ``reset_index`` DataFrame function to achieve the +same result as the column names are stored in the resulting ``MultiIndex``: + +``` python +In [69]: df.groupby(['A', 'B']).sum().reset_index() +Out[69]: + A B C D +0 bar one 0.254161 1.511763 +1 bar three 0.215897 -0.990582 +2 bar two -0.077118 1.211526 +3 foo one -0.983776 1.614581 +4 foo three -0.862495 0.024580 +5 foo two 0.049851 1.185429 +``` + +Another simple aggregation example is to compute the size of 
each group. +This is included in GroupBy as the ``size`` method. It returns a Series whose +index are the group names and whose values are the sizes of each group. + +``` python +In [70]: grouped.size() +Out[70]: +A B +bar one 1 + three 1 + two 1 +foo one 2 + three 1 + two 2 +dtype: int64 +``` + +``` python +In [71]: grouped.describe() +Out[71]: + C D + count mean std min 25% 50% 75% max count mean std min 25% 50% 75% max +0 1.0 0.254161 NaN 0.254161 0.254161 0.254161 0.254161 0.254161 1.0 1.511763 NaN 1.511763 1.511763 1.511763 1.511763 1.511763 +1 1.0 0.215897 NaN 0.215897 0.215897 0.215897 0.215897 0.215897 1.0 -0.990582 NaN -0.990582 -0.990582 -0.990582 -0.990582 -0.990582 +2 1.0 -0.077118 NaN -0.077118 -0.077118 -0.077118 -0.077118 -0.077118 1.0 1.211526 NaN 1.211526 1.211526 1.211526 1.211526 1.211526 +3 2.0 -0.491888 0.117887 -0.575247 -0.533567 -0.491888 -0.450209 -0.408530 2.0 0.807291 0.761937 0.268520 0.537905 0.807291 1.076676 1.346061 +4 1.0 -0.862495 NaN -0.862495 -0.862495 -0.862495 -0.862495 -0.862495 1.0 0.024580 NaN 0.024580 0.024580 0.024580 0.024580 0.024580 +5 2.0 0.024925 1.652692 -1.143704 -0.559389 0.024925 0.609240 1.193555 2.0 0.592714 1.462816 -0.441652 0.075531 0.592714 1.109898 1.627081 +``` + +::: tip Note + +Aggregation functions **will not** return the groups that you are aggregating over +if they are named *columns*, when ``as_index=True``, the default. The grouped columns will +be the **indices** of the returned object. + +Passing ``as_index=False`` **will** return the groups that you are aggregating over, if they are +named *columns*. + +::: + +Aggregating functions are the ones that reduce the dimension of the returned objects. 
+Some common aggregating functions are tabulated below: + +Function | Description +---|--- +mean() | Compute mean of groups +sum() | Compute sum of group values +size() | Compute group sizes +count() | Compute count of group +std() | Standard deviation of groups +var() | Compute variance of groups +sem() | Standard error of the mean of groups +describe() | Generates descriptive statistics +first() | Compute first of group values +last() | Compute last of group values +nth() | Take nth value, or a subset if n is a list +min() | Compute min of group values +max() | Compute max of group values + +The aggregating functions above will exclude NA values. Any function which +reduces a [``Series``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series) to a scalar value is an aggregation function and will work, +a trivial example is ``df.groupby('A').agg(lambda ser: 1)``. Note that +``nth()`` can act as a reducer *or* a +filter, see [here](#groupby-nth). + +### Applying multiple functions at once + +With grouped ``Series`` you can also pass a list or dict of functions to do +aggregation with, outputting a DataFrame: + +``` python +In [72]: grouped = df.groupby('A') + +In [73]: grouped['C'].agg([np.sum, np.mean, np.std]) +Out[73]: + sum mean std +A +bar 0.392940 0.130980 0.181231 +foo -1.796421 -0.359284 0.912265 +``` + +On a grouped ``DataFrame``, you can pass a list of functions to apply to each +column, which produces an aggregated result with a hierarchical index: + +``` python +In [74]: grouped.agg([np.sum, np.mean, np.std]) +Out[74]: + C D + sum mean std sum mean std +A +bar 0.392940 0.130980 0.181231 1.732707 0.577569 1.366330 +foo -1.796421 -0.359284 0.912265 2.824590 0.564918 0.884785 +``` + +The resulting aggregations are named for the functions themselves. 
If you
+need to rename, then you can add in a chained operation for a ``Series`` like this:
+
+``` python
+In [75]: (grouped['C'].agg([np.sum, np.mean, np.std])
+   ....:               .rename(columns={'sum': 'foo',
+   ....:                                'mean': 'bar',
+   ....:                                'std': 'baz'}))
+   ....:
+Out[75]:
+          foo       bar       baz
+A
+bar  0.392940  0.130980  0.181231
+foo -1.796421 -0.359284  0.912265
+```
+
+For a grouped ``DataFrame``, you can rename in a similar manner:
+
+``` python
+In [76]: (grouped.agg([np.sum, np.mean, np.std])
+   ....:         .rename(columns={'sum': 'foo',
+   ....:                          'mean': 'bar',
+   ....:                          'std': 'baz'}))
+   ....:
+Out[76]:
+            C                             D
+          foo       bar       baz       foo       bar       baz
+A
+bar  0.392940  0.130980  0.181231  1.732707  0.577569  1.366330
+foo -1.796421 -0.359284  0.912265  2.824590  0.564918  0.884785
+```
+
+::: tip Note
+
+In general, the output column names should be unique. You can’t apply
+the same function (or two functions with the same name) to the same
+column.
+
+``` python
+In [77]: grouped['C'].agg(['sum', 'sum'])
+---------------------------------------------------------------------------
+SpecificationError                        Traceback (most recent call last)
+ in 
+----> 1 grouped['C'].agg(['sum', 'sum'])
+
+/pandas/pandas/core/groupby/generic.py in aggregate(self, func_or_funcs, *args, **kwargs)
+    849         # but not the class list / tuple itself.
+    850         func_or_funcs = _maybe_mangle_lambdas(func_or_funcs)
+--> 851         ret = self._aggregate_multiple_funcs(func_or_funcs, (_level or 0) + 1)
+    852         if relabeling:
+    853             ret.columns = columns
+
+/pandas/pandas/core/groupby/generic.py in _aggregate_multiple_funcs(self, arg, _level)
+    919                 raise SpecificationError(
+    920                     "Function names must be unique, found multiple named "
+--> 921                     "{}".format(name)
+    922                 )
+    923 
+
+SpecificationError: Function names must be unique, found multiple named sum
+```
+
+Pandas *does* allow you to provide multiple lambdas. In this case, pandas
+will mangle the name of the (nameless) lambda functions, appending ``_<i>``
+to each subsequent lambda. 
+
+``` python
+In [78]: grouped['C'].agg([lambda x: x.max() - x.min(),
+   ....:                   lambda x: x.median() - x.mean()])
+   ....:
+Out[78]:
+     <lambda_0>  <lambda_1>
+A
+bar    0.331279    0.084917
+foo    2.337259   -0.215962
+```
+
+:::
+
+### Named aggregation
+
+*New in version 0.25.0.*
+
+To support column-specific aggregation *with control over the output column names*, pandas
+accepts the special syntax in ``GroupBy.agg()``, known as “named aggregation”, where
+
+- The keywords are the *output* column names
+- The values are tuples whose first element is the column to select
+and the second element is the aggregation to apply to that column. Pandas
+provides the ``pandas.NamedAgg`` namedtuple with the fields ``['column', 'aggfunc']``
+to make it clearer what the arguments are. As usual, the aggregation can
+be a callable or a string alias.
+
+``` python
+In [79]: animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
+   ....:                         'height': [9.1, 6.0, 9.5, 34.0],
+   ....:                         'weight': [7.9, 7.5, 9.9, 198.0]})
+   ....:
+
+In [80]: animals
+Out[80]:
+  kind  height  weight
+0  cat     9.1     7.9
+1  dog     6.0     7.5
+2  cat     9.5     9.9
+3  dog    34.0   198.0
+
+In [81]: animals.groupby("kind").agg(
+   ....:     min_height=pd.NamedAgg(column='height', aggfunc='min'),
+   ....:     max_height=pd.NamedAgg(column='height', aggfunc='max'),
+   ....:     average_weight=pd.NamedAgg(column='weight', aggfunc=np.mean),
+   ....: )
+   ....:
+Out[81]:
+      min_height  max_height  average_weight
+kind
+cat          9.1         9.5            8.90
+dog          6.0        34.0          102.75
+```
+
+``pandas.NamedAgg`` is just a ``namedtuple``. Plain tuples are allowed as well. 
+
+``` python
+In [82]: animals.groupby("kind").agg(
+   ....:     min_height=('height', 'min'),
+   ....:     max_height=('height', 'max'),
+   ....:     average_weight=('weight', np.mean),
+   ....: )
+   ....:
+Out[82]:
+      min_height  max_height  average_weight
+kind
+cat          9.1         9.5            8.90
+dog          6.0        34.0          102.75
+```
+
+If your desired output column names are not valid python keywords, construct a dictionary
+and unpack the keyword arguments
+
+``` python
+In [83]: animals.groupby("kind").agg(**{
+   ....:     'total weight': pd.NamedAgg(column='weight', aggfunc=sum),
+   ....: })
+   ....:
+Out[83]:
+      total weight
+kind
+cat           17.8
+dog          205.5
+```
+
+Additional keyword arguments are not passed through to the aggregation functions. Only pairs
+of ``(column, aggfunc)`` should be passed as ``**kwargs``. If your aggregation function
+requires additional arguments, partially apply them with ``functools.partial()``.
+
+::: tip Note
+
+For Python 3.5 and earlier, the order of ``**kwargs`` in a function was not
+preserved. This means that the output column ordering would not be
+consistent. To ensure consistent ordering, the keys (and so output columns)
+will always be sorted for Python 3.5.
+
+:::
+
+Named aggregation is also valid for Series groupby aggregations. In this case there’s
+no column selection, so the values are just the functions.
+
+``` python
+In [84]: animals.groupby("kind").height.agg(
+   ....:     min_height='min',
+   ....:     max_height='max',
+   ....: )
+   ....:
+Out[84]:
+      min_height  max_height
+kind
+cat          9.1         9.5
+dog          6.0        34.0
+```
+
+### Applying different functions to DataFrame columns
+
+By passing a dict to ``aggregate`` you can apply a different aggregation to the
+columns of a DataFrame:
+
+``` python
+In [85]: grouped.agg({'C': np.sum,
+   ....:              'D': lambda x: np.std(x, ddof=1)})
+   ....:
+Out[85]:
+            C         D
+A
+bar  0.392940  1.366330
+foo -1.796421  0.884785
+```
+
+The function names can also be strings. 
In order for a string to be valid it +must be either implemented on GroupBy or available via [dispatching](#groupby-dispatch): + +``` python +In [86]: grouped.agg({'C': 'sum', 'D': 'std'}) +Out[86]: + C D +A +bar 0.392940 1.366330 +foo -1.796421 0.884785 +``` + +### Cython-optimized aggregation functions + +Some common aggregations, currently only ``sum``, ``mean``, ``std``, and ``sem``, have +optimized Cython implementations: + +``` python +In [87]: df.groupby('A').sum() +Out[87]: + C D +A +bar 0.392940 1.732707 +foo -1.796421 2.824590 + +In [88]: df.groupby(['A', 'B']).mean() +Out[88]: + C D +A B +bar one 0.254161 1.511763 + three 0.215897 -0.990582 + two -0.077118 1.211526 +foo one -0.491888 0.807291 + three -0.862495 0.024580 + two 0.024925 0.592714 +``` + +Of course ``sum`` and ``mean`` are implemented on pandas objects, so the above +code would work even without the special versions via dispatching (see below). + +## Transformation + +The ``transform`` method returns an object that is indexed the same (same size) +as the one being grouped. The transform function must: + +- Return a result that is either the same size as the group chunk or +broadcastable to the size of the group chunk (e.g., a scalar, +``grouped.transform(lambda x: x.iloc[-1])``). +- Operate column-by-column on the group chunk. The transform is applied to +the first group chunk using chunk.apply. +- Not perform in-place operations on the group chunk. Group chunks should +be treated as immutable, and changes to a group chunk may produce unexpected +results. For example, when using ``fillna``, ``inplace`` must be ``False`` +(``grouped.transform(lambda x: x.fillna(inplace=False))``). +- (Optionally) operates on the entire group chunk. If this is supported, a +fast path is used starting from the *second* chunk. 
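To make these requirements concrete, here is a minimal sketch (the frame and column names are invented for illustration). The first transform returns a result the same size as each group chunk; the second returns a scalar, which is broadcast to the size of the chunk:

``` python
import pandas as pd

# Hypothetical frame for illustration
df = pd.DataFrame({'key': ['a', 'a', 'b', 'b'],
                   'val': [1.0, 2.0, 3.0, 4.0]})

g = df.groupby('key')['val']

# Same-size result: demean within each group
demeaned = g.transform(lambda x: x - x.mean())

# Scalar result: broadcast to the size of each group chunk
group_max = g.transform(lambda x: x.max())

print(demeaned.tolist())   # [-0.5, 0.5, -0.5, 0.5]
print(group_max.tolist())  # [2.0, 2.0, 4.0, 4.0]
```

In both cases the result is aligned with the original (ungrouped) index, which is what distinguishes ``transform`` from ``agg``.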
+ +For example, suppose we wished to standardize the data within each group: + +``` python +In [89]: index = pd.date_range('10/1/1999', periods=1100) + +In [90]: ts = pd.Series(np.random.normal(0.5, 2, 1100), index) + +In [91]: ts = ts.rolling(window=100, min_periods=100).mean().dropna() + +In [92]: ts.head() +Out[92]: +2000-01-08 0.779333 +2000-01-09 0.778852 +2000-01-10 0.786476 +2000-01-11 0.782797 +2000-01-12 0.798110 +Freq: D, dtype: float64 + +In [93]: ts.tail() +Out[93]: +2002-09-30 0.660294 +2002-10-01 0.631095 +2002-10-02 0.673601 +2002-10-03 0.709213 +2002-10-04 0.719369 +Freq: D, dtype: float64 + +In [94]: transformed = (ts.groupby(lambda x: x.year) + ....: .transform(lambda x: (x - x.mean()) / x.std())) + ....: +``` + +We would expect the result to now have mean 0 and standard deviation 1 within +each group, which we can easily check: + +``` python +# Original Data +In [95]: grouped = ts.groupby(lambda x: x.year) + +In [96]: grouped.mean() +Out[96]: +2000 0.442441 +2001 0.526246 +2002 0.459365 +dtype: float64 + +In [97]: grouped.std() +Out[97]: +2000 0.131752 +2001 0.210945 +2002 0.128753 +dtype: float64 + +# Transformed Data +In [98]: grouped_trans = transformed.groupby(lambda x: x.year) + +In [99]: grouped_trans.mean() +Out[99]: +2000 1.168208e-15 +2001 1.454544e-15 +2002 1.726657e-15 +dtype: float64 + +In [100]: grouped_trans.std() +Out[100]: +2000 1.0 +2001 1.0 +2002 1.0 +dtype: float64 +``` + +We can also visually compare the original and transformed data sets. + +``` python +In [101]: compare = pd.DataFrame({'Original': ts, 'Transformed': transformed}) + +In [102]: compare.plot() +Out[102]: +``` + +![groupby_transform_plot](https://static.pypandas.cn/public/static/images/groupby_transform_plot.png) + +Transformation functions that have lower dimension outputs are broadcast to +match the shape of the input array. 
+ +``` python +In [103]: ts.groupby(lambda x: x.year).transform(lambda x: x.max() - x.min()) +Out[103]: +2000-01-08 0.623893 +2000-01-09 0.623893 +2000-01-10 0.623893 +2000-01-11 0.623893 +2000-01-12 0.623893 + ... +2002-09-30 0.558275 +2002-10-01 0.558275 +2002-10-02 0.558275 +2002-10-03 0.558275 +2002-10-04 0.558275 +Freq: D, Length: 1001, dtype: float64 +``` + +Alternatively, the built-in methods could be used to produce the same outputs. + +``` python +In [104]: max = ts.groupby(lambda x: x.year).transform('max') + +In [105]: min = ts.groupby(lambda x: x.year).transform('min') + +In [106]: max - min +Out[106]: +2000-01-08 0.623893 +2000-01-09 0.623893 +2000-01-10 0.623893 +2000-01-11 0.623893 +2000-01-12 0.623893 + ... +2002-09-30 0.558275 +2002-10-01 0.558275 +2002-10-02 0.558275 +2002-10-03 0.558275 +2002-10-04 0.558275 +Freq: D, Length: 1001, dtype: float64 +``` + +Another common data transform is to replace missing data with the group mean. + +``` python +In [107]: data_df +Out[107]: + A B C +0 1.539708 -1.166480 0.533026 +1 1.302092 -0.505754 NaN +2 -0.371983 1.104803 -0.651520 +3 -1.309622 1.118697 -1.161657 +4 -1.924296 0.396437 0.812436 +.. ... ... ... +995 -0.093110 0.683847 -0.774753 +996 -0.185043 1.438572 NaN +997 -0.394469 -0.642343 0.011374 +998 -1.174126 1.857148 NaN +999 0.234564 0.517098 0.393534 + +[1000 rows x 3 columns] + +In [108]: countries = np.array(['US', 'UK', 'GR', 'JP']) + +In [109]: key = countries[np.random.randint(0, 4, 1000)] + +In [110]: grouped = data_df.groupby(key) + +# Non-NA count in each group +In [111]: grouped.count() +Out[111]: + A B C +GR 209 217 189 +JP 240 255 217 +UK 216 231 193 +US 239 250 217 + +In [112]: transformed = grouped.transform(lambda x: x.fillna(x.mean())) +``` + +We can verify that the group means have not changed in the transformed data +and that the transformed data contains no NAs. 
+ +``` python +In [113]: grouped_trans = transformed.groupby(key) + +In [114]: grouped.mean() # original group means +Out[114]: + A B C +GR -0.098371 -0.015420 0.068053 +JP 0.069025 0.023100 -0.077324 +UK 0.034069 -0.052580 -0.116525 +US 0.058664 -0.020399 0.028603 + +In [115]: grouped_trans.mean() # transformation did not change group means +Out[115]: + A B C +GR -0.098371 -0.015420 0.068053 +JP 0.069025 0.023100 -0.077324 +UK 0.034069 -0.052580 -0.116525 +US 0.058664 -0.020399 0.028603 + +In [116]: grouped.count() # original has some missing data points +Out[116]: + A B C +GR 209 217 189 +JP 240 255 217 +UK 216 231 193 +US 239 250 217 + +In [117]: grouped_trans.count() # counts after transformation +Out[117]: + A B C +GR 228 228 228 +JP 267 267 267 +UK 247 247 247 +US 258 258 258 + +In [118]: grouped_trans.size() # Verify non-NA count equals group size +Out[118]: +GR 228 +JP 267 +UK 247 +US 258 +dtype: int64 +``` + +::: tip Note + +Some functions will automatically transform the input when applied to a +GroupBy object, but returning an object of the same shape as the original. +Passing ``as_index=False`` will not affect these transformation methods. + +For example: ``fillna, ffill, bfill, shift.``. + +``` python +In [119]: grouped.ffill() +Out[119]: + A B C +0 1.539708 -1.166480 0.533026 +1 1.302092 -0.505754 0.533026 +2 -0.371983 1.104803 -0.651520 +3 -1.309622 1.118697 -1.161657 +4 -1.924296 0.396437 0.812436 +.. ... ... ... +995 -0.093110 0.683847 -0.774753 +996 -0.185043 1.438572 -0.774753 +997 -0.394469 -0.642343 0.011374 +998 -1.174126 1.857148 -0.774753 +999 0.234564 0.517098 0.393534 + +[1000 rows x 3 columns] +``` + +::: + +### New syntax to window and resample operations + +*New in version 0.18.1.* + +Working with the resample, expanding or rolling operations on the groupby +level used to require the application of helper functions. However, +now it is possible to use ``resample()``, ``expanding()`` and +``rolling()`` as methods on groupbys. 
+ +The example below will apply the ``rolling()`` method on the samples of +the column B based on the groups of column A. + +``` python +In [120]: df_re = pd.DataFrame({'A': [1] * 10 + [5] * 10, + .....: 'B': np.arange(20)}) + .....: + +In [121]: df_re +Out[121]: + A B +0 1 0 +1 1 1 +2 1 2 +3 1 3 +4 1 4 +.. .. .. +15 5 15 +16 5 16 +17 5 17 +18 5 18 +19 5 19 + +[20 rows x 2 columns] + +In [122]: df_re.groupby('A').rolling(4).B.mean() +Out[122]: +A +1 0 NaN + 1 NaN + 2 NaN + 3 1.5 + 4 2.5 + ... +5 15 13.5 + 16 14.5 + 17 15.5 + 18 16.5 + 19 17.5 +Name: B, Length: 20, dtype: float64 +``` + +The ``expanding()`` method will accumulate a given operation +(``sum()`` in the example) for all the members of each particular +group. + +``` python +In [123]: df_re.groupby('A').expanding().sum() +Out[123]: + A B +A +1 0 1.0 0.0 + 1 2.0 1.0 + 2 3.0 3.0 + 3 4.0 6.0 + 4 5.0 10.0 +... ... ... +5 15 30.0 75.0 + 16 35.0 91.0 + 17 40.0 108.0 + 18 45.0 126.0 + 19 50.0 145.0 + +[20 rows x 2 columns] +``` + +Suppose you want to use the ``resample()`` method to get a daily +frequency in each group of your dataframe and wish to complete the +missing values with the ``ffill()`` method. + +``` python +In [124]: df_re = pd.DataFrame({'date': pd.date_range(start='2016-01-01', periods=4, + .....: freq='W'), + .....: 'group': [1, 1, 2, 2], + .....: 'val': [5, 6, 7, 8]}).set_index('date') + .....: + +In [125]: df_re +Out[125]: + group val +date +2016-01-03 1 5 +2016-01-10 1 6 +2016-01-17 2 7 +2016-01-24 2 8 + +In [126]: df_re.groupby('group').resample('1D').ffill() +Out[126]: + group val +group date +1 2016-01-03 1 5 + 2016-01-04 1 5 + 2016-01-05 1 5 + 2016-01-06 1 5 + 2016-01-07 1 5 +... ... ... +2 2016-01-20 2 7 + 2016-01-21 2 7 + 2016-01-22 2 7 + 2016-01-23 2 7 + 2016-01-24 2 8 + +[16 rows x 2 columns] +``` + +## Filtration + +The ``filter`` method returns a subset of the original object. Suppose we +want to take only elements that belong to groups with a group sum greater +than 2. 
+ +``` python +In [127]: sf = pd.Series([1, 1, 2, 3, 3, 3]) + +In [128]: sf.groupby(sf).filter(lambda x: x.sum() > 2) +Out[128]: +3 3 +4 3 +5 3 +dtype: int64 +``` + +The argument of ``filter`` must be a function that, applied to the group as a +whole, returns ``True`` or ``False``. + +Another useful operation is filtering out elements that belong to groups +with only a couple members. + +``` python +In [129]: dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc')}) + +In [130]: dff.groupby('B').filter(lambda x: len(x) > 2) +Out[130]: + A B +2 2 b +3 3 b +4 4 b +5 5 b +``` + +Alternatively, instead of dropping the offending groups, we can return a +like-indexed objects where the groups that do not pass the filter are filled +with NaNs. + +``` python +In [131]: dff.groupby('B').filter(lambda x: len(x) > 2, dropna=False) +Out[131]: + A B +0 NaN NaN +1 NaN NaN +2 2.0 b +3 3.0 b +4 4.0 b +5 5.0 b +6 NaN NaN +7 NaN NaN +``` + +For DataFrames with multiple columns, filters should explicitly specify a column as the filter criterion. + +``` python +In [132]: dff['C'] = np.arange(8) + +In [133]: dff.groupby('B').filter(lambda x: len(x['C']) > 2) +Out[133]: + A B C +2 2 b 2 +3 3 b 3 +4 4 b 4 +5 5 b 5 +``` + +::: tip Note + +Some functions when applied to a groupby object will act as a **filter** on the input, returning +a reduced shape of the original (and potentially eliminating groups), but with the index unchanged. +Passing ``as_index=False`` will not affect these transformation methods. + +For example: ``head, tail``. + +``` python +In [134]: dff.groupby('B').head(2) +Out[134]: + A B C +0 0 a 0 +1 1 a 1 +2 2 b 2 +3 3 b 3 +6 6 c 6 +7 7 c 7 +``` + +::: + +## Dispatching to instance methods + +When doing an aggregation or transformation, you might just want to call an +instance method on each data group. 
This is pretty easy to do by passing lambda +functions: + +``` python +In [135]: grouped = df.groupby('A') + +In [136]: grouped.agg(lambda x: x.std()) +Out[136]: + C D +A +bar 0.181231 1.366330 +foo 0.912265 0.884785 +``` + +But, it’s rather verbose and can be untidy if you need to pass additional +arguments. Using a bit of metaprogramming cleverness, GroupBy now has the +ability to “dispatch” method calls to the groups: + +``` python +In [137]: grouped.std() +Out[137]: + C D +A +bar 0.181231 1.366330 +foo 0.912265 0.884785 +``` + +What is actually happening here is that a function wrapper is being +generated. When invoked, it takes any passed arguments and invokes the function +with any arguments on each group (in the above example, the ``std`` +function). The results are then combined together much in the style of ``agg`` +and ``transform`` (it actually uses ``apply`` to infer the gluing, documented +next). This enables some operations to be carried out rather succinctly: + +``` python +In [138]: tsdf = pd.DataFrame(np.random.randn(1000, 3), + .....: index=pd.date_range('1/1/2000', periods=1000), + .....: columns=['A', 'B', 'C']) + .....: + +In [139]: tsdf.iloc[::2] = np.nan + +In [140]: grouped = tsdf.groupby(lambda x: x.year) + +In [141]: grouped.fillna(method='pad') +Out[141]: + A B C +2000-01-01 NaN NaN NaN +2000-01-02 -0.353501 -0.080957 -0.876864 +2000-01-03 -0.353501 -0.080957 -0.876864 +2000-01-04 0.050976 0.044273 -0.559849 +2000-01-05 0.050976 0.044273 -0.559849 +... ... ... ... +2002-09-22 0.005011 0.053897 -1.026922 +2002-09-23 0.005011 0.053897 -1.026922 +2002-09-24 -0.456542 -1.849051 1.559856 +2002-09-25 -0.456542 -1.849051 1.559856 +2002-09-26 1.123162 0.354660 1.128135 + +[1000 rows x 3 columns] +``` + +In this example, we chopped the collection of time series into yearly chunks +then independently called [fillna](missing_data.html#missing-data-fillna) on the +groups. 
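Dispatched calls such as ``grouped.std()`` above are, in effect, shorthand for an explicit ``apply``. A minimal sketch of that equivalence, using an illustrative frame and column names that are assumptions (not taken from the examples above):

```python
import numpy as np
import pandas as pd

# hypothetical frame: two groups, each with a gap to fill
df = pd.DataFrame({'key': ['a', 'a', 'b', 'b'],
                   'val': [1.0, np.nan, 3.0, np.nan]})

grouped = df.groupby('key', group_keys=False)['val']

# the dispatched method call...
dispatched = grouped.ffill()

# ...matches calling the Series method on each group explicitly
explicit = grouped.apply(lambda s: s.ffill())

assert dispatched.equals(explicit)
```

``group_keys=False`` is passed so the ``apply`` result keeps the original index rather than prepending the group keys, which makes the two results directly comparable.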
+ +The ``nlargest`` and ``nsmallest`` methods work on ``Series`` style groupbys: + +``` python +In [142]: s = pd.Series([9, 8, 7, 5, 19, 1, 4.2, 3.3]) + +In [143]: g = pd.Series(list('abababab')) + +In [144]: gb = s.groupby(g) + +In [145]: gb.nlargest(3) +Out[145]: +a 4 19.0 + 0 9.0 + 2 7.0 +b 1 8.0 + 3 5.0 + 7 3.3 +dtype: float64 + +In [146]: gb.nsmallest(3) +Out[146]: +a 6 4.2 + 2 7.0 + 0 9.0 +b 5 1.0 + 7 3.3 + 3 5.0 +dtype: float64 +``` + +## Flexible ``apply`` + +Some operations on the grouped data might not fit into either the aggregate or +transform categories. Or, you may simply want GroupBy to infer how to combine +the results. For these, use the ``apply`` function, which can be substituted +for both ``aggregate`` and ``transform`` in many standard use cases. However, +``apply`` can handle some exceptional use cases, for example: + +``` python +In [147]: df +Out[147]: + A B C D +0 foo one -0.575247 1.346061 +1 bar one 0.254161 1.511763 +2 foo two -1.143704 1.627081 +3 bar three 0.215897 -0.990582 +4 foo two 1.193555 -0.441652 +5 bar two -0.077118 1.211526 +6 foo one -0.408530 0.268520 +7 foo three -0.862495 0.024580 + +In [148]: grouped = df.groupby('A') + +# could also just call .describe() +In [149]: grouped['C'].apply(lambda x: x.describe()) +Out[149]: +A +bar count 3.000000 + mean 0.130980 + std 0.181231 + min -0.077118 + 25% 0.069390 + ... 
+foo  min     -1.143704
+     25%     -0.862495
+     50%     -0.575247
+     75%     -0.408530
+     max      1.193555
+Name: C, Length: 16, dtype: float64
+```
+
+The dimension of the returned result can also change:
+
+``` python
+In [150]: grouped = df.groupby('A')['C']
+
+In [151]: def f(group):
+   .....:     return pd.DataFrame({'original': group,
+   .....:                          'demeaned': group - group.mean()})
+   .....: 
+
+In [152]: grouped.apply(f)
+Out[152]: 
+   original  demeaned
+0 -0.575247 -0.215962
+1  0.254161  0.123181
+2 -1.143704 -0.784420
+3  0.215897  0.084917
+4  1.193555  1.552839
+5 -0.077118 -0.208098
+6 -0.408530 -0.049245
+7 -0.862495 -0.503211
+```
+
+``apply`` on a Series can operate on a returned value from the applied function
+that is itself a series, and possibly upcast the result to a DataFrame:
+
+``` python
+In [153]: def f(x):
+   .....:     return pd.Series([x, x ** 2], index=['x', 'x^2'])
+   .....: 
+
+In [154]: s = pd.Series(np.random.rand(5))
+
+In [155]: s
+Out[155]: 
+0    0.321438
+1    0.493496
+2    0.139505
+3    0.910103
+4    0.194158
+dtype: float64
+
+In [156]: s.apply(f)
+Out[156]: 
+          x       x^2
+0  0.321438  0.103323
+1  0.493496  0.243538
+2  0.139505  0.019462
+3  0.910103  0.828287
+4  0.194158  0.037697
+```
+
+::: tip Note
+
+``apply`` can act as a reducer, transformer, *or* filter function, depending on
+exactly what is passed to it. Thus, depending on the path taken, and on exactly
+what you are grouping, the grouped column(s) may be included in the output as
+well as set the indices.
+
+:::
+
+## Other useful features
+
+### Automatic exclusion of “nuisance” columns
+
+Again consider the example DataFrame we’ve been looking at:
+
+``` python
+In [157]: df
+Out[157]: 
+     A      B         C         D
+0  foo    one -0.575247  1.346061
+1  bar    one  0.254161  1.511763
+2  foo    two -1.143704  1.627081
+3  bar  three  0.215897 -0.990582
+4  foo    two  1.193555 -0.441652
+5  bar    two -0.077118  1.211526
+6  foo    one -0.408530  0.268520
+7  foo  three -0.862495  0.024580
+```
+
+Suppose we wish to compute the standard deviation grouped by the ``A``
+column.
There is a slight problem, namely that we don’t care about the data in
+column ``B``. We refer to this as a “nuisance” column. If the passed
+aggregation function can’t be applied to some columns, the troublesome columns
+will be (silently) dropped. Thus, this does not pose any problems:
+
+``` python
+In [158]: df.groupby('A').std()
+Out[158]: 
+            C         D
+A                      
+bar  0.181231  1.366330
+foo  0.912265  0.884785
+```
+
+Note that ``df.groupby('A').colname.std()`` is more efficient than
+``df.groupby('A').std().colname``, so if the result of an aggregation function
+is only of interest over one column (here ``colname``), it may be filtered
+*before* applying the aggregation function.
+
+::: tip Note
+
+Any object column, even if it contains numerical values such as ``Decimal``
+objects, is considered a “nuisance” column. They are excluded from
+aggregate functions automatically in groupby.
+
+If you do wish to include decimal or object columns in an aggregation with
+other non-nuisance data types, you must do so explicitly.
+
+:::
+
+``` python
+In [159]: from decimal import Decimal
+
+In [160]: df_dec = pd.DataFrame(
+   .....:     {'id': [1, 2, 1, 2],
+   .....:      'int_column': [1, 2, 3, 4],
+   .....:      'dec_column': [Decimal('0.50'), Decimal('0.15'),
+   .....:                     Decimal('0.25'), Decimal('0.40')]
+   .....:      }
+   .....: )
+   .....: 
+
+# Decimal columns can be summed explicitly by themselves...
+In [161]: df_dec.groupby(['id'])[['dec_column']].sum() +Out[161]: + dec_column +id +1 0.75 +2 0.55 + +# ...but cannot be combined with standard data types or they will be excluded +In [162]: df_dec.groupby(['id'])[['int_column', 'dec_column']].sum() +Out[162]: + int_column +id +1 4 +2 6 + +# Use .agg function to aggregate over standard and "nuisance" data types +# at the same time +In [163]: df_dec.groupby(['id']).agg({'int_column': 'sum', 'dec_column': 'sum'}) +Out[163]: + int_column dec_column +id +1 4 0.75 +2 6 0.55 +``` + +### Handling of (un)observed Categorical values + +When using a ``Categorical`` grouper (as a single grouper, or as part of multiple groupers), the ``observed`` keyword +controls whether to return a cartesian product of all possible groupers values (``observed=False``) or only those +that are observed groupers (``observed=True``). + +Show all values: + +``` python +In [164]: pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], + .....: categories=['a', 'b']), + .....: observed=False).count() + .....: +Out[164]: +a 3 +b 0 +dtype: int64 +``` + +Show only the observed values: + +``` python +In [165]: pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], + .....: categories=['a', 'b']), + .....: observed=True).count() + .....: +Out[165]: +a 3 +dtype: int64 +``` + +The returned dtype of the grouped will *always* include *all* of the categories that were grouped. + +``` python +In [166]: s = pd.Series([1, 1, 1]).groupby(pd.Categorical(['a', 'a', 'a'], + .....: categories=['a', 'b']), + .....: observed=False).count() + .....: + +In [167]: s.index.dtype +Out[167]: CategoricalDtype(categories=['a', 'b'], ordered=False) +``` + +### NA and NaT group handling + +If there are any NaN or NaT values in the grouping key, these will be +automatically excluded. In other words, there will never be an “NA group” or +“NaT group”. 
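For instance, in a minimal sketch (the frame below is illustrative, not one of the examples above), a row whose grouping key is missing simply never forms a group:

```python
import numpy as np
import pandas as pd

# illustrative frame: one grouping key is missing
df = pd.DataFrame({'key': ['a', 'b', np.nan, 'b'],
                   'val': [1, 2, 3, 4]})

sums = df.groupby('key')['val'].sum()

# only 'a' and 'b' appear; the row with the NaN key is dropped
assert list(sums.index) == ['a', 'b']
```

The row with the missing key contributes to no group at all.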
This was not the case in older versions of pandas, but users were +generally discarding the NA group anyway (and supporting it was an +implementation headache). + +### Grouping with ordered factors + +Categorical variables represented as instance of pandas’s ``Categorical`` class +can be used as group keys. If so, the order of the levels will be preserved: + +``` python +In [168]: data = pd.Series(np.random.randn(100)) + +In [169]: factor = pd.qcut(data, [0, .25, .5, .75, 1.]) + +In [170]: data.groupby(factor).mean() +Out[170]: +(-2.645, -0.523] -1.362896 +(-0.523, 0.0296] -0.260266 +(0.0296, 0.654] 0.361802 +(0.654, 2.21] 1.073801 +dtype: float64 +``` + +### Grouping with a grouper specification + +You may need to specify a bit more data to properly group. You can +use the ``pd.Grouper`` to provide this local control. + +``` python +In [171]: import datetime + +In [172]: df = pd.DataFrame({'Branch': 'A A A A A A A B'.split(), + .....: 'Buyer': 'Carl Mark Carl Carl Joe Joe Joe Carl'.split(), + .....: 'Quantity': [1, 3, 5, 1, 8, 1, 9, 3], + .....: 'Date': [ + .....: datetime.datetime(2013, 1, 1, 13, 0), + .....: datetime.datetime(2013, 1, 1, 13, 5), + .....: datetime.datetime(2013, 10, 1, 20, 0), + .....: datetime.datetime(2013, 10, 2, 10, 0), + .....: datetime.datetime(2013, 10, 1, 20, 0), + .....: datetime.datetime(2013, 10, 2, 10, 0), + .....: datetime.datetime(2013, 12, 2, 12, 0), + .....: datetime.datetime(2013, 12, 2, 14, 0)] + .....: }) + .....: + +In [173]: df +Out[173]: + Branch Buyer Quantity Date +0 A Carl 1 2013-01-01 13:00:00 +1 A Mark 3 2013-01-01 13:05:00 +2 A Carl 5 2013-10-01 20:00:00 +3 A Carl 1 2013-10-02 10:00:00 +4 A Joe 8 2013-10-01 20:00:00 +5 A Joe 1 2013-10-02 10:00:00 +6 A Joe 9 2013-12-02 12:00:00 +7 B Carl 3 2013-12-02 14:00:00 +``` + +Groupby a specific column with the desired frequency. This is like resampling. 
+ +``` python +In [174]: df.groupby([pd.Grouper(freq='1M', key='Date'), 'Buyer']).sum() +Out[174]: + Quantity +Date Buyer +2013-01-31 Carl 1 + Mark 3 +2013-10-31 Carl 6 + Joe 9 +2013-12-31 Carl 3 + Joe 9 +``` + +You have an ambiguous specification in that you have a named index and a column +that could be potential groupers. + +``` python +In [175]: df = df.set_index('Date') + +In [176]: df['Date'] = df.index + pd.offsets.MonthEnd(2) + +In [177]: df.groupby([pd.Grouper(freq='6M', key='Date'), 'Buyer']).sum() +Out[177]: + Quantity +Date Buyer +2013-02-28 Carl 1 + Mark 3 +2014-02-28 Carl 9 + Joe 18 + +In [178]: df.groupby([pd.Grouper(freq='6M', level='Date'), 'Buyer']).sum() +Out[178]: + Quantity +Date Buyer +2013-01-31 Carl 1 + Mark 3 +2014-01-31 Carl 9 + Joe 18 +``` + +### Taking the first rows of each group + +Just like for a DataFrame or Series you can call head and tail on a groupby: + +``` python +In [179]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B']) + +In [180]: df +Out[180]: + A B +0 1 2 +1 1 4 +2 5 6 + +In [181]: g = df.groupby('A') + +In [182]: g.head(1) +Out[182]: + A B +0 1 2 +2 5 6 + +In [183]: g.tail(1) +Out[183]: + A B +1 1 4 +2 5 6 +``` + +This shows the first or last n rows from each group. + +### Taking the nth row of each group + +To select from a DataFrame or Series the nth item, use +``nth()``. This is a reduction method, and +will return a single row (or no row) per group if you pass an int for n: + +``` python +In [184]: df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=['A', 'B']) + +In [185]: g = df.groupby('A') + +In [186]: g.nth(0) +Out[186]: + B +A +1 NaN +5 6.0 + +In [187]: g.nth(-1) +Out[187]: + B +A +1 4.0 +5 6.0 + +In [188]: g.nth(1) +Out[188]: + B +A +1 4.0 +``` + +If you want to select the nth not-null item, use the ``dropna`` kwarg. 
For a DataFrame this should be either ``'any'`` or ``'all'`` just like you would pass to dropna: + +``` python +# nth(0) is the same as g.first() +In [189]: g.nth(0, dropna='any') +Out[189]: + B +A +1 4.0 +5 6.0 + +In [190]: g.first() +Out[190]: + B +A +1 4.0 +5 6.0 + +# nth(-1) is the same as g.last() +In [191]: g.nth(-1, dropna='any') # NaNs denote group exhausted when using dropna +Out[191]: + B +A +1 4.0 +5 6.0 + +In [192]: g.last() +Out[192]: + B +A +1 4.0 +5 6.0 + +In [193]: g.B.nth(0, dropna='all') +Out[193]: +A +1 4.0 +5 6.0 +Name: B, dtype: float64 +``` + +As with other methods, passing ``as_index=False``, will achieve a filtration, which returns the grouped row. + +``` python +In [194]: df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=['A', 'B']) + +In [195]: g = df.groupby('A', as_index=False) + +In [196]: g.nth(0) +Out[196]: + A B +0 1 NaN +2 5 6.0 + +In [197]: g.nth(-1) +Out[197]: + A B +1 1 4.0 +2 5 6.0 +``` + +You can also select multiple rows from each group by specifying multiple nth values as a list of ints. 
+ +``` python +In [198]: business_dates = pd.date_range(start='4/1/2014', end='6/30/2014', freq='B') + +In [199]: df = pd.DataFrame(1, index=business_dates, columns=['a', 'b']) + +# get the first, 4th, and last date index for each month +In [200]: df.groupby([df.index.year, df.index.month]).nth([0, 3, -1]) +Out[200]: + a b +2014 4 1 1 + 4 1 1 + 4 1 1 + 5 1 1 + 5 1 1 + 5 1 1 + 6 1 1 + 6 1 1 + 6 1 1 +``` + +### Enumerate group items + +To see the order in which each row appears within its group, use the +``cumcount`` method: + +``` python +In [201]: dfg = pd.DataFrame(list('aaabba'), columns=['A']) + +In [202]: dfg +Out[202]: + A +0 a +1 a +2 a +3 b +4 b +5 a + +In [203]: dfg.groupby('A').cumcount() +Out[203]: +0 0 +1 1 +2 2 +3 0 +4 1 +5 3 +dtype: int64 + +In [204]: dfg.groupby('A').cumcount(ascending=False) +Out[204]: +0 3 +1 2 +2 1 +3 1 +4 0 +5 0 +dtype: int64 +``` + +### Enumerate groups + +*New in version 0.20.2.* + +To see the ordering of the groups (as opposed to the order of rows +within a group given by ``cumcount``) you can use +``ngroup()``. + +Note that the numbers given to the groups match the order in which the +groups would be seen when iterating over the groupby object, not the +order they are first observed. + +``` python +In [205]: dfg = pd.DataFrame(list('aaabba'), columns=['A']) + +In [206]: dfg +Out[206]: + A +0 a +1 a +2 a +3 b +4 b +5 a + +In [207]: dfg.groupby('A').ngroup() +Out[207]: +0 0 +1 0 +2 0 +3 1 +4 1 +5 0 +dtype: int64 + +In [208]: dfg.groupby('A').ngroup(ascending=False) +Out[208]: +0 1 +1 1 +2 1 +3 0 +4 0 +5 1 +dtype: int64 +``` + +### Plotting + +Groupby also works with some plotting methods. For example, suppose we +suspect that some features in a DataFrame may differ by group, in this case, +the values in column 1 where the group is “B” are 3 higher on average. 
+ +``` python +In [209]: np.random.seed(1234) + +In [210]: df = pd.DataFrame(np.random.randn(50, 2)) + +In [211]: df['g'] = np.random.choice(['A', 'B'], size=50) + +In [212]: df.loc[df['g'] == 'B', 1] += 3 +``` + +We can easily visualize this with a boxplot: + +``` python +In [213]: df.groupby('g').boxplot() +Out[213]: +A AxesSubplot(0.1,0.15;0.363636x0.75) +B AxesSubplot(0.536364,0.15;0.363636x0.75) +dtype: object +``` + +![groupby_boxplot](https://static.pypandas.cn/public/static/images/groupby_boxplot.png) + +The result of calling ``boxplot`` is a dictionary whose keys are the values +of our grouping column ``g`` (“A” and “B”). The values of the resulting dictionary +can be controlled by the ``return_type`` keyword of ``boxplot``. +See the [visualization documentation](visualization.html#visualization-box) for more. + +::: danger Warning + +For historical reasons, ``df.groupby("g").boxplot()`` is not equivalent +to ``df.boxplot(by="g")``. See [here](visualization.html#visualization-box-return) for +an explanation. + +::: + +### Piping function calls + +*New in version 0.21.0.* + +Similar to the functionality provided by ``DataFrame`` and ``Series``, functions +that take ``GroupBy`` objects can be chained together using a ``pipe`` method to +allow for a cleaner, more readable syntax. To read about ``.pipe`` in general terms, +see [here](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-pipe). + +Combining ``.groupby`` and ``.pipe`` is often useful when you need to reuse +GroupBy objects. + +As an example, imagine having a DataFrame with columns for stores, products, +revenue and quantity sold. We’d like to do a groupwise calculation of *prices* +(i.e. revenue/quantity) per store and per product. We could do this in a +multi-step operation, but expressing it in terms of piping can make the +code more readable. 
First we set the data: + +``` python +In [214]: n = 1000 + +In [215]: df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n), + .....: 'Product': np.random.choice(['Product_1', + .....: 'Product_2'], n), + .....: 'Revenue': (np.random.random(n) * 50 + 10).round(2), + .....: 'Quantity': np.random.randint(1, 10, size=n)}) + .....: + +In [216]: df.head(2) +Out[216]: + Store Product Revenue Quantity +0 Store_2 Product_1 26.12 1 +1 Store_2 Product_1 28.86 1 +``` + +Now, to find prices per store/product, we can simply do: + +``` python +In [217]: (df.groupby(['Store', 'Product']) + .....: .pipe(lambda grp: grp.Revenue.sum() / grp.Quantity.sum()) + .....: .unstack().round(2)) + .....: +Out[217]: +Product Product_1 Product_2 +Store +Store_1 6.82 7.05 +Store_2 6.30 6.64 +``` + +Piping can also be expressive when you want to deliver a grouped object to some +arbitrary function, for example: + +``` python +In [218]: def mean(groupby): + .....: return groupby.mean() + .....: + +In [219]: df.groupby(['Store', 'Product']).pipe(mean) +Out[219]: + Revenue Quantity +Store Product +Store_1 Product_1 34.622727 5.075758 + Product_2 35.482815 5.029630 +Store_2 Product_1 32.972837 5.237589 + Product_2 34.684360 5.224000 +``` + +where ``mean`` takes a GroupBy object and finds the mean of the Revenue and Quantity +columns respectively for each Store-Product combination. The ``mean`` function can +be any function that takes in a GroupBy object; the ``.pipe`` will pass the GroupBy +object as a parameter into the function you specify. + +## Examples + +### Regrouping by factor + +Regroup columns of a DataFrame according to their sum, and sum the aggregated ones. 
+ +``` python +In [220]: df = pd.DataFrame({'a': [1, 0, 0], 'b': [0, 1, 0], + .....: 'c': [1, 0, 0], 'd': [2, 3, 4]}) + .....: + +In [221]: df +Out[221]: + a b c d +0 1 0 1 2 +1 0 1 0 3 +2 0 0 0 4 + +In [222]: df.groupby(df.sum(), axis=1).sum() +Out[222]: + 1 9 +0 2 2 +1 1 3 +2 0 4 +``` + +### Multi-column factorization + +By using ``ngroup()``, we can extract +information about the groups in a way similar to [``factorize()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.factorize.html#pandas.factorize) (as described +further in the [reshaping API](reshaping.html#reshaping-factorize)) but which applies +naturally to multiple columns of mixed type and different +sources. This can be useful as an intermediate categorical-like step +in processing, when the relationships between the group rows are more +important than their content, or as input to an algorithm which only +accepts the integer encoding. (For more information about support in +pandas for full categorical data, see the [Categorical +introduction](categorical.html#categorical) and the +[API documentation](https://pandas.pydata.org/pandas-docs/stable/reference/arrays.html#api-arrays-categorical).) + +``` python +In [223]: dfg = pd.DataFrame({"A": [1, 1, 2, 3, 2], "B": list("aaaba")}) + +In [224]: dfg +Out[224]: + A B +0 1 a +1 1 a +2 2 a +3 3 b +4 2 a + +In [225]: dfg.groupby(["A", "B"]).ngroup() +Out[225]: +0 0 +1 0 +2 1 +3 2 +4 1 +dtype: int64 + +In [226]: dfg.groupby(["A", [0, 0, 0, 1, 1]]).ngroup() +Out[226]: +0 0 +1 0 +2 1 +3 3 +4 2 +dtype: int64 +``` + +### Groupby by indexer to ‘resample’ data + +Resampling produces new hypothetical samples (resamples) from already existing observed data or from a model that generates data. These new samples are similar to the pre-existing samples. + +In order to resample to work on indices that are non-datetimelike, the following procedure can be utilized. 
+ +In the following examples, **df.index // 5** returns a binary array which is used to determine what gets selected for the groupby operation. + +::: tip Note + +The below example shows how we can downsample by consolidation of samples into fewer samples. Here by using **df.index // 5**, we are aggregating the samples in bins. By applying **std()** function, we aggregate the information contained in many samples into a small subset of values which is their standard deviation thereby reducing the number of samples. + +::: + +``` python +In [227]: df = pd.DataFrame(np.random.randn(10, 2)) + +In [228]: df +Out[228]: + 0 1 +0 -0.793893 0.321153 +1 0.342250 1.618906 +2 -0.975807 1.918201 +3 -0.810847 -1.405919 +4 -1.977759 0.461659 +5 0.730057 -1.316938 +6 -0.751328 0.528290 +7 -0.257759 -1.081009 +8 0.505895 -1.701948 +9 -1.006349 0.020208 + +In [229]: df.index // 5 +Out[229]: Int64Index([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype='int64') + +In [230]: df.groupby(df.index // 5).std() +Out[230]: + 0 1 +0 0.823647 1.312912 +1 0.760109 0.942941 +``` + +### Returning a Series to propagate names + +Group DataFrame columns, compute a set of metrics and return a named Series. +The Series name is used as the name for the column index. 
This is especially
+useful in conjunction with reshaping operations such as stacking in which the
+column index name will be used as the name of the inserted column:
+
+``` python
+In [231]: df = pd.DataFrame({'a': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
+   .....:                    'b': [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
+   .....:                    'c': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
+   .....:                    'd': [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]})
+   .....: 
+
+In [232]: def compute_metrics(x):
+   .....:     result = {'b_sum': x['b'].sum(), 'c_mean': x['c'].mean()}
+   .....:     return pd.Series(result, name='metrics')
+   .....: 
+
+In [233]: result = df.groupby('a').apply(compute_metrics)
+
+In [234]: result
+Out[234]: 
+metrics  b_sum  c_mean
+a                     
+0          2.0     0.5
+1          2.0     0.5
+2          2.0     0.5
+
+In [235]: result.stack()
+Out[235]: 
+a  metrics
+0  b_sum      2.0
+   c_mean     0.5
+1  b_sum      2.0
+   c_mean     0.5
+2  b_sum      2.0
+   c_mean     0.5
+dtype: float64
+```
diff --git a/Python/pandas/user_guide/indexing.md b/Python/pandas/user_guide/indexing.md
new file mode 100644
index 00000000..7308ef5a
--- /dev/null
+++ b/Python/pandas/user_guide/indexing.md
@@ -0,0 +1,3114 @@
+# Indexing and selecting data
+
+The axis labeling information in pandas objects serves many purposes:
+
+- Identifies data (i.e. provides *metadata*) using known indicators, important for analysis, visualization, and interactive console display.
+- Enables automatic and explicit data alignment.
+- Allows intuitive getting and setting of subsets of the data set.
+
+In this section, we will focus on the final point: namely, how to slice, dice, and generally get and set subsets of pandas objects. The primary focus will be on Series and DataFrame as they have received more development attention in this area.
+
+::: tip Note
+
+The Python and NumPy indexing operators ``[]`` and attribute operator ``.``
+provide quick and easy access to pandas data structures across a wide range of use cases. This makes interactive work intuitive, as there's little new to learn if you already know how to deal with Python dictionaries and NumPy arrays. However, since the type of the data to be accessed isn't known in advance, directly using standard operators has some optimization limits. For production code, we recommend that you take advantage of the optimized pandas data access methods introduced in this chapter.
+
+:::
+
+::: danger Warning
+
+Whether a copy or a reference is returned for a setting operation may depend on the context. This is sometimes called ``chained assignment`` and should be avoided. See [Returning a View versus Copy](#indexing-view-versus-copy).
+
+:::
+
+::: danger Warning
+
+Indexing on an integer-based index with floats has been clarified in 0.18.0; for a summary of the changes, see [here](https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.18.0.html#whatsnew-0180-float-indexers).
+
+:::
+
+See the [MultiIndex / Advanced Indexing](advanced.html#advanced) documentation for ``MultiIndex`` and more advanced indexing.
+
+See the [cookbook](cookbook.html#cookbook-selection) for some advanced strategies.
+
+## Different choices for indexing
+
+Object selection has had a number of user-requested additions in order to
+support more explicit location based indexing. Pandas now supports three types
+of multi-axis indexing.
+
+- ``.loc`` is primarily label based, but may also be used with a boolean array. ``.loc`` will raise ``KeyError`` when the items are not found. Allowed inputs are:
+  - A single label, e.g. ``5`` or ``'a'`` (Note that ``5`` is interpreted as a
+    *label* of the index. This use is **not** an integer position along the index.).
+  - A list or array of labels ``['a', 'b', 'c']``.
+  - A slice object with labels ``'a':'f'`` (Note that contrary to usual Python slices, **both** the start and the stop are included, when present in the index! See [Slicing with labels](#indexing-slicing-with-labels)
+    and [Endpoints are inclusive](advanced.html#advanced-endpoints-are-inclusive).)
+  - A boolean array.
+  - A ``callable`` function with one argument (the calling Series or DataFrame) that returns valid output for indexing (one of the above).
+
+  *New in version 0.18.1.*
+
+  See more at [Selection by Label](#indexing-label).
+
+- ``.iloc`` is primarily integer position based (from ``0`` to ``length-1`` of the axis), but may also be used with a boolean array. ``.iloc`` will raise ``IndexError`` if a requested indexer is out-of-bounds, except *slice* indexers, which allow out-of-bounds indexing. (This conforms with Python/NumPy *slice*
+semantics.) Allowed inputs are:
+  - An integer, e.g. ``5``.
+  - A list or array of integers ``[4, 3, 0]``.
+  - A slice object with ints ``1:7``.
+  - A boolean array.
+  - A ``callable`` function with one argument (the calling Series or DataFrame) that returns valid output for indexing (one of the above).
+
+  *New in version 0.18.1.*
+
+  See more at [Selection by Position](#indexing-integer), [Advanced Indexing](advanced.html#advanced) and [Advanced Hierarchical](advanced.html#advanced-advanced-hierarchical).
+
+- ``.loc``, ``.iloc``, and also ``[]`` indexing can accept a ``callable`` as indexer. See more at [Selection By Callable](#indexing-callable).
+
+Getting values from an object with multi-axes selection uses the following notation (using ``.loc`` as an example, but the following applies to ``.iloc`` as well). Any of the axes accessors may be the null slice ``:``. Axes left out of the specification are assumed to be ``:``, e.g. ``p.loc['a']`` is equivalent to
+``p.loc['a', :, :]``.
+
+Object Type | Indexers
+---|---
+Series | ``s.loc[indexer]``
+DataFrame | ``df.loc[row_indexer,column_indexer]``
+
+## Basics
+
+As mentioned when introducing the data structures in the [last section](/docs/getting_started/basics.html), the primary function of indexing with ``[]`` (a.k.a. ``__getitem__``
+for those familiar with implementing class behavior in Python) is selecting out lower-dimensional slices. The following table shows return type values when indexing pandas objects with ``[]``:
+
+Object Type | Selection | Return Value Type
+---|---|---
+Series | ``series[label]`` | scalar value
+DataFrame | ``frame[colname]`` | ``Series`` corresponding to colname
+
+Here we construct a simple time series data set to use for illustrating the indexing functionality:
+
+``` python
+In [1]: dates = pd.date_range('1/1/2000', periods=8)
+
+In [2]: df = pd.DataFrame(np.random.randn(8, 4),
+   ...:                   index=dates, columns=['A', 'B', 'C', 'D'])
+   ...: 
+
+In [3]: df
+Out[3]: 
+                   A         B         C         D
+2000-01-01  0.469112 -0.282863 -1.509059 -1.135632
+2000-01-02  1.212112 -0.173215  0.119209 -1.044236
+2000-01-03 -0.861849 -2.104569 -0.494929  1.071804
+2000-01-04  0.721555 -0.706771 -1.039575  0.271860
+2000-01-05 -0.424972  0.567020  0.276232 -1.087401
+2000-01-06 -0.673690  0.113648 -1.478427  0.524988
+2000-01-07  0.404705  0.577046 -1.715002 -1.039268
+2000-01-08 -0.370647 -1.157892 -1.344312  0.844885
+```
+
+::: tip Note
+
+None of the indexing functionality is time series specific unless specifically stated.
+
+:::
+
+Thus, as per above, we have the most basic indexing using ``[]``:
+
+``` python
+In [4]: s = df['A']
+
+In [5]: s[dates[5]]
+Out[5]: -0.6736897080883706
+```
+
+You can pass a list of columns to ``[]`` to select columns in that order. If a column is not contained in the DataFrame, an exception will be raised. Multiple columns can also be set in this manner:
+
+``` python
+In [6]: df
+Out[6]: 
+                   A         B         C         D
+2000-01-01  0.469112 -0.282863 -1.509059 -1.135632
+2000-01-02  1.212112 -0.173215  0.119209 -1.044236
+2000-01-03 -0.861849 -2.104569 -0.494929  1.071804
+2000-01-04  0.721555 -0.706771 -1.039575  0.271860
+2000-01-05 -0.424972  0.567020  0.276232 -1.087401
+2000-01-06 -0.673690  0.113648 -1.478427  0.524988
+2000-01-07  0.404705  0.577046 -1.715002 -1.039268
+2000-01-08 -0.370647 -1.157892 -1.344312  0.844885
+
+In [7]: df[['B', 'A']] = df[['A', 'B']]
+
+In [8]: df
+Out[8]: 
+                   A         B         C         D
+2000-01-01 -0.282863  0.469112 -1.509059 -1.135632
+2000-01-02 -0.173215  1.212112  0.119209 -1.044236
+2000-01-03 -2.104569 -0.861849 -0.494929  1.071804
+2000-01-04 -0.706771  0.721555 -1.039575  0.271860
+2000-01-05  0.567020 -0.424972  0.276232 -1.087401
+2000-01-06  0.113648 -0.673690 -1.478427  0.524988
+2000-01-07  0.577046  0.404705 -1.715002 -1.039268
+2000-01-08 -1.157892 -0.370647 -1.344312  0.844885
+```
+
+You may find this useful for applying a transform (in-place) to a subset of the columns.
+
+::: danger Warning
+
+pandas aligns all AXES when setting ``Series`` and ``DataFrame`` from ``.loc`` and ``.iloc``.
+
+This will **not** modify ``df`` because the column alignment is before value assignment.
+
+``` python
+In [9]: df[['A', 'B']]
+Out[9]: 
+                   A         B
+2000-01-01 -0.282863  0.469112
+2000-01-02 -0.173215  1.212112
+2000-01-03 -2.104569 -0.861849
+2000-01-04 -0.706771  0.721555
+2000-01-05  0.567020 -0.424972
+2000-01-06  0.113648 -0.673690
+2000-01-07  0.577046  0.404705
+2000-01-08 -1.157892 -0.370647
+
+In [10]: df.loc[:, ['B', 'A']] = df[['A', 'B']]
+
+In [11]: df[['A', 'B']]
+Out[11]: 
+                   A         B
+2000-01-01 -0.282863  0.469112
+2000-01-02 -0.173215  1.212112
+2000-01-03 -2.104569 -0.861849
+2000-01-04 -0.706771  0.721555
+2000-01-05  0.567020 -0.424972
+2000-01-06  0.113648 -0.673690
+2000-01-07  0.577046  0.404705
+2000-01-08 -1.157892 -0.370647
+```
+
+The correct way to swap column values is by using raw values:
+
+``` python
+In [12]: df.loc[:, ['B', 'A']] = df[['A', 'B']].to_numpy()
+
+In [13]: df[['A', 'B']]
+Out[13]: 
+                   A         B
+2000-01-01  0.469112 -0.282863
+2000-01-02  1.212112 -0.173215
+2000-01-03 -0.861849 -2.104569
+2000-01-04  0.721555 -0.706771
+2000-01-05 -0.424972  0.567020
+2000-01-06 -0.673690  0.113648
+2000-01-07  0.404705  0.577046
+2000-01-08 -0.370647 -1.157892
+```
+
+:::
+
+## Attribute access
+
+You may access an index on a ``Series`` or a column on a ``DataFrame`` directly as an attribute:
+
+``` python
+In [14]: sa = pd.Series([1, 2, 3], index=list('abc'))
+
+In [15]: dfa = df.copy()
+```
+
+``` python
+In [16]: sa.b
+Out[16]: 2
+
+In [17]: dfa.A
+Out[17]: 
+2000-01-01    0.469112
+2000-01-02    1.212112
+2000-01-03   -0.861849
+2000-01-04    0.721555
+2000-01-05   -0.424972
+2000-01-06   -0.673690
+2000-01-07    0.404705
+2000-01-08   -0.370647
+Freq: D, Name: A, dtype: float64
+```
+
+``` python
+In [18]: sa.a = 5
+
+In [19]: sa
+Out[19]: 
+a    5
+b    2
+c    3
+dtype: int64
+
+In [20]: dfa.A = list(range(len(dfa.index)))  # ok if A already exists
+
+In [21]: dfa
+Out[21]: 
+            A         B         C         D
+2000-01-01  0 -0.282863 -1.509059 -1.135632
+2000-01-02  1 -0.173215  0.119209 -1.044236
+2000-01-03  2 -2.104569 -0.494929  1.071804
+2000-01-04  3 -0.706771 -1.039575  0.271860
+2000-01-05  4  0.567020  0.276232 -1.087401
+2000-01-06  5  0.113648 -1.478427  0.524988
+2000-01-07  6  0.577046 -1.715002 -1.039268
+2000-01-08  7 -1.157892 -1.344312  0.844885
+
+In [22]: dfa['A'] = list(range(len(dfa.index)))  # use this form to create a new column
+
+In [23]: dfa
+Out[23]: 
+            A         B         C         D
+2000-01-01  0 -0.282863 -1.509059 -1.135632
+2000-01-02  1 -0.173215  0.119209 -1.044236
+2000-01-03  2 -2.104569 -0.494929  1.071804
+2000-01-04  3 -0.706771 -1.039575  0.271860
+2000-01-05  4  0.567020  0.276232 -1.087401
+2000-01-06  5  0.113648 -1.478427  0.524988
+2000-01-07  6  0.577046 -1.715002 -1.039268
+2000-01-08  7 -1.157892 -1.344312  0.844885
+```
+
+::: danger Warning
+
+- You can use this access only if the index element is a valid Python identifier, e.g. ``s.1`` is not allowed. See [here](https://docs.python.org/3/reference/lexical_analysis.html#identifiers) for an explanation of valid identifiers.
+- The attribute will not be available if it conflicts with an existing method name, e.g. ``s.min`` is not allowed.
+- Similarly, the attribute will not be available if it conflicts with any of the following list: ``index``,
+  ``major_axis``, ``minor_axis``, ``items``.
+- In any of these cases, standard indexing will still work, e.g. ``s['1']``, ``s['min']``, and ``s['index']`` will access the corresponding element or column.
+
+:::
+
+If you are using the IPython environment, you may also use tab-completion to see these accessible attributes.
+
+You can also assign a ``dict`` to a row of a ``DataFrame``:
+
+``` python
+In [24]: x = pd.DataFrame({'x': [1, 2, 3], 'y': [3, 4, 5]})
+
+In [25]: x.iloc[1] = {'x': 9, 'y': 99}
+
+In [26]: x
+Out[26]: 
+   x   y
+0  1   3
+1  9  99
+2  3   5
+```
+
+You can use attribute access to modify an existing element of a Series or column of a DataFrame, but be careful; if you try to use attribute access to create a new column, it creates a new attribute rather than a new column. In 0.21.0 and later, this will raise a ``UserWarning``:
+
+``` python
+In [1]: df = pd.DataFrame({'one': [1., 2., 3.]})
+In [2]: df.two = [4, 5, 6]
+UserWarning: Pandas doesn't allow Series to be assigned into nonexistent columns - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute_access
+In [3]: df
+Out[3]: 
+   one
+0  1.0
+1  2.0
+2  3.0
+```
+
+## Slicing ranges
+
+The most robust and consistent way of slicing ranges along arbitrary axes is described in the [Selection by Position](#indexing-integer) section detailing the ``.iloc`` method. For now, we explain the semantics of slicing using the ``[]`` operator.
+
+With Series, the syntax works exactly as with an ndarray, returning a slice of the values and the corresponding labels:
+
+``` python
+In [27]: s[:5]
+Out[27]: 
+2000-01-01    0.469112
+2000-01-02    1.212112
+2000-01-03   -0.861849
+2000-01-04    0.721555
+2000-01-05   -0.424972
+Freq: D, Name: A, dtype: float64
+
+In [28]: s[::2]
+Out[28]: 
+2000-01-01    0.469112
+2000-01-03   -0.861849
+2000-01-05   -0.424972
+2000-01-07    0.404705
+Freq: 2D, Name: A, dtype: float64
+
+In [29]: s[::-1]
+Out[29]: 
+2000-01-08   -0.370647
+2000-01-07    0.404705
+2000-01-06   -0.673690
+2000-01-05   -0.424972
+2000-01-04    0.721555
+2000-01-03   -0.861849
+2000-01-02    1.212112
+2000-01-01    0.469112
+Freq: -1D, Name: A, dtype: float64
+```
+
+Note that setting works as well:
+
+``` python
+In [30]: s2 = s.copy()
+
+In [31]: s2[:5] = 0
+
+In [32]: s2
+Out[32]: 
+2000-01-01    0.000000
+2000-01-02    0.000000
+2000-01-03    0.000000
+2000-01-04    0.000000
+2000-01-05    0.000000
+2000-01-06   -0.673690
+2000-01-07    0.404705
+2000-01-08   -0.370647
+Freq: D, Name: A, dtype: float64
+```
+
+With DataFrame, slicing inside of ``[]`` **slices the rows**. This is provided largely as a convenience since it is such a common operation.
+
+``` python
+In [33]: df[:3]
+Out[33]: 
+                   A         B         C         D
+2000-01-01  0.469112 -0.282863 -1.509059 -1.135632
+2000-01-02  1.212112 -0.173215  0.119209 -1.044236
+2000-01-03 -0.861849 -2.104569 -0.494929  1.071804
+
+In [34]: df[::-1]
+Out[34]: 
+                   A         B         C         D
+2000-01-08 -0.370647 -1.157892 -1.344312  0.844885
+2000-01-07  0.404705  0.577046 -1.715002 -1.039268
+2000-01-06 -0.673690  0.113648 -1.478427  0.524988
+2000-01-05 -0.424972  0.567020  0.276232 -1.087401
+2000-01-04  0.721555 -0.706771 -1.039575  0.271860
+2000-01-03 -0.861849 -2.104569 -0.494929  1.071804
+2000-01-02  1.212112 -0.173215  0.119209 -1.044236
+2000-01-01  0.469112 -0.282863 -1.509059 -1.135632
+```
+
+## Selection by label
+
+::: danger Warning
+
+Whether a copy or a reference is returned for a setting operation may depend on the context. This is sometimes called ``chained assignment`` and should be avoided. See [Returning a View versus Copy](#indexing-view-versus-copy).
+
+:::
+
+::: danger Warning
+
+``.loc`` is strict when you present slicers that are not compatible (or convertible) with the index type. For example, using integers in a ``DatetimeIndex`` will raise a ``TypeError``.
+
+``` python
+In [35]: dfl = pd.DataFrame(np.random.randn(5, 4),
+   ....:                    columns=list('ABCD'),
+   ....:                    index=pd.date_range('20130101', periods=5))
+   ....: 
+
+In [36]: dfl
+Out[36]: 
+                   A         B         C         D
+2013-01-01  1.075770 -0.109050  1.643563 -1.469388
+2013-01-02  0.357021 -0.674600 -1.776904 -0.968914
+2013-01-03 -1.294524  0.413738  0.276662 -0.472035
+2013-01-04 -0.013960 -0.362543 -0.006154 -0.923061
+2013-01-05  0.895717  0.805244 -1.206412  2.565646
+```
+
+``` python
+In [4]: dfl.loc[2:3]
+TypeError: cannot do slice indexing on with these indexers [2] of
+```
+
+String likes in slicing *can* be converted to the type of the index and lead to natural slicing.
+
+``` python
+In [37]: dfl.loc['20130102':'20130104']
+Out[37]: 
+                   A         B         C         D
+2013-01-02  0.357021 -0.674600 -1.776904 -0.968914
+2013-01-03 -1.294524  0.413738  0.276662 -0.472035
+2013-01-04 -0.013960 -0.362543 -0.006154 -0.923061
+```
+
+:::
+
+::: danger Warning
+
+Starting in 0.21.0, pandas will show a ``FutureWarning`` if indexing with a list with missing labels. In the future this will raise a ``KeyError``. See [Indexing with a list with missing labels is deprecated](#indexing-deprecate-loc-reindex-listlike).
+
+:::
+
+pandas provides a suite of methods in order to have **purely label based indexing**. This is a strict inclusion based protocol. Every label asked for must be in the index, or a ``KeyError`` will be raised. When slicing, both the start bound **and** the stop bound are *included*, if present in the index. Integers are valid labels, but they refer to the label **and not the position**.
+
+The ``.loc`` attribute is the primary access method. The following are valid inputs:
+
+- A single label, e.g. ``5`` or ``'a'`` (note that ``5`` is interpreted as a *label* of the index. This use is **not** an integer position along the index.).
+- A list or array of labels ``['a', 'b', 'c']``.
+- A slice object with labels ``'a':'f'`` (note that contrary to usual python slices, **both** the start and the stop are included, when present in the index! See [Slicing with labels](#indexing-slicing-with-labels).).
+- A boolean array.
+- A ``callable``, see [Selection By Callable](#indexing-callable).
+
+``` python
+In [38]: s1 = pd.Series(np.random.randn(6), index=list('abcdef'))
+
+In [39]: s1
+Out[39]:
+a    1.431256
+b    1.340309
+c   -1.170299
+d   -0.226169
+e    0.410835
+f    0.813850
+dtype: float64
+
+In [40]: s1.loc['c':]
+Out[40]:
+c   -1.170299
+d   -0.226169
+e    0.410835
+f    0.813850
+dtype: float64
+
+In [41]: s1.loc['b']
+Out[41]: 1.3403088497993827
+```
+
+Note that setting works as well:
+
+``` python
+In [42]: s1.loc['c':] = 0
+
+In [43]: s1
+Out[43]:
+a    1.431256
+b    1.340309
+c    0.000000
+d    0.000000
+e    0.000000
+f    0.000000
+dtype: float64
+```
+
+With a DataFrame:
+
+``` python
+In [44]: df1 = pd.DataFrame(np.random.randn(6, 4),
+   ....:                    index=list('abcdef'),
+   ....:                    columns=list('ABCD'))
+   ....: 
+
+In [45]: df1
+Out[45]:
+          A         B         C         D
+a  0.132003 -0.827317 -0.076467 -1.187678
+b  1.130127 -1.436737 -1.413681  1.607920
+c  1.024180  0.569605  0.875906 -2.211372
+d  0.974466 -2.006747 -0.410001 -0.078638
+e  0.545952 -1.219217 -1.226825  0.769804
+f -1.281247 -0.727707 -0.121306 -0.097883
+
+In [46]: df1.loc[['a', 'b', 'd'], :]
+Out[46]:
+          A         B         C         D
+a  0.132003 -0.827317 -0.076467 -1.187678
+b  1.130127 -1.436737 -1.413681  1.607920
+d  0.974466 -2.006747 -0.410001 -0.078638
+```
+
+Accessing via label slices:
+
+``` python
+In [47]: df1.loc['d':, 'A':'C']
+Out[47]:
+          A         B         C
+d  0.974466 -2.006747 -0.410001
+e  0.545952 -1.219217 -1.226825
+f -1.281247 -0.727707 -0.121306
+```
+
+For getting a cross section using a label (equivalent to ``df.xs('a')``):
+
+``` python
+In [48]: df1.loc['a'] 
+Out[48]:
+A    0.132003
+B   -0.827317
+C   -0.076467
+D   -1.187678
+Name: a, dtype: float64
+```
+
+For getting values with a boolean array:
+
+``` python
+In [49]: df1.loc['a'] > 0
+Out[49]:
+A     True
+B    False
+C    False
+D    False
+Name: a, dtype: bool
+
+In [50]: df1.loc[:, df1.loc['a'] > 0]
+Out[50]:
+          A
+a  0.132003
+b  1.130127
+c  1.024180
+d  0.974466
+e  0.545952
+f -1.281247
+```
+
+For getting a value explicitly (equivalent to the deprecated ``df.get_value('a','A')``):
+
+``` python
+# this is also equivalent to ``df1.at['a','A']``
+In [51]: df1.loc['a', 'A']
+Out[51]: 0.13200317033032932
+```
+
+### Slicing with labels
+
+When using ``.loc`` with slices, if both the start and the stop labels are present in the index, then elements *located* between the two (including them) are returned:
+
+``` python
+In [52]: s = pd.Series(list('abcde'), index=[0, 3, 2, 5, 4])
+
+In [53]: s.loc[3:5]
+Out[53]:
+3    b
+2    c
+5    d
+dtype: object
+```
+
+If at least one of the two is absent, but the index is sorted, and can be compared against start and stop labels, then slicing will still work as expected, by selecting labels which *rank* between the two:
+
+``` python
+In [54]: s.sort_index()
+Out[54]:
+0    a
+2    c
+3    b
+4    e
+5    d
+dtype: object
+
+In [55]: s.sort_index().loc[1:6]
+Out[55]:
+2    c
+3    b
+4    e
+5    d
+dtype: object
+```
+
+However, if at least one of the two is absent *and* the index is not sorted, an error will be raised (since doing otherwise would be computationally expensive, as well as potentially ambiguous for mixed type indexes). For instance, in the above example, ``s.loc[1:6]`` would raise a ``KeyError``.
+
+For the rationale behind this behavior, see
+[Endpoints are inclusive](advanced.html#advanced-endpoints-are-inclusive).
+
+## Selection by position
+
+::: danger Warning
+
+Whether a copy or a reference is returned for a setting operation may depend on the context. This is sometimes called ``chained assignment`` and should be avoided. See [Returning a View versus Copy](#indexing-view-versus-copy).
+
+:::
+
+pandas provides a suite of methods in order to get **purely integer based indexing**. The semantics follow closely Python and NumPy slicing. These are ``0-based`` indexing. When slicing, the start bound is *included*, while the upper bound is *excluded*. Trying to use a non-integer, even a **valid** label, will raise an ``IndexError``.
+
+The ``.iloc`` attribute is the primary access method. The following are valid inputs:
+
+- An integer, e.g. ``5``.
+- A list or array of integers ``[4, 3, 0]``.
+- A slice object with ints ``1:7``.
+- A boolean array.
+- A ``callable``, see [Selection By Callable](#indexing-callable).
+
+``` python
+In [56]: s1 = pd.Series(np.random.randn(5), index=list(range(0, 10, 2)))
+
+In [57]: s1
+Out[57]:
+0    0.695775
+2    0.341734
+4    0.959726
+6   -1.110336
+8   -0.619976
+dtype: float64
+
+In [58]: s1.iloc[:3]
+Out[58]:
+0    0.695775
+2    0.341734
+4    0.959726
+dtype: float64
+
+In [59]: s1.iloc[3]
+Out[59]: -1.110336102891167
+```
+
+Note that setting works as well:
+
+``` python
+In [60]: 
s1.iloc[:3] = 0
+
+In [61]: s1
+Out[61]:
+0    0.000000
+2    0.000000
+4    0.000000
+6   -1.110336
+8   -0.619976
+dtype: float64
+```
+
+With a DataFrame:
+
+``` python
+In [62]: df1 = pd.DataFrame(np.random.randn(6, 4),
+   ....:                    index=list(range(0, 12, 2)),
+   ....:                    columns=list(range(0, 8, 2)))
+   ....: 
+
+In [63]: df1
+Out[63]:
+           0         2         4         6
+0   0.149748 -0.732339  0.687738  0.176444
+2   0.403310 -0.154951  0.301624 -2.179861
+4  -1.369849 -0.954208  1.462696 -1.743161
+6  -0.826591 -0.345352  1.314232  0.690579
+8   0.995761  2.396780  0.014871  3.357427
+10 -0.317441 -1.236269  0.896171 -0.487602
+```
+
+Select via integer slicing:
+
+``` python
+In [64]: df1.iloc[:3]
+Out[64]:
+          0         2         4         6
+0  0.149748 -0.732339  0.687738  0.176444
+2  0.403310 -0.154951  0.301624 -2.179861
+4 -1.369849 -0.954208  1.462696 -1.743161
+
+In [65]: df1.iloc[1:5, 2:4]
+Out[65]:
+          4         6
+2  0.301624 -2.179861
+4  1.462696 -1.743161
+6  1.314232  0.690579
+8  0.014871  3.357427
+```
+
+Select via integer list:
+
+``` python
+In [66]: df1.iloc[[1, 3, 5], [1, 3]]
+Out[66]:
+           2         6
+2  -0.154951 -2.179861
+6  -0.345352  0.690579
+10 -1.236269 -0.487602
+```
+
+``` python
+In [67]: df1.iloc[1:3, :]
+Out[67]:
+          0         2         4         6
+2  0.403310 -0.154951  0.301624 -2.179861
+4 -1.369849 -0.954208  1.462696 -1.743161
+```
+
+``` python
+In [68]: df1.iloc[:, 1:3]
+Out[68]:
+           2         4
+0  -0.732339  0.687738
+2  -0.154951  0.301624
+4  -0.954208  1.462696
+6  -0.345352  1.314232
+8   2.396780  0.014871
+10 -1.236269  0.896171
+```
+
+``` python
+# this is also equivalent to ``df1.iat[1,1]``
+In [69]: df1.iloc[1, 1]
+Out[69]: -0.1549507744249032
+```
+
+For getting a cross section using an integer position (equivalent to ``df.xs(1)``):
+
+``` python
+In [70]: df1.iloc[1]
+Out[70]:
+0    0.403310
+2   -0.154951
+4    0.301624
+6   -2.179861
+Name: 2, dtype: float64
+```
+
+Out of range slice indexes are handled gracefully just as in Python/Numpy.
+
+``` python
+# these are allowed in python/numpy. 
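+# Aside (an illustrative check, not part of the original transcript):
+# only out-of-bounds *slices* are truncated gracefully; a scalar index
+# that is out of range still raises IndexError, for plain lists and
+# for ``.iloc`` alike. ``demo`` is a hypothetical helper list.
+demo = list('abcdef')
+assert demo[4:10] == ['e', 'f']  # slice past the end is truncated
+assert demo[8:10] == []          # slice fully out of range is empty
+try:
+    demo[8]                      # scalar out of range raises
+except IndexError:
+    pass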
+In [71]: x = list('abcdef')
+
+In [72]: x
+Out[72]: ['a', 'b', 'c', 'd', 'e', 'f']
+
+In [73]: x[4:10]
+Out[73]: ['e', 'f']
+
+In [74]: x[8:10]
+Out[74]: []
+
+In [75]: s = pd.Series(x)
+
+In [76]: s
+Out[76]:
+0    a
+1    b
+2    c
+3    d
+4    e
+5    f
+dtype: object
+
+In [77]: s.iloc[4:10]
+Out[77]:
+4    e
+5    f
+dtype: object
+
+In [78]: s.iloc[8:10]
+Out[78]: Series([], dtype: object)
+```
+
+Note that using slices that go out of bounds can result in an empty axis (e.g. an empty DataFrame being returned).
+
+``` python
+In [79]: dfl = pd.DataFrame(np.random.randn(5, 2), columns=list('AB'))
+
+In [80]: dfl
+Out[80]:
+          A         B
+0 -0.082240 -2.182937
+1  0.380396  0.084844
+2  0.432390  1.519970
+3 -0.493662  0.600178
+4  0.274230  0.132885
+
+In [81]: dfl.iloc[:, 2:3]
+Out[81]:
+Empty DataFrame
+Columns: []
+Index: [0, 1, 2, 3, 4]
+
+In [82]: dfl.iloc[:, 1:3]
+Out[82]:
+          B
+0 -2.182937
+1  0.084844
+2  1.519970
+3  0.600178
+4  0.132885
+
+In [83]: dfl.iloc[4:6]
+Out[83]:
+         A         B
+4  0.27423  0.132885
+```
+
+A single indexer that is out of bounds will raise an ``IndexError``. A list of indexers where any element is out of bounds will raise an
+``IndexError``.
+
+``` python
+>>> dfl.iloc[[4, 5, 6]]
+IndexError: positional indexers are out-of-bounds
+
+>>> dfl.iloc[:, 4]
+IndexError: single positional indexer is out-of-bounds
+```
+
+## Selection by callable
+
+*New in version 0.18.1.*
+
+``.loc``, ``.iloc``, and also ``[]`` indexing can accept a ``callable`` as indexer. The ``callable`` must be a function with one argument (the calling Series or DataFrame) that returns valid output for indexing.
+
+``` python
+In [84]: df1 = pd.DataFrame(np.random.randn(6, 4),
+   ....:                    index=list('abcdef'),
+   ....:                    columns=list('ABCD'))
+   ....: 
+
+In [85]: df1
+Out[85]:
+          A         B         C         D
+a -0.023688  2.410179  1.450520  0.206053
+b -0.251905 -2.213588  1.063327  1.266143
+c  0.299368 -0.863838  0.408204 -1.048089
+d -0.025747 -0.988387  0.094055  1.262731
+e  1.289997  0.082423 -0.055758  0.536580
+f -0.489682  0.369374 -0.034571 -2.484478
+
+In [86]: df1.loc[lambda df: df.A > 0, :]
+Out[86]:
+          A         B         C         D
+c  0.299368 -0.863838  0.408204 -1.048089
+e  1.289997  0.082423 -0.055758  0.536580
+
+In [87]: df1.loc[:, lambda df: ['A', 'B']]
+Out[87]:
+          A         B
+a -0.023688  2.410179
+b -0.251905 -2.213588
+c  0.299368 -0.863838
+d -0.025747 
-0.988387
+e  1.289997  0.082423
+f -0.489682  0.369374
+
+In [88]: df1.iloc[:, lambda df: [0, 1]]
+Out[88]:
+          A         B
+a -0.023688  2.410179
+b -0.251905 -2.213588
+c  0.299368 -0.863838
+d -0.025747 -0.988387
+e  1.289997  0.082423
+f -0.489682  0.369374
+
+In [89]: df1[lambda df: df.columns[0]]
+Out[89]:
+a   -0.023688
+b   -0.251905
+c    0.299368
+d   -0.025747
+e    1.289997
+f   -0.489682
+Name: A, dtype: float64
+```
+
+You can use callable indexing in a ``Series`` as well.
+
+``` python
+In [90]: df1.A.loc[lambda s: s > 0]
+Out[90]:
+c    0.299368
+e    1.289997
+Name: A, dtype: float64
+```
+
+Using these methods / indexers, you can chain data selection operations without using a temporary variable.
+
+``` python
+In [91]: bb = pd.read_csv('data/baseball.csv', index_col='id')
+
+In [92]: (bb.groupby(['year', 'team']).sum()
+   ....:    .loc[lambda df: df.r > 100])
+   ....: 
+Out[92]:
+           stint    g    ab    r    h  X2b  X3b  hr    rbi    sb   cs   bb     so   ibb   hbp    sh    sf  gidp
+year team
+2007 CIN       6  379   745  101  203   35    2  36  125.0  10.0  1.0  105  127.0  14.0   1.0   1.0  15.0  18.0
+     DET       5  301  1062  162  283   54    4  37  144.0  24.0  7.0   97  176.0   3.0  10.0   4.0   8.0  28.0
+     HOU       4  311   926  109  218   47    6  14   77.0  10.0  4.0   60  212.0   3.0   9.0  16.0   6.0  17.0
+     LAN      11  413  1021  153  293   61    3  36  154.0   7.0  5.0  114  141.0   8.0   9.0   3.0   8.0  29.0
+     NYN      13  622  1854  240  509  101    3  61  243.0  22.0  4.0  174  310.0  24.0  23.0  18.0  15.0  48.0
+     SFN       5  482  1305  198  337   67    6  40  171.0  26.0  7.0  235  188.0  51.0   8.0  16.0   6.0  41.0
+     TEX       2  198   729  115  200   40    4  28  115.0  21.0  4.0   73  140.0   4.0   5.0   2.0   8.0  16.0
+     TOR       4  459  1408  187  378   96    2  58  223.0   4.0  2.0  190  265.0  16.0  12.0   4.0  16.0  38.0
+```
+
+## IX indexer is deprecated
+
+::: danger Warning
+
+Starting in 0.20.0, the ``.ix`` indexer is deprecated, in favor of the more strict ``.iloc``
+and ``.loc`` indexers.
+
+:::
+
+``.ix`` offers a lot of magic on the inference of what the user wants to do. To wit, ``.ix`` can decide to index *positionally* or via *labels* depending on the data type of the index. This has caused quite a bit of user confusion over the years.
+
+The recommended methods of indexing are:
+
+- ``.loc`` if you want to *label* index.
+- ``.iloc`` if you want to *positionally* index.
+
+``` python
+In [93]: dfd = pd.DataFrame({'A': [1, 2, 3],
+   ....:                     'B': [4, 5, 6]},
+   ....:                    index=list('abc'))
+   ....: 
+
+In [94]: dfd
+Out[94]:
+   A  B
+a  1  4
+b  2  5
+c  3  6
+```
+
+Previous behavior, where you wish to get the 0th and the 2nd elements from the index in the 'A' column.
+
+``` python
+In [3]: dfd.ix[[0, 2], 'A'] 
+Out[3]:
+a    1
+c    3
+Name: A, dtype: int64
+```
+
+Using ``.loc``: here we will select the appropriate indexes from the index, then use *label* indexing.
+
+``` python
+In [95]: dfd.loc[dfd.index[[0, 2]], 'A']
+Out[95]:
+a    1
+c    3
+Name: A, dtype: int64
+```
+
+This can also be expressed using ``.iloc``, by explicitly getting locations on the indexers, and using
+*positional* indexing to select things.
+
+``` python
+In [96]: dfd.iloc[[0, 2], dfd.columns.get_loc('A')]
+Out[96]:
+a    1
+c    3
+Name: A, dtype: int64
+```
+
+For getting *multiple* indexers, use ``.get_indexer``:
+
+``` python
+In [97]: dfd.iloc[[0, 2], dfd.columns.get_indexer(['A', 'B'])]
+Out[97]:
+   A  B
+a  1  4
+c  3  6
+```
+
+## Indexing with a list with missing labels is deprecated
+
+::: danger Warning
+
+Starting in 0.21.0, using ``.loc`` or ``[]`` with a list with one or more missing labels is deprecated, in favor of ``.reindex``.
+
+:::
+
+In prior versions, using ``.loc[list-of-labels]`` would work as long as *at least 1* of the keys was found (otherwise it would raise a ``KeyError``). This behavior is deprecated and will show a warning message pointing to this section. The recommended alternative is to use ``.reindex()``.
+
+For example.
+
+``` python
+In [98]: s = pd.Series([1, 2, 3])
+
+In [99]: s
+Out[99]:
+0    1
+1    2
+2    3
+dtype: int64
+```
+
+Selection with all keys found is unchanged.
+
+``` python
+In [100]: s.loc[[1, 2]]
+Out[100]:
+1    2
+2    3
+dtype: int64
+```
+
+Previous behavior
+
+``` python
+In [4]: s.loc[[1, 2, 3]]
+Out[4]:
+1    2.0
+2    3.0
+3    NaN
+dtype: float64
+```
+
+Current behavior
+
+``` python
+In [4]: s.loc[[1, 2, 3]]
+Passing list-likes to .loc with any non-matching elements will raise
+KeyError in the future, you can use .reindex() as an alternative. 
+
+See the documentation here:
+http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
+
+Out[4]:
+1    2.0
+2    3.0
+3    NaN
+dtype: float64
+```
+
+### Reindexing
+
+The idiomatic way to achieve selecting potentially not-found elements is via ``.reindex()``. See also the section on [reindexing](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-reindexing).
+
+``` python
+In [101]: s.reindex([1, 2, 3])
+Out[101]:
+1    2.0
+2    3.0
+3    NaN
+dtype: float64
+```
+
+Alternatively, if you want to select only *valid* keys, the following is idiomatic and efficient; it is guaranteed to preserve the dtype of the selection.
+
+``` python
+In [102]: labels = [1, 2, 3]
+
+In [103]: s.loc[s.index.intersection(labels)]
+Out[103]:
+1    2
+2    3
+dtype: int64
+```
+
+Having a duplicated index will raise for a ``.reindex()``:
+
+``` python
+In [104]: s = pd.Series(np.arange(4), index=['a', 'a', 'b', 'c'])
+
+In [105]: labels = ['c', 'd']
+```
+
+``` python
+In [17]: s.reindex(labels)
+ValueError: cannot reindex from a duplicate axis
+```
+
+Generally, you can intersect the desired labels with the current axis, and then reindex.
+
+``` python
+In [106]: s.loc[s.index.intersection(labels)].reindex(labels)
+Out[106]:
+c    3.0
+d    NaN
+dtype: float64
+```
+
+However, this would *still* raise if your resulting index is duplicated.
+
+``` python
+In [41]: labels = ['a', 'd']
+
+In [42]: s.loc[s.index.intersection(labels)].reindex(labels)
+ValueError: cannot reindex from a duplicate axis
+```
+
+## Selecting random samples
+
+A random selection of rows or columns from a Series or DataFrame can be obtained with the [``sample()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html#pandas.DataFrame.sample) method. The method will sample rows by default, and accepts a specific number of rows/columns to return, or a fraction of rows.
+
+``` python
+In [107]: s = pd.Series([0, 1, 2, 3, 4, 5])
+
+# When no arguments are passed, returns 1 row. 
+In [108]: s.sample()
+Out[108]:
+4    4
+dtype: int64
+
+# One may specify either a number of rows:
+In [109]: s.sample(n=3)
+Out[109]:
+0    0
+4    4
+1    1
+dtype: int64
+
+# Or a fraction of the rows:
+In [110]: s.sample(frac=0.5)
+Out[110]:
+5    5
+3    3
+1    1
+dtype: int64
+```
+
+By default, ``sample`` will return each row at most once, but one can also sample with replacement using the ``replace`` option:
+
+``` python
+In [111]: s = pd.Series([0, 1, 2, 3, 4, 5])
+
+# Without replacement (default):
+In [112]: s.sample(n=6, replace=False)
+Out[112]:
+0    0
+1    1
+5    5
+3    3
+2    2
+4    4
+dtype: int64
+
+# With replacement:
+In [113]: s.sample(n=6, replace=True)
+Out[113]:
+0    0
+4    4
+3    3
+2    2
+4    4
+4    4
+dtype: int64
+```
+
+By default, each row has an equal probability of being selected, but if you want rows to have different probabilities, you can pass the ``sample`` function sampling weights as
+``weights``. These weights can be a list, a NumPy array, or a Series, but they must be of the same length as the object you are sampling. Missing values will be treated as a weight of zero, and inf values are not allowed. If weights do not sum to 1, they will be re-normalized by dividing all weights by the sum of the weights. For example:
+
+``` python
+In [114]: s = pd.Series([0, 1, 2, 3, 4, 5])
+
+In [115]: example_weights = [0, 0, 0.2, 0.2, 0.2, 0.4]
+
+In [116]: s.sample(n=3, weights=example_weights)
+Out[116]:
+5    5
+4    4
+3    3
+dtype: int64
+
+# Weights will be re-normalized automatically
+In [117]: example_weights2 = [0.5, 0, 0, 0, 0, 0]
+
+In [118]: s.sample(n=1, weights=example_weights2)
+Out[118]:
+0    0
+dtype: int64
+```
+
+When applied to a DataFrame, you can use a column of the DataFrame as sampling weights (provided you are sampling rows and not columns) by simply passing the name of the column as a string.
+
+``` python
+In [119]: df2 = pd.DataFrame({'col1': [9, 8, 7, 6],
+   .....:                     'weight_column': [0.5, 0.4, 0.1, 0]})
+   .....: 
+
+In [120]: df2.sample(n=3, weights='weight_column')
+Out[120]:
+   col1  weight_column
+1     8            0.4
+0     9            0.5
+2     7            0.1
+```
+
+``sample`` also allows users to sample columns instead of rows using the ``axis`` argument.
+
+``` python
+In [121]: df3 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [2, 3, 4]})
+
+In [122]: df3.sample(n=1, axis=1)
+Out[122]:
+   col1
+0     1
+1     2
+2     3
+```
+
+Finally, one can also set a seed for ``sample``'s random number generator using the ``random_state`` argument, which will accept either an integer (as a seed) or a NumPy RandomState object.
+
+``` python
+In [123]: df4 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [2, 3, 4]})
+
+# With a given seed, the sample will always draw the same rows. 
+In [124]: df4.sample(n=2, random_state=2)
+Out[124]:
+   col1  col2
+2     3     4
+1     2     3
+
+In [125]: df4.sample(n=2, random_state=2)
+Out[125]:
+   col1  col2
+2     3     4
+1     2     3
+```
+
+## Setting with enlargement
+
+The ``.loc/[]`` operations can perform enlargement when setting a non-existent key for that axis.
+
+In the ``Series`` case this is effectively an appending operation.
+
+``` python
+In [126]: se = pd.Series([1, 2, 3])
+
+In [127]: se
+Out[127]:
+0    1
+1    2
+2    3
+dtype: int64
+
+In [128]: se[5] = 5.
+
+In [129]: se
+Out[129]:
+0    1.0
+1    2.0
+2    3.0
+5    5.0
+dtype: float64
+```
+
+A ``DataFrame`` can be enlarged on either axis via ``.loc``.
+
+``` python
+In [130]: dfi = pd.DataFrame(np.arange(6).reshape(3, 2),
+   .....:                    columns=['A', 'B'])
+   .....: 
+
+In [131]: dfi
+Out[131]:
+   A  B
+0  0  1
+1  2  3
+2  4  5
+
+In [132]: dfi.loc[:, 'C'] = dfi.loc[:, 'A']
+
+In [133]: dfi
+Out[133]:
+   A  B  C
+0  0  1  0
+1  2  3  2
+2  4  5  4
+```
+
+This is like an ``append`` operation on the ``DataFrame``.
+
+``` python
+In [134]: dfi.loc[3] = 5
+
+In [135]: dfi
+Out[135]:
+   A  B  C
+0  0  1  0
+1  2  3  2
+2  4  5  4
+3  5  5  5
+```
+
+## Fast scalar value getting and setting
+
+Since indexing with ``[]`` must handle a lot of cases (single-label access, slicing, boolean indexing, etc.), it has a bit of overhead in order to figure out what you're asking for. If you only want to access a scalar value, the fastest way is to use the ``at`` and ``iat`` methods, which are implemented on all of the data structures.
+
+Similarly to ``loc``, ``at`` provides **label** based scalar lookups, while ``iat`` provides **integer** based lookups analogously to ``iloc``
+
+``` python
+In [136]: s.iat[5]
+Out[136]: 5
+
+In [137]: df.at[dates[5], 'A']
+Out[137]: -0.6736897080883706
+
+In [138]: df.iat[3, 0]
+Out[138]: 0.7215551622443669
+```
+
+You can also set using these same indexers.
+
+``` python
+In [139]: df.at[dates[5], 'E'] = 7
+
+In [140]: df.iat[3, 0] = 7
+```
+
+``at`` may enlarge the object in-place as above if the indexer is missing.
+
+``` python
+In [141]: df.at[dates[-1] + pd.Timedelta('1 day'), 0] = 7
+
+In [142]: df
+Out[142]:
+                   A         B         C         D    E    0
+2000-01-01  0.469112 -0.282863 -1.509059 -1.135632  NaN  NaN
+2000-01-02  1.212112 -0.173215  0.119209 -1.044236  NaN  NaN
+2000-01-03 -0.861849 -2.104569 -0.494929  1.071804  NaN  NaN
+2000-01-04  7.000000 -0.706771 -1.039575  0.271860  NaN  NaN
+2000-01-05 -0.424972  0.567020  0.276232 -1.087401  NaN  NaN
+2000-01-06 -0.673690  0.113648 -1.478427  0.524988  7.0  NaN
+2000-01-07  0.404705  0.577046 -1.715002 -1.039268  NaN  NaN
+2000-01-08 -0.370647 -1.157892 -1.344312 
0.844885  NaN  NaN
+2000-01-09       NaN       NaN       NaN       NaN  NaN  7.0
+```
+
+## Boolean indexing
+
+Another common operation is the use of boolean vectors to filter the data. The operators are: ``|`` for ``or``, ``&`` for ``and``, and ``~`` for ``not``. These **must** be grouped by using parentheses, since by default Python will evaluate an expression such as ``df.A > 2 & df.B < 3`` as
+``df.A > (2 & df.B) < 3``, while the desired evaluation order is
+``(df.A > 2) & (df.B < 3)``.
+
+Using a boolean vector to index a Series works exactly as in a NumPy ndarray:
+
+``` python
+In [143]: s = pd.Series(range(-3, 4))
+
+In [144]: s
+Out[144]:
+0   -3
+1   -2
+2   -1
+3    0
+4    1
+5    2
+6    3
+dtype: int64
+
+In [145]: s[s > 0]
+Out[145]:
+4    1
+5    2
+6    3
+dtype: int64
+
+In [146]: s[(s < -1) | (s > 0.5)]
+Out[146]:
+0   -3
+1   -2
+4    1
+5    2
+6    3
+dtype: int64
+
+In [147]: s[~(s < 0)]
+Out[147]:
+3    0
+4    1
+5    2
+6    3
+dtype: int64
+```
+
+You may select rows from a DataFrame using a boolean vector the same length as the DataFrame's index (for example, something derived from one of the columns of the DataFrame):
+
+``` python
+In [148]: df[df['A'] > 0]
+Out[148]:
+                   A         B         C         D    E    0
+2000-01-01  0.469112 -0.282863 -1.509059 -1.135632  NaN  NaN
+2000-01-02  1.212112 -0.173215  0.119209 -1.044236  NaN  NaN
+2000-01-04  7.000000 -0.706771 -1.039575  0.271860  NaN  NaN
+2000-01-07  0.404705  0.577046 -1.715002 -1.039268  NaN  NaN
+```
+
+List comprehensions and the ``map`` method of Series can also be used to produce more complex criteria:
+
+``` python
+In [149]: df2 = pd.DataFrame({'a': ['one', 'one', 'two', 'three', 'two', 'one', 'six'],
+   .....:                     'b': ['x', 'y', 'y', 'x', 'y', 'x', 'x'],
+   .....:                     'c': np.random.randn(7)})
+   .....: 
+
+# only want 'two' or 'three'
+In [150]: criterion = df2['a'].map(lambda x: x.startswith('t'))
+
+In [151]: df2[criterion]
+Out[151]:
+       a  b         c
+2    two  y  0.041290
+3  three  x  0.361719
+4    two  y -0.238075
+
+# equivalent but slower
+In [152]: df2[[x.startswith('t') for x in df2['a']]]
+Out[152]:
+       a  b         c
+2    two  y  0.041290
+3  three  x  0.361719
+4    two  y -0.238075
+
+# Multiple criteria
+In [153]: df2[criterion & (df2['b'] == 'x')]
+Out[153]:
+       a  b         c
+3  three  x  0.361719
+```
+
+With the choice methods [Selection by Label](#indexing-label), [Selection by Position](#indexing-integer), and [Advanced Indexing](advanced.html#advanced) you may select along more than one axis using boolean vectors combined with other indexing expressions.
+
+``` python
+In [154]: df2.loc[criterion & (df2['b'] == 'x'), 'b':'c']
+Out[154]:
+   b         c
+3  x  0.361719
+```
+
+## 
Indexing with isin
+
+Consider the [``isin()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.isin.html#pandas.Series.isin) method of ``Series``, which returns a boolean vector that is true wherever the ``Series`` elements exist in the passed list. This allows you to select rows where one or more columns have values you want:
+
+``` python
+In [155]: s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')
+
+In [156]: s
+Out[156]:
+4    0
+3    1
+2    2
+1    3
+0    4
+dtype: int64
+
+In [157]: s.isin([2, 4, 6])
+Out[157]:
+4    False
+3    False
+2     True
+1    False
+0     True
+dtype: bool
+
+In [158]: s[s.isin([2, 4, 6])]
+Out[158]:
+2    2
+0    4
+dtype: int64
+```
+
+The same method is available for ``Index`` objects and is useful for the cases when you don't know which of the sought labels are in fact present:
+
+``` python
+In [159]: s[s.index.isin([2, 4, 6])]
+Out[159]:
+4    0
+2    2
+dtype: int64
+
+# compare it to the following
+In [160]: s.reindex([2, 4, 6])
+Out[160]:
+2    2.0
+4    0.0
+6    NaN
+dtype: float64
+```
+
+In addition to that, ``MultiIndex`` allows selecting a separate level to use in the membership check:
+
+``` python
+In [161]: s_mi = pd.Series(np.arange(6),
+   .....:                  index=pd.MultiIndex.from_product([[0, 1], ['a', 'b', 'c']]))
+   .....: 
+
+In [162]: s_mi
+Out[162]:
+0  a    0
+   b    1
+   c    2
+1  a    3
+   b    4
+   c    5
+dtype: int64
+
+In [163]: s_mi.iloc[s_mi.index.isin([(1, 'a'), (2, 'b'), (0, 'c')])]
+Out[163]:
+0  c    2
+1  a    3
+dtype: int64
+
+In [164]: s_mi.iloc[s_mi.index.isin(['a', 'c', 'e'], level=1)]
+Out[164]:
+0  a    0
+   c    2
+1  a    3
+   c    5
+dtype: int64
+```
+
+DataFrame also has an [``isin()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html#pandas.DataFrame.isin) method. When calling ``isin``, pass a set of values as either an array or dict. If values is an array, ``isin`` returns a DataFrame of booleans that is the same shape as the original DataFrame, with True wherever the element is in the sequence of values.
+
+``` python
+In [165]: df = pd.DataFrame({'vals': [1, 2, 3, 4], 'ids': ['a', 'b', 'f', 'n'],
+   .....:                    'ids2': ['a', 'n', 'c', 'n']})
+   .....: 
+
+In [166]: values = ['a', 'b', 1, 3]
+
+In [167]: df.isin(values)
+Out[167]:
+    vals    ids   ids2
+0   True   True   True
+1  False   True  False
+2   True  False  False
+3  False  False  False
+```
+
+Oftentimes you'll want to match certain values with certain columns. Just make values a ``dict`` where the key is the column, and the value is a list of items you want to check for.
+
+``` python
+In [168]: values = {'ids': ['a', 'b'], 'vals': [1, 3]}
+
+In [169]: df.isin(values) 
+Out[169]:
+    vals    ids   ids2
+0   True   True  False
+1  False   True  False
+2   True  False  False
+3  False  False  False
+```
+
+Combine DataFrame's ``isin`` with the ``any()`` and ``all()`` methods to quickly select subsets of your data that meet a given criterion. To select a row where each column meets its own criterion:
+
+``` python
+In [170]: values = {'ids': ['a', 'b'], 'ids2': ['a', 'c'], 'vals': [1, 3]}
+
+In [171]: row_mask = df.isin(values).all(1)
+
+In [172]: df[row_mask]
+Out[172]:
+   vals ids ids2
+0     1   a    a
+```
+
+## The ``where()`` method and masking
+
+Selecting values from a Series with a boolean vector generally returns a subset of the data. To guarantee that selection output has the same shape as the original data, you can use the ``where`` method in ``Series`` and ``DataFrame``.
+
+To return only the selected rows:
+
+``` python
+In [173]: s[s > 0]
+Out[173]:
+3    1
+2    2
+1    3
+0    4
+dtype: int64
+```
+
+To return a Series of the same shape as the original:
+
+``` python
+In [174]: s.where(s > 0)
+Out[174]:
+4    NaN
+3    1.0
+2    2.0
+1    3.0
+0    4.0
+dtype: float64
+```
+
+Selecting values from a DataFrame with a boolean criterion now also preserves input data shape. ``where`` is used under the hood as the implementation. The code below is equivalent to ``df.where(df < 0)``.
+
+``` python
+In [175]: df[df < 0]
+Out[175]:
+                   A         B         C         D
+2000-01-01 -2.104139 -1.309525       NaN       NaN
+2000-01-02 -0.352480       NaN -1.192319       NaN
+2000-01-03 -0.864883       NaN -0.227870       NaN
+2000-01-04       NaN -1.222082       NaN -1.233203
+2000-01-05       NaN -0.605656 -1.169184       NaN
+2000-01-06       NaN -0.948458       NaN -0.684718
+2000-01-07 -2.670153 -0.114722       NaN -0.048048
+2000-01-08       NaN       NaN -0.048788 -0.808838
+```
+
+In addition, ``where`` takes an optional ``other`` argument for replacement of values where the condition is False, in the returned copy.
+
+``` python
+In [176]: df.where(df < 0, -df)
+Out[176]:
+                   A         B         C         D
+2000-01-01 -2.104139 -1.309525 -0.485855 -0.245166
+2000-01-02 -0.352480 -0.390389 -1.192319 -1.655824
+2000-01-03 -0.864883 -0.299674 -0.227870 -0.281059
+2000-01-04 -0.846958 -1.222082 -0.600705 -1.233203
+2000-01-05 -0.669692 -0.605656 -1.169184 -0.342416
+2000-01-06 -0.868584 -0.948458 -2.297780 -0.684718
+2000-01-07 -2.670153 -0.114722 -0.168904 -0.048048
+2000-01-08 -0.801196 -1.392071 -0.048788 -0.808838
+```
+
+You may wish to set values based on some boolean criteria. This can be done intuitively like so:
+
+``` python
+In [177]: s2 = s.copy()
+
+In [178]: s2[s2 < 0] = 0
+
+In [179]: s2
+Out[179]:
+4    0
+3    1
+2    2
+1    3
+0    4
+dtype: int64
+
+In [180]: df2 = df.copy()
+
+In [181]: df2[df2 < 0] = 0
+
+In [182]: df2 
+Out[182]:
+                   A         B         C         D
+2000-01-01  0.000000  0.000000  0.485855  0.245166
+2000-01-02  0.000000  0.390389  0.000000  1.655824
+2000-01-03  0.000000  0.299674  0.000000  0.281059
+2000-01-04  0.846958  0.000000  0.600705  0.000000
+2000-01-05  0.669692  0.000000  0.000000  0.342416
+2000-01-06  0.868584  0.000000  2.297780  0.000000
+2000-01-07  0.000000  0.000000  0.168904  0.000000
+2000-01-08  0.801196  1.392071  0.000000  0.000000
+```
+
+By default, ``where`` returns a modified copy of the data. There is an optional parameter ``inplace`` so that the original data can be modified without creating a copy:
+
+``` python
+In [183]: df_orig = df.copy()
+
+In [184]: df_orig.where(df > 0, -df, inplace=True)
+
+In [185]: df_orig
+Out[185]:
+                   A         B         C         D
+2000-01-01  2.104139  1.309525  0.485855  0.245166
+2000-01-02  0.352480  0.390389  1.192319  1.655824
+2000-01-03  0.864883  0.299674  0.227870  0.281059
+2000-01-04  0.846958  1.222082  0.600705  1.233203
+2000-01-05  0.669692  0.605656  1.169184  0.342416
+2000-01-06  0.868584  0.948458  2.297780  0.684718
+2000-01-07  2.670153  0.114722  0.168904  0.048048
+2000-01-08  0.801196  1.392071  0.048788  0.808838
+```
+
+::: tip Note
+
+The signature for [``DataFrame.where()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.where.html#pandas.DataFrame.where) differs from [``numpy.where()``](https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html#numpy.where). Roughly, ``df1.where(m, df2)`` is equivalent to ``np.where(m, df1, df2)``.
+
+``` python
+In [186]: df.where(df < 0, -df) == np.where(df < 0, df, -df)
+Out[186]:
+               A     B     C     D
+2000-01-01  True  True  True  True
+2000-01-02  True  True  True  True
+2000-01-03  True  True  True  True
+2000-01-04  True  True  True  True
+2000-01-05  True  True  True  True
+2000-01-06  True  True  True  True
+2000-01-07  True  True  True  True
+2000-01-08  True  True  True  True
+```
+
+:::
+
+**Alignment**
+
+Furthermore, ``where`` aligns the input boolean condition (ndarray or DataFrame), such that partial selection with setting is possible. This is analogous to partial setting via ``.loc`` (but on the contents rather than the axis labels).
+
+``` python
+In [187]: df2 = df.copy()
+
+In [188]: df2[df2[1:4] > 0] = 3
+
+In [189]: df2
+Out[189]:
+                   A         B         C         D
+2000-01-01 -2.104139 -1.309525  0.485855  0.245166
+2000-01-02 -0.352480  3.000000 -1.192319 
3.000000
+2000-01-03 -0.864883  3.000000 -0.227870  3.000000
+2000-01-04  3.000000 -1.222082  3.000000 -1.233203
+2000-01-05  0.669692 -0.605656 -1.169184  0.342416
+2000-01-06  0.868584 -0.948458  2.297780 -0.684718
+2000-01-07 -2.670153 -0.114722  0.168904 -0.048048
+2000-01-08  0.801196  1.392071 -0.048788 -0.808838
+```
+
+``where`` can also accept ``axis`` and ``level`` parameters to align the input when performing the ``where``.
+
+``` python
+In [190]: df2 = df.copy()
+
+In [191]: df2.where(df2 > 0, df2['A'], axis='index')
+Out[191]:
+                   A         B         C         D
+2000-01-01 -2.104139 -2.104139  0.485855  0.245166
+2000-01-02 -0.352480  0.390389 -0.352480  1.655824
+2000-01-03 -0.864883  0.299674 -0.864883  0.281059
+2000-01-04  0.846958  0.846958  0.600705  0.846958
+2000-01-05  0.669692  0.669692  0.669692  0.342416
+2000-01-06  0.868584  0.868584  2.297780  0.868584
+2000-01-07 -2.670153 -2.670153  0.168904 -2.670153
+2000-01-08  0.801196  1.392071  0.801196  0.801196
+```
+
+This is equivalent to (but faster than) the following.
+
+``` python
+In [192]: df2 = df.copy()
+
+In [193]: df.apply(lambda x, y: x.where(x > 0, y), y=df['A'])
+Out[193]:
+                   A         B         C         D
+2000-01-01 -2.104139 -2.104139  0.485855  0.245166
+2000-01-02 -0.352480  0.390389 -0.352480  1.655824
+2000-01-03 -0.864883  0.299674 -0.864883  0.281059
+2000-01-04  0.846958  0.846958  0.600705  0.846958
+2000-01-05  0.669692  0.669692  0.669692  0.342416
+2000-01-06  0.868584  0.868584  2.297780  0.868584
+2000-01-07 -2.670153 -2.670153  0.168904 -2.670153
+2000-01-08  0.801196  1.392071  0.801196  0.801196
+```
+
+*New in version 0.18.1.*
+
+``where`` can accept a callable as condition and ``other`` arguments. The function must be with one argument (the calling Series or DataFrame) and that returns valid output as condition and ``other`` argument.
+
+``` python
+In [194]: df3 = pd.DataFrame({'A': [1, 2, 3],
+   .....:                     'B': [4, 5, 6],
+   .....:                     'C': [7, 8, 9]})
+   .....: 
+
+In [195]: df3.where(lambda x: x > 4, lambda x: x + 10)
+Out[195]:
+    A   B  C
+0  11  14  7
+1  12   5  8
+2  13   6  9
+```
+
+### Mask
+
+[``mask()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mask.html#pandas.DataFrame.mask) is the inverse boolean operation of ``where``.
+
+``` python
+In [196]: s.mask(s >= 0)
+Out[196]:
+4   NaN
+3   NaN
+2   NaN
+1   NaN
+0   
NaN
+dtype: float64
+
+In [197]: df.mask(df >= 0)
+Out[197]:
+                   A         B         C         D
+2000-01-01 -2.104139 -1.309525       NaN       NaN
+2000-01-02 -0.352480       NaN -1.192319       NaN
+2000-01-03 -0.864883       NaN -0.227870       NaN
+2000-01-04       NaN -1.222082       NaN -1.233203
+2000-01-05       NaN -0.605656 -1.169184       NaN
+2000-01-06       NaN -0.948458       NaN -0.684718
+2000-01-07 -2.670153 -0.114722       NaN -0.048048
+2000-01-08       NaN       NaN -0.048788 -0.808838
+```
+
+## The ``query()`` method
+
+[``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) objects have a [``query()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html#pandas.DataFrame.query)
+method that allows selection using an expression.
+
+You can get the value of the frame where column ``b`` has values between the values of columns ``a`` and ``c``. For example:
+
+``` python
+In [198]: n = 10
+
+In [199]: df = pd.DataFrame(np.random.rand(n, 3), columns=list('abc'))
+
+In [200]: df
+Out[200]:
+          a         b         c
+0  0.438921  0.118680  0.863670
+1  0.138138  0.577363  0.686602
+2  0.595307  0.564592  0.520630
+3  0.913052  0.926075  0.616184
+4  0.078718  0.854477  0.898725
+5  0.076404  0.523211  0.591538
+6  0.792342  0.216974  0.564056
+7  0.397890  0.454131  0.915716
+8  0.074315  0.437913  0.019794
+9  0.559209  0.502065  0.026437
+
+# pure python
+In [201]: df[(df.a < df.b) & (df.b < df.c)]
+Out[201]:
+          a         b         c
+1  0.138138  0.577363  0.686602
+4  0.078718  0.854477  0.898725
+5  0.076404  0.523211  0.591538
+7  0.397890  0.454131  0.915716
+
+# query
+In [202]: df.query('(a < b) & (b < c)')
+Out[202]:
+          a         b         c
+1  0.138138  0.577363  0.686602
+4  0.078718  0.854477  0.898725
+5  0.076404  0.523211  0.591538
+7  0.397890  0.454131  0.915716
+```
+
+Do the same thing but fall back on a named index, if there is no column named ``a``.
+
+``` python
+In [203]: df = pd.DataFrame(np.random.randint(n / 2, size=(n, 2)), columns=list('bc'))
+
+In [204]: df.index.name = 'a'
+
+In [205]: df
+Out[205]:
+   b  c
+a
+0  0  4
+1  0  1
+2  3  4
+3  4  3
+4  1  4
+5  0  3
+6  0  1
+7  3  4
+8  2  3
+9  1  1
+
+In [206]: df.query('a < b and b < c')
+Out[206]:
+   b  c
+a
+2  3  4
+```
+
+If instead you don't want to or cannot name your index, you can use the name ``index``
+in your query expression:
+
+``` python
+In [207]: df = 
pd.DataFrame(np.random.randint(n, size=(n, 2)), columns=list('bc'))
+
+In [208]: df
+Out[208]:
+   b  c
+0  3  1
+1  3  0
+2  5  6
+3  5  2
+4  7  4
+5  0  1
+6  2  5
+7  0  1
+8  6  0
+9  7  9
+
+In [209]: df.query('index < b < c')
+Out[209]:
+   b  c
+2  5  6
+```
+
+::: tip Note
+
+If the name of your index overlaps with a column name, the column name is given precedence. For example,
+
+``` python
+In [210]: df = pd.DataFrame({'a': np.random.randint(5, size=5)})
+
+In [211]: df.index.name = 'a'
+
+In [212]: df.query('a > 2')  # uses the column 'a', not the index
+Out[212]:
+   a
+a
+1  3
+3  3
+```
+
+You can still use the index in a query expression by using the special identifier 'index':
+
+``` python
+In [213]: df.query('index > 2')
+Out[213]:
+   a
+a
+3  3
+4  2
+```
+
+If for some reason you have a column named ``index``, then you can refer to the index as ``ilevel_0`` as well, but at this point you should consider renaming your columns to something less ambiguous.
+
+:::
+
+### ``MultiIndex`` ``query()`` syntax
+
+You can also use the levels of a ``DataFrame`` with a
+[``MultiIndex``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.html#pandas.MultiIndex) as if they were columns in the frame:
+
+``` python
+In [214]: n = 10
+
+In [215]: colors = np.random.choice(['red', 'green'], size=n)
+
+In [216]: foods = np.random.choice(['eggs', 'ham'], size=n)
+
+In [217]: colors
+Out[217]:
+array(['red', 'red', 'red', 'green', 'green', 'green', 'green', 'green',
+       'green', 'green'], dtype='<U5')
+```
+
+### ``query()`` Python versus pandas syntax comparison
+
+Full numpy-like syntax:
+
+``` python
+In [232]: df = pd.DataFrame(np.random.randint(n, size=(n, 3)), columns=list('abc'))
+
+In [233]: df
+Out[233]:
+   a  b  c
+0  7  8  9
+1  1  0  7
+2  2  7  2
+3  6  2  2
+4  2  6  3
+5  3  8  2
+6  1  7  2
+7  5  1  5
+8  9  8  0
+9  1  5  0
+
+In [234]: df.query('(a < b) & (b < c)')
+Out[234]:
+   a  b  c
+0  7  8  9
+
+In [235]: df[(df.a < df.b) & (df.b < df.c)]
+Out[235]:
+   a  b  c
+0  7  8  9
+```
+
+Slightly nicer by removing the parentheses (by binding making comparison operators bind tighter than ``&`` and ``|``).
+
+``` python
+In [236]: df.query('a < b & b < c')
+Out[236]:
+   a  b  c
+0  7  8  9
+```
+
+Use English instead of symbols:
+
+``` python
+In [237]: df.query('a < b and b < c')
+Out[237]:
+   a  b  c
+0  7  8  9
+```
+
+Pretty close to how you might write it on paper:
+
+``` python
+In [238]: df.query('a < b < c')
+Out[238]:
+   a  b  c
+0  7  8  9
+```
+
+### The ``in`` and ``not in`` operators
+ 
+
+[``query()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html#pandas.DataFrame.query) also supports special use of Python's ``in`` and
+``not in`` comparison operators, providing a succinct syntax for calling the
+``isin`` method of a ``Series`` or ``DataFrame``.
+
+``` python
+# get all rows where columns "a" and "b" have overlapping values
+In [239]: df = pd.DataFrame({'a': list('aabbccddeeff'), 'b': list('aaaabbbbcccc'),
+   .....:                    'c': np.random.randint(5, size=12),
+   .....:                    'd': np.random.randint(9, size=12)})
+   .....: 
+
+In [240]: df
+Out[240]:
+    a  b  c  d
+0   a  a  2  6
+1   a  a  4  7
+2   b  a  1  6
+3   b  a  2  1
+4   c  b  3  6
+5   c  b  0  2
+6   d  b  3  3
+7   d  b  2  1
+8   e  c  4  3
+9   e  c  2  0
+10  f  c  0  6
+11  f  c  1  2
+
+In [241]: df.query('a in b')
+Out[241]:
+   a  b  c  d
+0  a  a  2  6
+1  a  a  4  7
+2  b  a  1  6
+3  b  a  2  1
+4  c  b  3  6
+5  c  b  0  2
+
+# How you'd do it in pure Python
+In [242]: df[df.a.isin(df.b)]
+Out[242]:
+   a  b  c  d
+0  a  a  2  6
+1  a  a  4  7
+2  b  a  1  6
+3  b  a  2  1
+4  c  b  3  6
+5  c  b  0  2
+
+In [243]: df.query('a not in b')
+Out[243]:
+    a  b  c  d
+6   d  b  3  3
+7   d  b  2  1
+8   e  c  4  3
+9   e  c  2  0
+10  f  c  0  6
+11  f  c  1  2
+
+# pure Python
+In [244]: df[~df.a.isin(df.b)]
+Out[244]:
+    a  b  c  d
+6   d  b  3  3
+7   d  b  2  1
+8   e  c  4  3
+9   e  c  2  0
+10  f  c  0  6
+11  f  c  1  2
+```
+
+You can combine this with other expressions for very succinct queries:
+
+``` python
+# rows where cols a and b have overlapping values
+# and col c's values are less than col d's
+In [245]: df.query('a in b and c < d')
+Out[245]:
+   a  b  c  d
+0  a  a  2  6
+1  a  a  4  7
+2  b  a  1  6
+4  c  b  3  6
+5  c  b  0  2
+
+# pure Python
+In [246]: df[df.b.isin(df.a) & (df.c < df.d)]
+Out[246]:
+    a  b  c  d
+0   a  a  2  6
+1   a  a  4  7
+2   b  a  1  6
+4   c  b  3  6
+5   c  b  0  2
+10  f  c  0  6
+11  f  c  1  2
+```
+
+::: tip Note
+
+Note that ``in`` and ``not in`` are evaluated in Python, since ``numexpr``
+has no equivalent of this operation. However, **only the** ``in``/``not in``
+**expression itself** is evaluated in vanilla Python. For example, in the expression
+
+``` python
+df.query('a in b + c + d')
+```
+
+``(b + c + d)`` is evaluated by ``numexpr`` and *then* the ``in``
+operation is evaluated in plain Python. In general, any operations that can be evaluated using ``numexpr`` will be.
+
+:::
+
+### Special use of the ``==`` operator with ``list`` objects
+
+Comparing a ``list`` of values to a column using ``==``/
``!=``工程,以类似``in``/ 。``not in`` + +``` python +In [247]: df.query('b == ["a", "b", "c"]') +Out[247]: + a b c d +0 a a 2 6 +1 a a 4 7 +2 b a 1 6 +3 b a 2 1 +4 c b 3 6 +5 c b 0 2 +6 d b 3 3 +7 d b 2 1 +8 e c 4 3 +9 e c 2 0 +10 f c 0 6 +11 f c 1 2 + +# pure Python +In [248]: df[df.b.isin(["a", "b", "c"])] +Out[248]: + a b c d +0 a a 2 6 +1 a a 4 7 +2 b a 1 6 +3 b a 2 1 +4 c b 3 6 +5 c b 0 2 +6 d b 3 3 +7 d b 2 1 +8 e c 4 3 +9 e c 2 0 +10 f c 0 6 +11 f c 1 2 + +In [249]: df.query('c == [1, 2]') +Out[249]: + a b c d +0 a a 2 6 +2 b a 1 6 +3 b a 2 1 +7 d b 2 1 +9 e c 2 0 +11 f c 1 2 + +In [250]: df.query('c != [1, 2]') +Out[250]: + a b c d +1 a a 4 7 +4 c b 3 6 +5 c b 0 2 +6 d b 3 3 +8 e c 4 3 +10 f c 0 6 + +# using in/not in +In [251]: df.query('[1, 2] in c') +Out[251]: + a b c d +0 a a 2 6 +2 b a 1 6 +3 b a 2 1 +7 d b 2 1 +9 e c 2 0 +11 f c 1 2 + +In [252]: df.query('[1, 2] not in c') +Out[252]: + a b c d +1 a a 4 7 +4 c b 3 6 +5 c b 0 2 +6 d b 3 3 +8 e c 4 3 +10 f c 0 6 + +# pure Python +In [253]: df[df.c.isin([1, 2])] +Out[253]: + a b c d +0 a a 2 6 +2 b a 1 6 +3 b a 2 1 +7 d b 2 1 +9 e c 2 0 +11 f c 1 2 +``` + +### 布尔运算符 + +您可以使用单词``not``或``~``运算符否定布尔表达式。 + +``` python +In [254]: df = pd.DataFrame(np.random.rand(n, 3), columns=list('abc')) + +In [255]: df['bools'] = np.random.rand(len(df)) > 0.5 + +In [256]: df.query('~bools') +Out[256]: + a b c bools +2 0.697753 0.212799 0.329209 False +7 0.275396 0.691034 0.826619 False +8 0.190649 0.558748 0.262467 False + +In [257]: df.query('not bools') +Out[257]: + a b c bools +2 0.697753 0.212799 0.329209 False +7 0.275396 0.691034 0.826619 False +8 0.190649 0.558748 0.262467 False + +In [258]: df.query('not bools') == df[~df.bools] +Out[258]: + a b c bools +2 True True True True +7 True True True True +8 True True True True +``` + +当然,表达式也可以是任意复杂的: + +``` python +# short query syntax +In [259]: shorter = df.query('a < b < c and (not bools) or bools > 2') + +# equivalent in pure Python +In [260]: longer = df[(df.a < df.b) & 
(df.b < df.c) & (~df.bools) | (df.bools > 2)] + +In [261]: shorter +Out[261]: + a b c bools +7 0.275396 0.691034 0.826619 False + +In [262]: longer +Out[262]: + a b c bools +7 0.275396 0.691034 0.826619 False + +In [263]: shorter == longer +Out[263]: + a b c bools +7 True True True True +``` + +### 的表现¶``query()`` + +``DataFrame.query()````numexpr``对于大型帧,使用比Python略快。 + +![query-perf](https://static.pypandas.cn/public/static/images/query-perf.png) + +::: tip 注意 + +如果您的框架超过大约200,000行,您将只看到使用``numexpr``引擎的性能优势``DataFrame.query()``。 + +![query-perf-small](https://static.pypandas.cn/public/static/images/query-perf-small.png) + +::: + +此图是使用``DataFrame``3列创建的,每列包含使用生成的浮点值``numpy.random.randn()``。 + +## 重复数据 + +如果要识别和删除DataFrame中的重复行,有两种方法可以提供帮助:``duplicated``和``drop_duplicates``。每个都将用于标识重复行的列作为参数。 + +- ``duplicated`` 返回一个布尔向量,其长度为行数,表示行是否重复。 +- ``drop_duplicates`` 删除重复的行。 + +默认情况下,重复集的第一个观察行被认为是唯一的,但每个方法都有一个``keep``参数来指定要保留的目标。 + +- ``keep='first'`` (默认值):标记/删除重复项,第一次出现除外。 +- ``keep='last'``:标记/删除重复项,除了最后一次出现。 +- ``keep=False``:标记/删除所有重复项。 + +``` python +In [264]: df2 = pd.DataFrame({'a': ['one', 'one', 'two', 'two', 'two', 'three', 'four'], + .....: 'b': ['x', 'y', 'x', 'y', 'x', 'x', 'x'], + .....: 'c': np.random.randn(7)}) + .....: + +In [265]: df2 +Out[265]: + a b c +0 one x -1.067137 +1 one y 0.309500 +2 two x -0.211056 +3 two y -1.842023 +4 two x -0.390820 +5 three x -1.964475 +6 four x 1.298329 + +In [266]: df2.duplicated('a') +Out[266]: +0 False +1 True +2 False +3 True +4 True +5 False +6 False +dtype: bool + +In [267]: df2.duplicated('a', keep='last') +Out[267]: +0 True +1 False +2 True +3 True +4 False +5 False +6 False +dtype: bool + +In [268]: df2.duplicated('a', keep=False) +Out[268]: +0 True +1 True +2 True +3 True +4 True +5 False +6 False +dtype: bool + +In [269]: df2.drop_duplicates('a') +Out[269]: + a b c +0 one x -1.067137 +2 two x -0.211056 +5 three x -1.964475 +6 four x 1.298329 + +In [270]: df2.drop_duplicates('a', keep='last') +Out[270]: + a b c 
+1 one y 0.309500 +4 two x -0.390820 +5 three x -1.964475 +6 four x 1.298329 + +In [271]: df2.drop_duplicates('a', keep=False) +Out[271]: + a b c +5 three x -1.964475 +6 four x 1.298329 +``` + +此外,您可以传递列表列表以识别重复。 + +``` python +In [272]: df2.duplicated(['a', 'b']) +Out[272]: +0 False +1 False +2 False +3 False +4 True +5 False +6 False +dtype: bool + +In [273]: df2.drop_duplicates(['a', 'b']) +Out[273]: + a b c +0 one x -1.067137 +1 one y 0.309500 +2 two x -0.211056 +3 two y -1.842023 +5 three x -1.964475 +6 four x 1.298329 +``` + +要按索引值删除重复项,请使用``Index.duplicated``然后执行切片。``keep``参数可以使用相同的选项集。 + +``` python +In [274]: df3 = pd.DataFrame({'a': np.arange(6), + .....: 'b': np.random.randn(6)}, + .....: index=['a', 'a', 'b', 'c', 'b', 'a']) + .....: + +In [275]: df3 +Out[275]: + a b +a 0 1.440455 +a 1 2.456086 +b 2 1.038402 +c 3 -0.894409 +b 4 0.683536 +a 5 3.082764 + +In [276]: df3.index.duplicated() +Out[276]: array([False, True, False, False, True, True]) + +In [277]: df3[~df3.index.duplicated()] +Out[277]: + a b +a 0 1.440455 +b 2 1.038402 +c 3 -0.894409 + +In [278]: df3[~df3.index.duplicated(keep='last')] +Out[278]: + a b +c 3 -0.894409 +b 4 0.683536 +a 5 3.082764 + +In [279]: df3[~df3.index.duplicated(keep=False)] +Out[279]: + a b +c 3 -0.894409 +``` + +## 类字典``get()``方法 + +Series或DataFrame中的每一个都有一个``get``可以返回默认值的方法。 + +``` python +In [280]: s = pd.Series([1, 2, 3], index=['a', 'b', 'c']) + +In [281]: s.get('a') # equivalent to s['a'] +Out[281]: 1 + +In [282]: s.get('x', default=-1) +Out[282]: -1 +``` + +## 该``lookup()``方法 + +有时,您希望在给定一系列行标签和列标签的情况下提取一组值,并且该``lookup``方法允许此操作并返回NumPy数组。例如: + +``` python +In [283]: dflookup = pd.DataFrame(np.random.rand(20, 4), columns = ['A', 'B', 'C', 'D']) + +In [284]: dflookup.lookup(list(range(0, 10, 2)), ['B', 'C', 'A', 'B', 'D']) +Out[284]: array([0.3506, 0.4779, 0.4825, 0.9197, 0.5019]) +``` + +## 索引对象 + +pandas 
[``Index``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.html#pandas.Index)类及其子类可以视为实现*有序的多集合*。允许重复。但是,如果您尝试将[``Index``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.html#pandas.Index)具有重复条目的对象转换为a + ``set``,则会引发异常。 + +[``Index``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.html#pandas.Index)还提供了查找,数据对齐和重建索引所需的基础结构。[``Index``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.html#pandas.Index)直接创建的最简单方法 + 是将一个``list``或其他序列传递给 + [``Index``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.html#pandas.Index): + +``` python +In [285]: index = pd.Index(['e', 'd', 'a', 'b']) + +In [286]: index +Out[286]: Index(['e', 'd', 'a', 'b'], dtype='object') + +In [287]: 'd' in index +Out[287]: True +``` + +您还可以传递一个``name``存储在索引中: + +``` python +In [288]: index = pd.Index(['e', 'd', 'a', 'b'], name='something') + +In [289]: index.name +Out[289]: 'something' +``` + +名称(如果已设置)将显示在控制台显示中: + +``` python +In [290]: index = pd.Index(list(range(5)), name='rows') + +In [291]: columns = pd.Index(['A', 'B', 'C'], name='cols') + +In [292]: df = pd.DataFrame(np.random.randn(5, 3), index=index, columns=columns) + +In [293]: df +Out[293]: +cols A B C +rows +0 1.295989 0.185778 0.436259 +1 0.678101 0.311369 -0.528378 +2 -0.674808 -1.103529 -0.656157 +3 1.889957 2.076651 -1.102192 +4 -1.211795 -0.791746 0.634724 + +In [294]: df['A'] +Out[294]: +rows +0 1.295989 +1 0.678101 +2 -0.674808 +3 1.889957 +4 -1.211795 +Name: A, dtype: float64 +``` + +### 设置元数据 + +索引是“不可改变的大多是”,但它可以设置和改变它们的元数据,如指数``name``(或为``MultiIndex``,``levels``和 + ``codes``)。 + +您可以使用``rename``,``set_names``,``set_levels``,和``set_codes`` +直接设置这些属性。他们默认返回一份副本; 但是,您可以指定``inplace=True``使数据更改到位。 + +有关MultiIndexes的使用,请参阅[高级索引](advanced.html#advanced)。 + +``` python +In [295]: ind = pd.Index([1, 2, 3]) + +In [296]: ind.rename("apple") +Out[296]: Int64Index([1, 2, 3], dtype='int64', name='apple') + +In [297]: ind 
+Out[297]: Int64Index([1, 2, 3], dtype='int64') + +In [298]: ind.set_names(["apple"], inplace=True) + +In [299]: ind.name = "bob" + +In [300]: ind +Out[300]: Int64Index([1, 2, 3], dtype='int64', name='bob') +``` + +``set_names``,``set_levels``并且``set_codes``还采用可选 + ``level``参数 + +``` python +In [301]: index = pd.MultiIndex.from_product([range(3), ['one', 'two']], names=['first', 'second']) + +In [302]: index +Out[302]: +MultiIndex([(0, 'one'), + (0, 'two'), + (1, 'one'), + (1, 'two'), + (2, 'one'), + (2, 'two')], + names=['first', 'second']) + +In [303]: index.levels[1] +Out[303]: Index(['one', 'two'], dtype='object', name='second') + +In [304]: index.set_levels(["a", "b"], level=1) +Out[304]: +MultiIndex([(0, 'a'), + (0, 'b'), + (1, 'a'), + (1, 'b'), + (2, 'a'), + (2, 'b')], + names=['first', 'second']) +``` + +### 在Index对象上设置操作 + +两个主要业务是和。这些可以直接称为实例方法,也可以通过重载运算符使用。通过该方法提供差异。``union (|)````intersection (&)````.difference()`` + +``` python +In [305]: a = pd.Index(['c', 'b', 'a']) + +In [306]: b = pd.Index(['c', 'e', 'd']) + +In [307]: a | b +Out[307]: Index(['a', 'b', 'c', 'd', 'e'], dtype='object') + +In [308]: a & b +Out[308]: Index(['c'], dtype='object') + +In [309]: a.difference(b) +Out[309]: Index(['a', 'b'], dtype='object') +``` + +同时还提供了操作,它返回出现在任一元件或,但不是在两者。这相当于创建的索引,删除了重复项。``symmetric_difference (^)````idx1````idx2````idx1.difference(idx2).union(idx2.difference(idx1))`` + +``` python +In [310]: idx1 = pd.Index([1, 2, 3, 4]) + +In [311]: idx2 = pd.Index([2, 3, 4, 5]) + +In [312]: idx1.symmetric_difference(idx2) +Out[312]: Int64Index([1, 5], dtype='int64') + +In [313]: idx1 ^ idx2 +Out[313]: Int64Index([1, 5], dtype='int64') +``` + +::: tip 注意 + +来自设置操作的结果索引将按升序排序。 + +::: + +在[``Index.union()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.union.html#pandas.Index.union)具有不同dtypes的索引之间执行时,必须将索引强制转换为公共dtype。通常,虽然并非总是如此,但这是对象dtype。例外是在整数和浮点数据之间执行联合。在这种情况下,整数值将转换为float + +``` python +In [314]: idx1 = pd.Index([0, 1, 2]) + +In [315]: 
idx2 = pd.Index([0.5, 1.5]) + +In [316]: idx1 | idx2 +Out[316]: Float64Index([0.0, 0.5, 1.0, 1.5, 2.0], dtype='float64') +``` + +### 缺少值 + +即使``Index``可以保存缺失值(``NaN``),但如果您不想要任何意外结果,也应该避免使用。例如,某些操作会隐式排除缺失值。 + +``Index.fillna`` 使用指定的标量值填充缺失值。 + +``` python +In [317]: idx1 = pd.Index([1, np.nan, 3, 4]) + +In [318]: idx1 +Out[318]: Float64Index([1.0, nan, 3.0, 4.0], dtype='float64') + +In [319]: idx1.fillna(2) +Out[319]: Float64Index([1.0, 2.0, 3.0, 4.0], dtype='float64') + +In [320]: idx2 = pd.DatetimeIndex([pd.Timestamp('2011-01-01'), + .....: pd.NaT, + .....: pd.Timestamp('2011-01-03')]) + .....: + +In [321]: idx2 +Out[321]: DatetimeIndex(['2011-01-01', 'NaT', '2011-01-03'], dtype='datetime64[ns]', freq=None) + +In [322]: idx2.fillna(pd.Timestamp('2011-01-02')) +Out[322]: DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03'], dtype='datetime64[ns]', freq=None) +``` + +## 设置/重置索引 + +有时您会将数据集加载或创建到DataFrame中,并希望在您已经完成之后添加索引。有几种不同的方式。 + +### 设置索引 + +DataFrame有一个[``set_index()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html#pandas.DataFrame.set_index)方法,它采用列名(对于常规``Index``)或列名列表(对于a ``MultiIndex``)。要创建新的重新索引的DataFrame: + +``` python +In [323]: data +Out[323]: + a b c d +0 bar one z 1.0 +1 bar two y 2.0 +2 foo one x 3.0 +3 foo two w 4.0 + +In [324]: indexed1 = data.set_index('c') + +In [325]: indexed1 +Out[325]: + a b d +c +z bar one 1.0 +y bar two 2.0 +x foo one 3.0 +w foo two 4.0 + +In [326]: indexed2 = data.set_index(['a', 'b']) + +In [327]: indexed2 +Out[327]: + c d +a b +bar one z 1.0 + two y 2.0 +foo one x 3.0 + two w 4.0 +``` + +该``append``关键字选项让你保持现有索引并追加给列一个多指标: + +``` python +In [328]: frame = data.set_index('c', drop=False) + +In [329]: frame = frame.set_index(['a', 'b'], append=True) + +In [330]: frame +Out[330]: + c d +c a b +z bar one z 1.0 +y bar two y 2.0 +x foo one x 3.0 +w foo two w 4.0 +``` + +其他选项``set_index``允许您不删除索引列或就地添加索引(不创建新对象): + +``` python +In [331]: data.set_index('c', drop=False) +Out[331]: 
+ a b c d +c +z bar one z 1.0 +y bar two y 2.0 +x foo one x 3.0 +w foo two w 4.0 + +In [332]: data.set_index(['a', 'b'], inplace=True) + +In [333]: data +Out[333]: + c d +a b +bar one z 1.0 + two y 2.0 +foo one x 3.0 + two w 4.0 +``` + +### 重置索引 + +为方便起见,DataFrame上有一个新函数,它将 + [``reset_index()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html#pandas.DataFrame.reset_index)索引值传输到DataFrame的列中并设置一个简单的整数索引。这是反向操作[``set_index()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html#pandas.DataFrame.set_index)。 + +``` python +In [334]: data +Out[334]: + c d +a b +bar one z 1.0 + two y 2.0 +foo one x 3.0 + two w 4.0 + +In [335]: data.reset_index() +Out[335]: + a b c d +0 bar one z 1.0 +1 bar two y 2.0 +2 foo one x 3.0 +3 foo two w 4.0 +``` + +输出更类似于SQL表或记录数组。从索引派生的列的名称是存储在``names``属性中的名称。 + +您可以使用``level``关键字仅删除索引的一部分: + +``` python +In [336]: frame +Out[336]: + c d +c a b +z bar one z 1.0 +y bar two y 2.0 +x foo one x 3.0 +w foo two w 4.0 + +In [337]: frame.reset_index(level=1) +Out[337]: + a c d +c b +z one bar z 1.0 +y two bar y 2.0 +x one foo x 3.0 +w two foo w 4.0 +``` + +``reset_index``采用一个可选参数``drop``,如果为true,则只丢弃索引,而不是将索引值放在DataFrame的列中。 + +### 添加ad hoc索引 + +如果您自己创建索引,则可以将其分配给``index``字段: + +``` python +data.index = index +``` + +## 返回视图与副本 + +在pandas对象中设置值时,必须注意避免调用所谓的对象 + 。这是一个例子。``chained indexing`` + +``` python +In [338]: dfmi = pd.DataFrame([list('abcd'), + .....: list('efgh'), + .....: list('ijkl'), + .....: list('mnop')], + .....: columns=pd.MultiIndex.from_product([['one', 'two'], + .....: ['first', 'second']])) + .....: + +In [339]: dfmi +Out[339]: + one two + first second first second +0 a b c d +1 e f g h +2 i j k l +3 m n o p +``` + +比较这两种访问方法: + +``` python +In [340]: dfmi['one']['second'] +Out[340]: +0 b +1 f +2 j +3 n +Name: second, dtype: object +``` + +``` python +In [341]: dfmi.loc[:, ('one', 'second')] +Out[341]: +0 b +1 f +2 j +3 n +Name: (one, second), dtype: 
object
```

两种写法产生相同的结果,那么应该使用哪一种呢?为了回答这个问题,有必要理解这些操作的执行顺序,以及为什么方法2(``.loc``)优于方法1(链式``[]``)。

``dfmi['one']``选取列的第一层级,返回一个单层索引的DataFrame;随后另一个Python操作``dfmi_with_one['second']``再选出以``'second'``索引的Series(这里用变量``dfmi_with_one``表示中间结果)。pandas把这两步视为彼此独立的事件:它们是对``__getitem__``的两次单独调用,因此只能被当作先后发生的线性操作来处理。

与之对比,``df.loc[:,('one','second')]``把一个嵌套元组``(slice(None),('one','second'))``传给单次``__getitem__``调用。这让pandas可以把整个选取作为单个实体来处理。此外,这种操作顺序*可能*明显更快,并且在需要时还允许同时对*两个*轴进行索引。

### 为什么使用链式索引时赋值会失败?

上一节说的只是性能问题。那``SettingWithCopy``警告又是怎么回事呢?对于只是可能多花几毫秒的事情,我们**通常**不会发出警告!

但事实证明,对链式索引的结果进行赋值,其效果本质上是不可预测的。要理解这一点,请考虑Python解释器如何执行下面的代码:

``` python
dfmi.loc[:, ('one', 'second')] = value
# becomes
dfmi.loc.__setitem__((slice(None), ('one', 'second')), value)
```

但是下面这段代码的处理方式并不相同:

``` python
dfmi['one']['second'] = value
# becomes
dfmi.__getitem__('one').__setitem__('second', value)
```

看到那个``__getitem__``了吗?除了最简单的情况,我们很难预测它返回的是视图还是副本(这取决于数组的内存布局,而pandas对此不做任何保证),因此也无法预测``__setitem__``修改的是``dfmi``,还是一个随后立即被丢弃的临时对象。**这**正是``SettingWithCopy``要警告你的!

::: tip 注意

您可能想知道,第一个示例中的``loc``属性是否也需要担心。答案是否定的:``dfmi.loc``保证以(索引行为经过修改的)``dfmi``本身为操作对象,因此``dfmi.loc.__getitem__``/``dfmi.loc.__setitem__``直接作用于``dfmi``。当然,``dfmi.loc.__getitem__(idx)``返回的仍可能是``dfmi``的视图或副本。

:::

有时,在没有明显使用链式索引的情况下也会出现``SettingWithCopy``警告,而**这正是**``SettingWithCopy``旨在捕获的错误!此时pandas很可能是在警告你,你已经在不经意间写出了这样的代码:

``` python
def do_something(df):
    foo = df[['bar', 'baz']]  # Is foo a view? A copy? Nobody knows!
    # ... many lines here ...
    # We don't know whether this will modify df or not!
    foo['quux'] = value
    return foo
```

哎呀!
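上面``do_something``的陷阱可以用一个可运行的小片段直观验证(``bar``/``baz``沿用上例中的假设列名,数据为示例值):用列列表选取得到的总是副本,链式写入不会改动原DataFrame;改用单次``.loc``调用才会真正生效。

``` python
import pandas as pd

df = pd.DataFrame({'bar': [1, 2, 3], 'baz': [4, 5, 6]})

# 用列列表选取总是返回副本;对 foo 的链式写入不会回写到 df
foo = df[['bar', 'baz']]
foo['bar'] = 0                       # 可能触发 SettingWithCopyWarning,只改动了副本

after_chained = df['bar'].tolist()   # 仍是 [1, 2, 3],df 并未被修改

# 正确做法:单次 .loc 调用直接作用在 df 上
df.loc[:, 'bar'] = 0
```

若确有修改原DataFrame的意图,应始终通过一次 ``df.loc[...] = value`` 完成,而不是分两步先取再改。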
### 求值顺序很重要

使用链式索引时,索引操作的顺序和类型会部分决定结果是原始对象上的切片,还是该切片的副本。

pandas之所以提供``SettingWithCopyWarning``,是因为对切片副本的赋值通常不是有意为之,而更多是链式索引在预期返回切片时却返回了副本所导致的错误。

如果您希望pandas对链式索引表达式的赋值给予或多或少的信任,可以把[选项](options.html#options)``mode.chained_assignment``设置为以下值之一:

- ``'warn'``(默认值):打印``SettingWithCopyWarning``。
- ``'raise'``:pandas会抛出``SettingWithCopyException``,您必须自行处理。
- ``None``:完全抑制该警告。

``` python
In [342]: dfb = pd.DataFrame({'a': ['one', 'one', 'two',
   .....:                           'three', 'two', 'one', 'six'],
   .....:                     'c': np.arange(7)})
   .....: 

# This will show the SettingWithCopyWarning
# but the frame values will be set
In [343]: dfb['c'][dfb.a.str.startswith('o')] = 42
```

而下面的写法是在副本上操作,因此不会生效:

``` python
>>> pd.set_option('mode.chained_assignment','warn')
>>> dfb[dfb.a.str.startswith('o')]['c'] = 42
Traceback (most recent call last)
     ...
SettingWithCopyWarning:
     A value is trying to be set on a copy of a slice from a DataFrame.
     Try using .loc[row_index,col_indexer] = value instead
```

在混合dtype的DataFrame上赋值时,同样可能出现链式赋值的问题。

::: tip 注意

这些设置规则适用于所有``.loc/.iloc``。

:::

这是正确的访问方法:

``` python
In [344]: dfc = pd.DataFrame({'A': ['aaa', 'bbb', 'ccc'], 'B': [1, 2, 3]})

In [345]: dfc.loc[0, 'A'] = 11

In [346]: dfc
Out[346]: 
     A  B
0   11  1
1  bbb  2
2  ccc  3
```

这种写法有时*会*起作用,但没有任何保证,因此应该避免:

``` python
In [347]: dfc = dfc.copy()

In [348]: dfc['A'][0] = 111

In [349]: dfc
Out[349]: 
     A  B
0  111  1
1  bbb  2
2  ccc  3
```

而这种写法**根本**不起作用,必须避免:

``` python
>>> pd.set_option('mode.chained_assignment','raise')
>>> dfc.loc[0]['A'] = 1111
Traceback (most recent call last)
     ...
SettingWithCopyException:
     A value is trying to be set on a copy of a slice from a DataFrame.
+ Try using .loc[row_index,col_indexer] = value instead +``` + +::: danger 警告 + +链式分配警告/异常旨在通知用户可能无效的分配。可能存在误报; 无意中报告链式作业的情况。 + +::: diff --git a/Python/pandas/user_guide/integer_na.md b/Python/pandas/user_guide/integer_na.md new file mode 100644 index 00000000..5e5f4442 --- /dev/null +++ b/Python/pandas/user_guide/integer_na.md @@ -0,0 +1,175 @@ +--- +meta: + - name: keywords + content: Nullable,整型数据类型 + - name: description + content: 在处理丢失的数据部分, 我们知道pandas主要使用 NaN 来代表丢失数据。因为 NaN 属于浮点型数据,这强制有缺失值的整型array强制转换成浮点型。 +--- + +# Nullable整型数据类型 + +*在0.24.0版本中新引入* + +::: tip 小贴士 + +IntegerArray目前属于实验性阶段,因此他的API或者使用方式可能会在没有提示的情况下更改。 + +::: + +在 [处理丢失的数据](missing_data.html#missing-data)部分, 我们知道pandas主要使用 ``NaN`` 来代表丢失数据。因为 ``NaN`` 属于浮点型数据,这强制有缺失值的整型array强制转换成浮点型。在某些情况下,这可能不会有太大影响,但是如果你的整型数据恰好是标识符,数据类型的转换可能会存在隐患。同时,某些整数无法使用浮点型来表示。 + +Pandas能够将可能存在缺失值的整型数据使用[``arrays.IntegerArray``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.arrays.IntegerArray.html#pandas.arrays.IntegerArray)来表示。这是pandas中内置的 [扩展方式](https://pandas.pydata.org/pandas-docs/stable/development/extending.html#extending-extension-types)。 它并不是整型数据组成array对象的默认方式,并且并不会被pandas直接使用。因此,如果你希望生成这种数据类型,你需要在生成[``array()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.array.html#pandas.array) 或者 [``Series``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series)时,在``dtype``变量中直接指定。 + +``` python +In [1]: arr = pd.array([1, 2, np.nan], dtype=pd.Int64Dtype()) + +In [2]: arr +Out[2]: + +[1, 2, NaN] +Length: 3, dtype: Int64 +``` + +或者使用字符串``"Int64"``(注意此处的 ``"I"``需要大写,以此和NumPy中的``'int64'``数据类型作出区别): + +``` python +In [3]: pd.array([1, 2, np.nan], dtype="Int64") +Out[3]: + +[1, 2, NaN] +Length: 3, dtype: Int64 +``` + +这样的array对象与NumPy的array对象类似,可以被存放在[``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) 或 
[``Series``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series)中。 + +``` python +In [4]: pd.Series(arr) +Out[4]: +0 1 +1 2 +2 NaN +dtype: Int64 +``` + +你也可以直接将列表形式的数据直接传入[``Series``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series)中,并指明``dtype``。 + +``` python +In [5]: s = pd.Series([1, 2, np.nan], dtype="Int64") + +In [6]: s +Out[6]: +0 1 +1 2 +2 NaN +dtype: Int64 +``` + +默认情况下(如果你不指明``dtype``),则会使用NumPy来构建这个数据,最终你会得到``float64``类型的Series: + +``` python +In [7]: pd.Series([1, 2, np.nan]) +Out[7]: +0 1.0 +1 2.0 +2 NaN +dtype: float64 +``` + +对使用了整型array的操作与对NumPy中array的操作类似,缺失值会被继承并保留原本的数据类型,但在必要的情况下,数据类型也会发生转变。 + +``` python +# 运算 +In [8]: s + 1 +Out[8]: +0 2 +1 3 +2 NaN +dtype: Int64 + +# 比较 +In [9]: s == 1 +Out[9]: +0 True +1 False +2 False +dtype: bool + +# 索引 +In [10]: s.iloc[1:3] +Out[10]: +1 2 +2 NaN +dtype: Int64 + +# 和其他数据类型联合使用 +In [11]: s + s.iloc[1:3].astype('Int8') +Out[11]: +0 NaN +1 4 +2 NaN +dtype: Int64 + +# 在必要情况下,数据类型发生转变 +In [12]: s + 0.01 +Out[12]: +0 1.01 +1 2.01 +2 NaN +dtype: float64 +``` + +这种数据类型可以作为 ``DataFrame``的一部分进行使用。 + +``` python +In [13]: df = pd.DataFrame({'A': s, 'B': [1, 1, 3], 'C': list('aab')}) + +In [14]: df +Out[14]: + A B C +0 1 1 a +1 2 1 a +2 NaN 3 b + +In [15]: df.dtypes +Out[15]: +A Int64 +B int64 +C object +dtype: object +``` + +这种数据类型也可以在合并(merge)、重构(reshape)和类型转换(cast)。 + +``` python +In [16]: pd.concat([df[['A']], df[['B', 'C']]], axis=1).dtypes +Out[16]: +A Int64 +B int64 +C object +dtype: object + +In [17]: df['A'].astype(float) +Out[17]: +0 1.0 +1 2.0 +2 NaN +Name: A, dtype: float64 +``` + +类似于求和的降维和分组操作也能正常使用。 + +``` python +In [18]: df.sum() +Out[18]: +A 3 +B 5 +C aab +dtype: object + +In [19]: df.groupby('B').A.sum() +Out[19]: +B +1 3 +3 0 +Name: A, dtype: Int64 +``` diff --git a/Python/pandas/user_guide/io.md b/Python/pandas/user_guide/io.md new file mode 100644 index 00000000..55874820 --- /dev/null +++ 
b/Python/pandas/user_guide/io.md @@ -0,0 +1,7123 @@ +# IO工具(文本,CSV,HDF5,…) + +pandas的I/O API是一组``read``函数,比如[``pandas.read_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv)函数。这类函数可以返回pandas对象。相应的``write``函数是像[``DataFrame.to_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html#pandas.DataFrame.to_csv)一样的对象方法。下面是一个方法列表,包含了这里面的所有``readers``函数和``writer``函数。 + +Format Type | Data Description | Reader | Writer +---|---|---|--- +text | [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) | [read_csv](#io-read-csv-table) | [to_csv](#io-store-in-csv) +text | [JSON](https://www.json.org/) | [read_json](#io-json-reader) | [to_json](#io-json-writer) +text | [HTML](https://en.wikipedia.org/wiki/HTML) | [read_html](#io-read-html) | [to_html](#io-html) +text | Local clipboard | [read_clipboard](#io-clipboard) | [to_clipboard](#io-clipboard) +binary | [MS Excel](https://en.wikipedia.org/wiki/Microsoft_Excel) | [read_excel](#io-excel-reader) | [to_excel](#io-excel-writer) +binary | [OpenDocument](http://www.opendocumentformat.org) | [read_excel](#io-ods) |   +binary | [HDF5 Format](https://support.hdfgroup.org/HDF5/whatishdf5.html) | [read_hdf](#io-hdf5) | [to_hdf](#io-hdf5) +binary | [Feather Format](https://github.com/wesm/feather) | [read_feather](#io-feather) | [to_feather](#io-feather) +binary | [Parquet Format](https://parquet.apache.org/) | [read_parquet](#io-parquet) | [to_parquet](#io-parquet) +binary | [Msgpack](https://msgpack.org/index.html) | [read_msgpack](#io-msgpack) | [to_msgpack](#io-msgpack) +binary | [Stata](https://en.wikipedia.org/wiki/Stata) | [read_stata](#io-stata-reader) | [to_stata](#io-stata-writer) +binary | [SAS](https://en.wikipedia.org/wiki/SAS_(software)) | [read_sas](#io-sas-reader) |   +binary | [Python Pickle Format](https://docs.python.org/3/library/pickle.html) | [read_pickle](#io-pickle) | [to_pickle](#io-pickle) 
+[SQL](https://en.wikipedia.org/wiki/SQL) | SQL | [read_sql](#io-sql) | [to_sql](#io-sql) +SQL | [Google Big Query](https://en.wikipedia.org/wiki/BigQuery) | [read_gbq](#io-bigquery) | [to_gbq](#io-bigquery) + +[Here](#io-perf) is an informal performance comparison for some of these IO methods. + +::: tip 注意 + +比如在使用 ``StringIO`` 类时, 请先确定python的版本信息。也就是说,是使用python2的``from StringIO import StringIO``还是python3的``from io import StringIO``。 + +::: + +## CSV & 文本文件 + +读文本文件 (a.k.a. flat files)的主要方法 is +[``read_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv). 关于一些更高级的用法请参阅[cookbook](cookbook.html#cookbook-csv)。 + +### 方法解析(Parsing options) + +[``read_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv) 可接受以下常用参数: + +#### 基础 + +filepath_or_buffer : *various* + +- 文件路径 (a [``str``](https://docs.python.org/3/library/stdtypes.html#str), [``pathlib.Path``](https://docs.python.org/3/library/pathlib.html#pathlib.Path), +or ``py._path.local.LocalPath``), URL (including http, ftp, and S3 +locations), 或者具有 ``read()`` 方法的任何对象 (such as an open file or +[``StringIO``](https://docs.python.org/3/library/io.html#io.StringIO)). + +sep : *str, 默认 [``read_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv)分隔符为``','``, [``read_table()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_table.html#pandas.read_table)方法,分隔符为 ``\t``* + +- 分隔符的使用. 如果分隔符为``None``,虽然C不能解析,但python解析引擎可解析,这意味着python将被使用,通过内置的sniffer tool自动检测分隔符, +[``csv.Sniffer``](https://docs.python.org/3/library/csv.html#csv.Sniffer). 除此之外,字符长度超过1并且不同于 ``'s+'`` 的将被视为正则表达式,并且将强制使用python解析引擎。需要注意的是,正则表达式易于忽略引用数据(主要注意转义字符的使用) 例如: ``'\\r\\t'``. + +delimiter : *str, default ``None``* + +- sep的替代参数. + +delim_whitespace : *boolean, default False* + +- 指定是否将空格 (e.g. ``' '`` or ``'\t'``)当作delimiter。 +等价于设置 ``sep='\s+'``. 
+如果这个选项被设置为 ``True``,就不要给 +``delimiter`` 传参了. + +*version 0.18.1:* 支持Python解析器. + +#### 列、索引、名称 + +header : *int or list of ints, default ``'infer'``* + +- 当选择默认值或``header=0``时,将首行设为列名。如果列名被传入明确值就令``header=None``。注意,当``header=0``时,即使列名被传参也会被覆盖。 + + +- 标题可以是指定列上的MultiIndex的行位置的整数列表,例如 ``[0,1,3]``。在列名指定时,若某列未被指定,读取时将跳过该列 (例如 在下面的例子中第二列将被跳过).注意,如果 ``skip_blank_lines=True``,此参数将忽略空行和注释行, 因此 header=0 表示第一行数据而非文件的第一行. + +names : *array-like, default ``None``* + +- 列名列表的使用. 如果文件不包含列名,那么应该设置``header=None``。 列名列表中不允许有重复值. + +index_col : *int, str, sequence of int / str, or False, default ``None``* + +- ``DataFrame``的行索引列表, 既可以是字符串名称也可以是列索引. 如果传入一个字符串序列或者整数序列,那么一定要使用多级索引(MultiIndex). + +- 注意: 当``index_col=False`` ,pandas不再使用首列作为索引。例如, 当你的文件是一个每行末尾都带有一个分割符的格式错误的文件时. + +usecols : *list-like or callable, default ``None``* + +- 返回列名列表的子集. 如果该参数为列表形式, 那么所有元素应全为位置(即文档列中的整数索引)或者 全为相应列的列名字符串(这些列名字符串为*names*参数给出的或者文档的``header``行内容).例如,一个有效的列表型参数 +*usecols* 将会是是 ``[0, 1, 2]`` 或者 ``['foo', 'bar', 'baz']``. + +- 元素顺序可忽略,因此 ``usecols=[0, 1]``等价于 ``[1, 0]``。如果想实例化一个自定义列顺序的DataFrame,请使用``pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']]`` ,这样列的顺序为 ``['foo', 'bar']`` 。如果设置``pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']]`` 那么列的顺序为``['bar', 'foo']`` 。 + +- 如果使用callable的方式, 可调用函数将根据列名计算, +返回可调用函数计算结果为True的名称: + +``` python +In [1]: from io import StringIO, BytesIO + +In [2]: data = ('col1,col2,col3\n' + ...: 'a,b,1\n' + ...: 'a,b,2\n' + ...: 'c,d,3') + ...: + +In [3]: pd.read_csv(StringIO(data)) +Out[3]: + col1 col2 col3 +0 a b 1 +1 a b 2 +2 c d 3 + +In [4]: pd.read_csv(StringIO(data), usecols=lambda x: x.upper() in ['COL1', 'COL3']) +Out[4]: + col1 col3 +0 a 1 +1 a 2 +2 c 3 + +``` + +使用此参数可以大大加快解析时间并降低内存使用率。 + +squeeze : *boolean, default ``False``* + +- 如果解析的数据仅包含一个列,那么结果将以 ``Series``的形式返回. + +prefix : *str, default ``None``* + +- 当没有header时,可通过该参数为数字列名添加前缀, e.g. 
‘X’ for X0, X1, … + +mangle_dupe_cols : *boolean, default ``True``* + +- 当列名有重复时,解析列名将变为 ‘X’, ‘X.1’…’X.N’而不是 ‘X’…’X’。 如果该参数为 ``False`` ,那么当列名中有重复时,前列将会被后列覆盖。 + +#### 常规解析配置 + +dtype : *Type name or dict of column -> type, default ``None``* + +- 指定某列或整体数据的数据类型. E.g. ``{'a': np.float64, 'b': np.int32}`` +(不支持 ``engine='python'``).将*str*或*object*与合适的设置一起使用以保留和不解释dtype。 + +- *New in version 0.20.0:* 支持python解析器. + +engine : *{``'c'``, ``'python'``}* + +- 解析引擎的使用。 尽管C引擎速度更快,但是目前python引擎功能更加完美。 + +converters : *dict, default ``None``* + +- Dict of functions for converting values in certain columns. Keys can either be integers or column labels. + +true_values : *list, default ``None``* + +- Values to consider as ``True``. + +false_values : *list, default ``None``* + +- Values to consider as ``False``. + +skipinitialspace : *boolean, default ``False``* + +- Skip spaces after delimiter. + +skiprows : *list-like or integer, default ``None``* + +- Line numbers to skip (0-indexed) or number of lines to skip (int) at the start +of the file. + +- If callable, the callable function will be evaluated against the row +indices, returning True if the row should be skipped and False otherwise: + +``` python +In [5]: data = ('col1,col2,col3\n' + ...: 'a,b,1\n' + ...: 'a,b,2\n' + ...: 'c,d,3') + ...: + +In [6]: pd.read_csv(StringIO(data)) +Out[6]: + col1 col2 col3 +0 a b 1 +1 a b 2 +2 c d 3 + +In [7]: pd.read_csv(StringIO(data), skiprows=lambda x: x % 2 != 0) +Out[7]: + col1 col2 col3 +0 a b 2 + +``` + +skipfooter : *int, default ``0``* + +- Number of lines at bottom of file to skip (unsupported with engine=’c’). + +nrows : *int, default ``None``* + +- Number of rows of file to read. Useful for reading pieces of large files. + +low_memory : *boolean, default ``True``* + +- Internally process the file in chunks, resulting in lower memory use +while parsing, but possibly mixed type inference. To ensure no mixed +types either set ``False``, or specify the type with the ``dtype`` parameter. 
+Note that the entire file is read into a single ``DataFrame`` regardless, +use the ``chunksize`` or ``iterator`` parameter to return the data in chunks. +(Only valid with C parser) + +memory_map : *boolean, default False* + +- If a filepath is provided for ``filepath_or_buffer``, map the file object +directly onto memory and access the data directly from there. Using this +option can improve performance because there is no longer any I/O overhead. + +#### NA and missing data handling + +na_values : *scalar, str, list-like, or dict, default ``None``* + +- Additional strings to recognize as NA/NaN. If dict passed, specific per-column +NA values. See [na values const](#io-navaluesconst) below +for a list of the values interpreted as NaN by default. + +keep_default_na : *boolean, default ``True``* + +- Whether or not to include the default NaN values when parsing the data. +Depending on whether *na_values* is passed in, the behavior is as follows: + - If *keep_default_na* is ``True``, and *na_values* are specified, *na_values* + is appended to the default NaN values used for parsing. + - If *keep_default_na* is ``True``, and *na_values* are not specified, only + the default NaN values are used for parsing. + - If *keep_default_na* is ``False``, and *na_values* are specified, only + the NaN values specified *na_values* are used for parsing. + - If *keep_default_na* is ``False``, and *na_values* are not specified, no + strings will be parsed as NaN. + + Note that if *na_filter* is passed in as ``False``, the *keep_default_na* and *na_values* parameters will be ignored. + +na_filter : *boolean, default ``True``* + +- Detect missing value markers (empty strings and the value of na_values). In data without any NAs, passing ``na_filter=False`` can improve the performance of reading a large file. + +verbose : *boolean, default ``False``* + +- Indicate number of NA values placed in non-numeric columns. 
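作为补充,下面用一个小例子示意 *na_values* 与 *keep_default_na* 组合后的效果(数据内容为假设的示例):

``` python
import pandas as pd
from io import StringIO

data = 'a,b\n1,n/a\n2,-1\nN/A,3'

# 默认情况下,'n/a' 和 'N/A' 都在内置的 NA 哨兵值列表中
df1 = pd.read_csv(StringIO(data))

# na_values 在默认列表之上追加自定义哨兵值:-1 也被解析为 NaN
df2 = pd.read_csv(StringIO(data), na_values=[-1])

# keep_default_na=False 时,仅 na_values 中给出的值被视为 NaN,
# 'N/A' 会作为普通字符串保留
df3 = pd.read_csv(StringIO(data), keep_default_na=False, na_values=['n/a'])
```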
+ +skip_blank_lines : *boolean, default ``True``* + +- If ``True``, skip over blank lines rather than interpreting as NaN values. + +#### Datetime handling + +parse_dates : *boolean or list of ints or names or list of lists or dict, default ``False``.* + +- If ``True`` -> try parsing the index. +- If ``[1, 2, 3]`` -> try parsing columns 1, 2, 3 each as a separate date +column. +- If ``[[1, 3]]`` -> combine columns 1 and 3 and parse as a single date +column. +- If ``{'foo': [1, 3]}`` -> parse columns 1, 3 as date and call result ‘foo’. +A fast-path exists for iso8601-formatted dates. + +infer_datetime_format : *boolean, default ``False``* + +- If ``True`` and parse_dates is enabled for a column, attempt to infer the datetime format to speed up the processing. + +keep_date_col : *boolean, default ``False``* + +- If ``True`` and parse_dates specifies combining multiple columns then keep the original columns. + +date_parser : *function, default ``None``* + +- Function to use for converting a sequence of string columns to an array of +datetime instances. The default uses ``dateutil.parser.parser`` to do the +conversion. pandas will try to call date_parser in three different ways, +advancing to the next if an exception occurs: 1) Pass one or more arrays (as +defined by parse_dates) as arguments; 2) concatenate (row-wise) the string +values from the columns defined by parse_dates into a single array and pass +that; and 3) call date_parser once for each row using one or more strings +(corresponding to the columns defined by parse_dates) as arguments. + +dayfirst : *boolean, default ``False``* + +- DD/MM format dates, international and European format. + +cache_dates : *boolean, default True* + +- If True, use a cache of unique, converted dates to apply the datetime +conversion. May produce significant speed-up when parsing duplicate +date strings, especially ones with timezone offsets. 
+ +*New in version 0.25.0.* + +#### Iteration + +iterator : *boolean, default ``False``* + +- Return TextFileReader object for iteration or getting chunks with ``get_chunk()``. + +chunksize : *int, default ``None``* + +- Return TextFileReader object for iteration. See [iterating and chunking](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-chunking) below. + +#### Quoting, compression, and file format + +compression : *{``'infer'``, ``'gzip'``, ``'bz2'``, ``'zip'``, ``'xz'``, ``None``}, default ``'infer'``* + +- For on-the-fly decompression of on-disk data. If ‘infer’, then use gzip, +bz2, zip, or xz if filepath_or_buffer is a string ending in ‘.gz’, ‘.bz2’, +‘.zip’, or ‘.xz’, respectively, and no decompression otherwise. If using ‘zip’, +the ZIP file must contain only one data file to be read in. +Set to ``None`` for no decompression. + +*New in version 0.18.1:* support for ‘zip’ and ‘xz’ compression. + +*Changed in version 0.24.0:* ‘infer’ option added and set to default. + +thousands : *str, default ``None``* + +- Thousands separator. + +decimal : *str, default ``'.'``* + +- Character to recognize as decimal point. E.g. use ',' for European data. + +float_precision : *string, default None* + +- Specifies which converter the C engine should use for floating-point values. +The options are ``None`` for the ordinary converter, ``high`` for the +high-precision converter, and ``round_trip`` for the round-trip converter. + +lineterminator : *str (length 1), default ``None``* + +- Character to break file into lines. Only valid with C parser. + +quotechar : *str (length 1)* + +- The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored. + +quoting : *int or ``csv.QUOTE_*`` instance, default ``0``* + +- Control field quoting behavior per ``csv.QUOTE_*`` constants. Use one of ``QUOTE_MINIMAL`` (0), ``QUOTE_ALL`` (1), ``QUOTE_NONNUMERIC`` (2) or ``QUOTE_NONE`` (3). 
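A short, self-contained sketch of a few of the options in this section; the sample data, including the European-style numbers, is invented for illustration:

``` python
import pandas as pd
from io import StringIO

# Quoted fields may contain the delimiter: the embedded comma in "a,b"
# is data, not a separator.
quoted = 'label,value\n"a,b",1\n"c",2\n'
df = pd.read_csv(StringIO(quoted))

# European-style numbers: '.' as thousands separator, ',' as decimal point.
european = "x;y\n1.234,5;2\n"
df2 = pd.read_csv(StringIO(european), sep=";", thousands=".", decimal=",")
```

Here ``df.loc[0, 'label']`` is the string ``'a,b'``, and ``df2['x']`` is parsed as the float ``1234.5`` rather than a string.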

doublequote : *boolean, default ``True``*

- When ``quotechar`` is specified and ``quoting`` is not ``QUOTE_NONE``, indicate whether or not to interpret two consecutive ``quotechar`` elements **inside** a field as a single ``quotechar`` element.

escapechar : *str (length 1), default ``None``*

- One-character string used to escape the delimiter when quoting is ``QUOTE_NONE``.

comment : *str, default ``None``*

- Indicates that the remainder of the line should not be parsed. If found at the beginning of
a line, the line will be ignored altogether. This parameter must be a single
character. Like empty lines (as long as ``skip_blank_lines=True``), fully
commented lines are ignored by the parameter *header* but not by *skiprows*.
For example, if ``comment='#'``, parsing ``#empty\na,b,c\n1,2,3`` with *header=0* will result in ‘a,b,c’ being treated as the header.

encoding : *str, default ``None``*

- Encoding to use for UTF when reading/writing (e.g. ``'utf-8'``). [List of Python standard encodings](https://docs.python.org/3/library/codecs.html#standard-encodings).

dialect : *str or [``csv.Dialect``](https://docs.python.org/3/library/csv.html#csv.Dialect) instance, default ``None``*

- If provided, this parameter will override values (default or not) for the following parameters: *delimiter, doublequote, escapechar, skipinitialspace, quotechar, and quoting.* If it is necessary to override values, a ParserWarning will be issued. See [csv.Dialect](https://docs.python.org/3/library/csv.html#csv.Dialect) documentation for more details.

#### Error handling

error_bad_lines : *boolean, default ``True``*

- Lines with too many fields (e.g. a csv line with too many commas) will by
default cause an exception to be raised, and no ``DataFrame`` will be
returned. If ``False``, then these “bad lines” will be dropped from the
``DataFrame`` that is returned. See [bad lines](#io-bad-lines) below.
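A minimal sketch of ``doublequote`` and ``escapechar`` in this section, on invented data:

``` python
import csv
import pandas as pd
from io import StringIO

# doublequote=True (the default): "" inside a quoted field collapses
# to a single quote character.
data = 'a,b\n"say ""hi""",2\n'
df = pd.read_csv(StringIO(data))

# With QUOTE_NONE there is no quoting at all, so a delimiter that is
# part of the data has to be escaped explicitly instead.
raw = "a,b\nx\\,y,2\n"
df2 = pd.read_csv(StringIO(raw), quoting=csv.QUOTE_NONE, escapechar="\\")
```

In the first frame, ``df.loc[0, 'a']`` is ``say "hi"``; in the second, the escaped comma survives as part of the field, giving ``x,y``.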
+ +warn_bad_lines : *boolean, default ``True``* + +- If error_bad_lines is ``False``, and warn_bad_lines is ``True``, a warning for each “bad line” will be output. + +### Specifying column data types + +You can indicate the data type for the whole ``DataFrame`` or individual +columns: + +``` python +In [8]: data = ('a,b,c,d\n' + ...: '1,2,3,4\n' + ...: '5,6,7,8\n' + ...: '9,10,11') + ...: + +In [9]: print(data) +a,b,c,d +1,2,3,4 +5,6,7,8 +9,10,11 + +In [10]: df = pd.read_csv(StringIO(data), dtype=object) + +In [11]: df +Out[11]: + a b c d +0 1 2 3 4 +1 5 6 7 8 +2 9 10 11 NaN + +In [12]: df['a'][0] +Out[12]: '1' + +In [13]: df = pd.read_csv(StringIO(data), + ....: dtype={'b': object, 'c': np.float64, 'd': 'Int64'}) + ....: + +In [14]: df.dtypes +Out[14]: +a int64 +b object +c float64 +d Int64 +dtype: object + +``` + +Fortunately, pandas offers more than one way to ensure that your column(s) +contain only one ``dtype``. If you’re unfamiliar with these concepts, you can +see [here](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-dtypes) to learn more about dtypes, and +[here](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-object-conversion) to learn more about ``object`` conversion in +pandas. 
+ +For instance, you can use the ``converters`` argument +of [``read_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv): + +``` python +In [15]: data = ("col_1\n" + ....: "1\n" + ....: "2\n" + ....: "'A'\n" + ....: "4.22") + ....: + +In [16]: df = pd.read_csv(StringIO(data), converters={'col_1': str}) + +In [17]: df +Out[17]: + col_1 +0 1 +1 2 +2 'A' +3 4.22 + +In [18]: df['col_1'].apply(type).value_counts() +Out[18]: + 4 +Name: col_1, dtype: int64 + +``` + +Or you can use the [``to_numeric()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_numeric.html#pandas.to_numeric) function to coerce the +dtypes after reading in the data, + +``` python +In [19]: df2 = pd.read_csv(StringIO(data)) + +In [20]: df2['col_1'] = pd.to_numeric(df2['col_1'], errors='coerce') + +In [21]: df2 +Out[21]: + col_1 +0 1.00 +1 2.00 +2 NaN +3 4.22 + +In [22]: df2['col_1'].apply(type).value_counts() +Out[22]: + 4 +Name: col_1, dtype: int64 + +``` + +which will convert all valid parsing to floats, leaving the invalid parsing +as ``NaN``. + +Ultimately, how you deal with reading in columns containing mixed dtypes +depends on your specific needs. In the case above, if you wanted to ``NaN`` out +the data anomalies, then [``to_numeric()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_numeric.html#pandas.to_numeric) is probably your best option. +However, if you wanted for all the data to be coerced, no matter the type, then +using the ``converters`` argument of [``read_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv) would certainly be +worth trying. + +*New in version 0.20.0:* support for the Python parser. + +The ``dtype`` option is supported by the ‘python’ engine. + +::: tip Note + +In some cases, reading in abnormal data with columns containing mixed dtypes +will result in an inconsistent dataset. 
If you rely on pandas to infer the +dtypes of your columns, the parsing engine will go and infer the dtypes for +different chunks of the data, rather than the whole dataset at once. Consequently, +you can end up with column(s) with mixed dtypes. For example, + +``` python +In [23]: col_1 = list(range(500000)) + ['a', 'b'] + list(range(500000)) + +In [24]: df = pd.DataFrame({'col_1': col_1}) + +In [25]: df.to_csv('foo.csv') + +In [26]: mixed_df = pd.read_csv('foo.csv') + +In [27]: mixed_df['col_1'].apply(type).value_counts() +Out[27]: + 737858 + 262144 +Name: col_1, dtype: int64 + +In [28]: mixed_df['col_1'].dtype +Out[28]: dtype('O') + +``` + +will result with *mixed_df* containing an ``int`` dtype for certain chunks +of the column, and ``str`` for others due to the mixed dtypes from the +data that was read in. It is important to note that the overall column will be +marked with a ``dtype`` of ``object``, which is used for columns with mixed dtypes. + +::: + +### Specifying categorical dtype + +*New in version 0.19.0.* + +``Categorical`` columns can be parsed directly by specifying ``dtype='category'`` or +``dtype=CategoricalDtype(categories, ordered)``. 
+ +``` python +In [29]: data = ('col1,col2,col3\n' + ....: 'a,b,1\n' + ....: 'a,b,2\n' + ....: 'c,d,3') + ....: + +In [30]: pd.read_csv(StringIO(data)) +Out[30]: + col1 col2 col3 +0 a b 1 +1 a b 2 +2 c d 3 + +In [31]: pd.read_csv(StringIO(data)).dtypes +Out[31]: +col1 object +col2 object +col3 int64 +dtype: object + +In [32]: pd.read_csv(StringIO(data), dtype='category').dtypes +Out[32]: +col1 category +col2 category +col3 category +dtype: object + +``` + +Individual columns can be parsed as a ``Categorical`` using a dict +specification: + +``` python +In [33]: pd.read_csv(StringIO(data), dtype={'col1': 'category'}).dtypes +Out[33]: +col1 category +col2 object +col3 int64 +dtype: object + +``` + +*New in version 0.21.0.* + +Specifying ``dtype='category'`` will result in an unordered ``Categorical`` +whose ``categories`` are the unique values observed in the data. For more +control on the categories and order, create a +``CategoricalDtype`` ahead of time, and pass that for +that column’s ``dtype``. + +``` python +In [34]: from pandas.api.types import CategoricalDtype + +In [35]: dtype = CategoricalDtype(['d', 'c', 'b', 'a'], ordered=True) + +In [36]: pd.read_csv(StringIO(data), dtype={'col1': dtype}).dtypes +Out[36]: +col1 category +col2 object +col3 int64 +dtype: object + +``` + +When using ``dtype=CategoricalDtype``, “unexpected” values outside of +``dtype.categories`` are treated as missing values. + +``` python +In [37]: dtype = CategoricalDtype(['a', 'b', 'd']) # No 'c' + +In [38]: pd.read_csv(StringIO(data), dtype={'col1': dtype}).col1 +Out[38]: +0 a +1 a +2 NaN +Name: col1, dtype: category +Categories (3, object): [a, b, d] + +``` + +This matches the behavior of ``Categorical.set_categories()``. + +::: tip Note + +With ``dtype='category'``, the resulting categories will always be parsed +as strings (object dtype). 
If the categories are numeric they can be +converted using the [``to_numeric()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_numeric.html#pandas.to_numeric) function, or as appropriate, another +converter such as [``to_datetime()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html#pandas.to_datetime). + +When ``dtype`` is a ``CategoricalDtype`` with homogeneous ``categories`` ( +all numeric, all datetimes, etc.), the conversion is done automatically. + +``` python +In [39]: df = pd.read_csv(StringIO(data), dtype='category') + +In [40]: df.dtypes +Out[40]: +col1 category +col2 category +col3 category +dtype: object + +In [41]: df['col3'] +Out[41]: +0 1 +1 2 +2 3 +Name: col3, dtype: category +Categories (3, object): [1, 2, 3] + +In [42]: df['col3'].cat.categories = pd.to_numeric(df['col3'].cat.categories) + +In [43]: df['col3'] +Out[43]: +0 1 +1 2 +2 3 +Name: col3, dtype: category +Categories (3, int64): [1, 2, 3] + +``` + +::: + +### Naming and using columns + +#### Handling column names + +A file may or may not have a header row. 
pandas assumes the first row should be +used as the column names: + +``` python +In [44]: data = ('a,b,c\n' + ....: '1,2,3\n' + ....: '4,5,6\n' + ....: '7,8,9') + ....: + +In [45]: print(data) +a,b,c +1,2,3 +4,5,6 +7,8,9 + +In [46]: pd.read_csv(StringIO(data)) +Out[46]: + a b c +0 1 2 3 +1 4 5 6 +2 7 8 9 + +``` + +By specifying the ``names`` argument in conjunction with ``header`` you can +indicate other names to use and whether or not to throw away the header row (if +any): + +``` python +In [47]: print(data) +a,b,c +1,2,3 +4,5,6 +7,8,9 + +In [48]: pd.read_csv(StringIO(data), names=['foo', 'bar', 'baz'], header=0) +Out[48]: + foo bar baz +0 1 2 3 +1 4 5 6 +2 7 8 9 + +In [49]: pd.read_csv(StringIO(data), names=['foo', 'bar', 'baz'], header=None) +Out[49]: + foo bar baz +0 a b c +1 1 2 3 +2 4 5 6 +3 7 8 9 + +``` + +If the header is in a row other than the first, pass the row number to +``header``. This will skip the preceding rows: + +``` python +In [50]: data = ('skip this skip it\n' + ....: 'a,b,c\n' + ....: '1,2,3\n' + ....: '4,5,6\n' + ....: '7,8,9') + ....: + +In [51]: pd.read_csv(StringIO(data), header=1) +Out[51]: + a b c +0 1 2 3 +1 4 5 6 +2 7 8 9 + +``` + +::: tip Note + +Default behavior is to infer the column names: if no names are +passed the behavior is identical to ``header=0`` and column names +are inferred from the first non-blank line of the file, if column +names are passed explicitly then the behavior is identical to +``header=None``. + +::: + +### Duplicate names parsing + +If the file or header contains duplicate names, pandas will by default +distinguish between them so as to prevent overwriting data: + +``` python +In [52]: data = ('a,b,a\n' + ....: '0,1,2\n' + ....: '3,4,5') + ....: + +In [53]: pd.read_csv(StringIO(data)) +Out[53]: + a b a.1 +0 0 1 2 +1 3 4 5 + +``` + +There is no more duplicate data because ``mangle_dupe_cols=True`` by default, +which modifies a series of duplicate columns ‘X’, …, ‘X’ to become +‘X’, ‘X.1’, …, ‘X.N’. 
If ``mangle_dupe_cols=False``, duplicate data can +arise: + +``` python +In [2]: data = 'a,b,a\n0,1,2\n3,4,5' +In [3]: pd.read_csv(StringIO(data), mangle_dupe_cols=False) +Out[3]: + a b a +0 2 1 2 +1 5 4 5 + +``` + +To prevent users from encountering this problem with duplicate data, a ``ValueError`` +exception is raised if ``mangle_dupe_cols != True``: + +``` python +In [2]: data = 'a,b,a\n0,1,2\n3,4,5' +In [3]: pd.read_csv(StringIO(data), mangle_dupe_cols=False) +... +ValueError: Setting mangle_dupe_cols=False is not supported yet + +``` + +#### Filtering columns (``usecols``) + +The ``usecols`` argument allows you to select any subset of the columns in a +file, either using the column names, position numbers or a callable: + +*New in version 0.20.0:* support for callable *usecols* arguments + +``` python +In [54]: data = 'a,b,c,d\n1,2,3,foo\n4,5,6,bar\n7,8,9,baz' + +In [55]: pd.read_csv(StringIO(data)) +Out[55]: + a b c d +0 1 2 3 foo +1 4 5 6 bar +2 7 8 9 baz + +In [56]: pd.read_csv(StringIO(data), usecols=['b', 'd']) +Out[56]: + b d +0 2 foo +1 5 bar +2 8 baz + +In [57]: pd.read_csv(StringIO(data), usecols=[0, 2, 3]) +Out[57]: + a c d +0 1 3 foo +1 4 6 bar +2 7 9 baz + +In [58]: pd.read_csv(StringIO(data), usecols=lambda x: x.upper() in ['A', 'C']) +Out[58]: + a c +0 1 3 +1 4 6 +2 7 9 + +``` + +The ``usecols`` argument can also be used to specify which columns not to +use in the final result: + +``` python +In [59]: pd.read_csv(StringIO(data), usecols=lambda x: x not in ['a', 'c']) +Out[59]: + b d +0 2 foo +1 5 bar +2 8 baz + +``` + +In this case, the callable is specifying that we exclude the “a” and “c” +columns from the output. + +### Comments and empty lines + +#### Ignoring line comments and empty lines + +If the ``comment`` parameter is specified, then completely commented lines will +be ignored. By default, completely blank lines will be ignored as well. 
+ +``` python +In [60]: data = ('\n' + ....: 'a,b,c\n' + ....: ' \n' + ....: '# commented line\n' + ....: '1,2,3\n' + ....: '\n' + ....: '4,5,6') + ....: + +In [61]: print(data) + +a,b,c + +# commented line +1,2,3 + +4,5,6 + +In [62]: pd.read_csv(StringIO(data), comment='#') +Out[62]: + a b c +0 1 2 3 +1 4 5 6 + +``` + +If ``skip_blank_lines=False``, then ``read_csv`` will not ignore blank lines: + +``` python +In [63]: data = ('a,b,c\n' + ....: '\n' + ....: '1,2,3\n' + ....: '\n' + ....: '\n' + ....: '4,5,6') + ....: + +In [64]: pd.read_csv(StringIO(data), skip_blank_lines=False) +Out[64]: + a b c +0 NaN NaN NaN +1 1.0 2.0 3.0 +2 NaN NaN NaN +3 NaN NaN NaN +4 4.0 5.0 6.0 + +``` + +::: danger Warning + +The presence of ignored lines might create ambiguities involving line numbers; +the parameter ``header`` uses row numbers (ignoring commented/empty +lines), while ``skiprows`` uses line numbers (including commented/empty lines): + +``` python +In [65]: data = ('#comment\n' + ....: 'a,b,c\n' + ....: 'A,B,C\n' + ....: '1,2,3') + ....: + +In [66]: pd.read_csv(StringIO(data), comment='#', header=1) +Out[66]: + A B C +0 1 2 3 + +In [67]: data = ('A,B,C\n' + ....: '#comment\n' + ....: 'a,b,c\n' + ....: '1,2,3') + ....: + +In [68]: pd.read_csv(StringIO(data), comment='#', skiprows=2) +Out[68]: + a b c +0 1 2 3 + +``` + +If both ``header`` and ``skiprows`` are specified, ``header`` will be +relative to the end of ``skiprows``. For example: + +::: + +``` python +In [69]: data = ('# empty\n' + ....: '# second empty line\n' + ....: '# third emptyline\n' + ....: 'X,Y,Z\n' + ....: '1,2,3\n' + ....: 'A,B,C\n' + ....: '1,2.,4.\n' + ....: '5.,NaN,10.0\n') + ....: + +In [70]: print(data) +# empty +# second empty line +# third emptyline +X,Y,Z +1,2,3 +A,B,C +1,2.,4. 
+5.,NaN,10.0 + + +In [71]: pd.read_csv(StringIO(data), comment='#', skiprows=4, header=1) +Out[71]: + A B C +0 1.0 2.0 4.0 +1 5.0 NaN 10.0 + +``` + +#### Comments + +Sometimes comments or meta data may be included in a file: + +``` python +In [72]: print(open('tmp.csv').read()) +ID,level,category +Patient1,123000,x # really unpleasant +Patient2,23000,y # wouldn't take his medicine +Patient3,1234018,z # awesome + +``` + +By default, the parser includes the comments in the output: + +``` python +In [73]: df = pd.read_csv('tmp.csv') + +In [74]: df +Out[74]: + ID level category +0 Patient1 123000 x # really unpleasant +1 Patient2 23000 y # wouldn't take his medicine +2 Patient3 1234018 z # awesome + +``` + +We can suppress the comments using the ``comment`` keyword: + +``` python +In [75]: df = pd.read_csv('tmp.csv', comment='#') + +In [76]: df +Out[76]: + ID level category +0 Patient1 123000 x +1 Patient2 23000 y +2 Patient3 1234018 z + +``` + +### Dealing with Unicode data + +The ``encoding`` argument should be used for encoded unicode data, which will +result in byte strings being decoded to unicode in the result: + +``` python +In [77]: data = (b'word,length\n' + ....: b'Tr\xc3\xa4umen,7\n' + ....: b'Gr\xc3\xbc\xc3\x9fe,5') + ....: + +In [78]: data = data.decode('utf8').encode('latin-1') + +In [79]: df = pd.read_csv(BytesIO(data), encoding='latin-1') + +In [80]: df +Out[80]: + word length +0 Träumen 7 +1 Grüße 5 + +In [81]: df['word'][1] +Out[81]: 'Grüße' + +``` + +Some formats which encode all characters as multiple bytes, like UTF-16, won’t +parse correctly at all without specifying the encoding. [Full list of Python +standard encodings](https://docs.python.org/3/library/codecs.html#standard-encodings). 
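Building on the point about multi-byte encodings, a self-contained sketch with UTF-16 (reusing the sample words from above):

``` python
import pandas as pd
from io import BytesIO

# UTF-16 uses at least two bytes per character (plus a byte-order mark),
# so pandas cannot fall back on a default single-byte decoding: the
# encoding must be passed explicitly for the file to parse at all.
raw = "word,length\nTräumen,7\nGrüße,5\n".encode("utf-16")
df = pd.read_csv(BytesIO(raw), encoding="utf-16")
```

Without ``encoding='utf-16'`` the bytes would not decode into sensible rows at all.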
+ +### Index columns and trailing delimiters + +If a file has one more column of data than the number of column names, the +first column will be used as the ``DataFrame``’s row names: + +``` python +In [82]: data = ('a,b,c\n' + ....: '4,apple,bat,5.7\n' + ....: '8,orange,cow,10') + ....: + +In [83]: pd.read_csv(StringIO(data)) +Out[83]: + a b c +4 apple bat 5.7 +8 orange cow 10.0 + +``` + +``` python +In [84]: data = ('index,a,b,c\n' + ....: '4,apple,bat,5.7\n' + ....: '8,orange,cow,10') + ....: + +In [85]: pd.read_csv(StringIO(data), index_col=0) +Out[85]: + a b c +index +4 apple bat 5.7 +8 orange cow 10.0 + +``` + +Ordinarily, you can achieve this behavior using the ``index_col`` option. + +There are some exception cases when a file has been prepared with delimiters at +the end of each data line, confusing the parser. To explicitly disable the +index column inference and discard the last column, pass ``index_col=False``: + +``` python +In [86]: data = ('a,b,c\n' + ....: '4,apple,bat,\n' + ....: '8,orange,cow,') + ....: + +In [87]: print(data) +a,b,c +4,apple,bat, +8,orange,cow, + +In [88]: pd.read_csv(StringIO(data)) +Out[88]: + a b c +4 apple bat NaN +8 orange cow NaN + +In [89]: pd.read_csv(StringIO(data), index_col=False) +Out[89]: + a b c +0 4 apple bat +1 8 orange cow + +``` + +If a subset of data is being parsed using the ``usecols`` option, the +``index_col`` specification is based on that subset, not the original data. 
+ +``` python +In [90]: data = ('a,b,c\n' + ....: '4,apple,bat,\n' + ....: '8,orange,cow,') + ....: + +In [91]: print(data) +a,b,c +4,apple,bat, +8,orange,cow, + +In [92]: pd.read_csv(StringIO(data), usecols=['b', 'c']) +Out[92]: + b c +4 bat NaN +8 cow NaN + +In [93]: pd.read_csv(StringIO(data), usecols=['b', 'c'], index_col=0) +Out[93]: + b c +4 bat NaN +8 cow NaN + +``` + +### Date Handling + +#### Specifying date columns + +To better facilitate working with datetime data, [``read_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv) +uses the keyword arguments ``parse_dates`` and ``date_parser`` +to allow users to specify a variety of columns and date/time formats to turn the +input text data into ``datetime`` objects. + +The simplest case is to just pass in ``parse_dates=True``: + +``` python +# Use a column as an index, and parse it as dates. +In [94]: df = pd.read_csv('foo.csv', index_col=0, parse_dates=True) + +In [95]: df +Out[95]: + A B C +date +2009-01-01 a 1 2 +2009-01-02 b 3 4 +2009-01-03 c 4 5 + +# These are Python datetime objects +In [96]: df.index +Out[96]: DatetimeIndex(['2009-01-01', '2009-01-02', '2009-01-03'], dtype='datetime64[ns]', name='date', freq=None) + +``` + +It is often the case that we may want to store date and time data separately, +or store various date fields separately. the ``parse_dates`` keyword can be +used to specify a combination of columns to parse the dates and/or times from. 
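The ``foo.csv`` example above assumes a file on disk; a self-contained equivalent of the simplest case, using ``StringIO``:

``` python
import pandas as pd
from io import StringIO

data = "date,A,B\n2009-01-01,a,1\n2009-01-02,b,3\n2009-01-03,c,4\n"

# Use the first column as the index, and parse it as dates.
df = pd.read_csv(StringIO(data), index_col=0, parse_dates=True)
```

The resulting index is a ``DatetimeIndex``, so the usual datetime conveniences such as partial-string slicing (e.g. ``df.loc['2009-01-02':]``) work directly.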
+ +You can specify a list of column lists to ``parse_dates``, the resulting date +columns will be prepended to the output (so as to not affect the existing column +order) and the new column names will be the concatenation of the component +column names: + +``` python +In [97]: print(open('tmp.csv').read()) +KORD,19990127, 19:00:00, 18:56:00, 0.8100 +KORD,19990127, 20:00:00, 19:56:00, 0.0100 +KORD,19990127, 21:00:00, 20:56:00, -0.5900 +KORD,19990127, 21:00:00, 21:18:00, -0.9900 +KORD,19990127, 22:00:00, 21:56:00, -0.5900 +KORD,19990127, 23:00:00, 22:56:00, -0.5900 + +In [98]: df = pd.read_csv('tmp.csv', header=None, parse_dates=[[1, 2], [1, 3]]) + +In [99]: df +Out[99]: + 1_2 1_3 0 4 +0 1999-01-27 19:00:00 1999-01-27 18:56:00 KORD 0.81 +1 1999-01-27 20:00:00 1999-01-27 19:56:00 KORD 0.01 +2 1999-01-27 21:00:00 1999-01-27 20:56:00 KORD -0.59 +3 1999-01-27 21:00:00 1999-01-27 21:18:00 KORD -0.99 +4 1999-01-27 22:00:00 1999-01-27 21:56:00 KORD -0.59 +5 1999-01-27 23:00:00 1999-01-27 22:56:00 KORD -0.59 + +``` + +By default the parser removes the component date columns, but you can choose +to retain them via the ``keep_date_col`` keyword: + +``` python +In [100]: df = pd.read_csv('tmp.csv', header=None, parse_dates=[[1, 2], [1, 3]], + .....: keep_date_col=True) + .....: + +In [101]: df +Out[101]: + 1_2 1_3 0 1 2 3 4 +0 1999-01-27 19:00:00 1999-01-27 18:56:00 KORD 19990127 19:00:00 18:56:00 0.81 +1 1999-01-27 20:00:00 1999-01-27 19:56:00 KORD 19990127 20:00:00 19:56:00 0.01 +2 1999-01-27 21:00:00 1999-01-27 20:56:00 KORD 19990127 21:00:00 20:56:00 -0.59 +3 1999-01-27 21:00:00 1999-01-27 21:18:00 KORD 19990127 21:00:00 21:18:00 -0.99 +4 1999-01-27 22:00:00 1999-01-27 21:56:00 KORD 19990127 22:00:00 21:56:00 -0.59 +5 1999-01-27 23:00:00 1999-01-27 22:56:00 KORD 19990127 23:00:00 22:56:00 -0.59 + +``` + +Note that if you wish to combine multiple columns into a single date column, a +nested list must be used. 
In other words, ``parse_dates=[1, 2]`` indicates that +the second and third columns should each be parsed as separate date columns +while ``parse_dates=[[1, 2]]`` means the two columns should be parsed into a +single column. + +You can also use a dict to specify custom name columns: + +``` python +In [102]: date_spec = {'nominal': [1, 2], 'actual': [1, 3]} + +In [103]: df = pd.read_csv('tmp.csv', header=None, parse_dates=date_spec) + +In [104]: df +Out[104]: + nominal actual 0 4 +0 1999-01-27 19:00:00 1999-01-27 18:56:00 KORD 0.81 +1 1999-01-27 20:00:00 1999-01-27 19:56:00 KORD 0.01 +2 1999-01-27 21:00:00 1999-01-27 20:56:00 KORD -0.59 +3 1999-01-27 21:00:00 1999-01-27 21:18:00 KORD -0.99 +4 1999-01-27 22:00:00 1999-01-27 21:56:00 KORD -0.59 +5 1999-01-27 23:00:00 1999-01-27 22:56:00 KORD -0.59 + +``` + +It is important to remember that if multiple text columns are to be parsed into +a single date column, then a new column is prepended to the data. The *index_col* +specification is based off of this new set of columns rather than the original +data columns: + +``` python +In [105]: date_spec = {'nominal': [1, 2], 'actual': [1, 3]} + +In [106]: df = pd.read_csv('tmp.csv', header=None, parse_dates=date_spec, + .....: index_col=0) # index is the nominal column + .....: + +In [107]: df +Out[107]: + actual 0 4 +nominal +1999-01-27 19:00:00 1999-01-27 18:56:00 KORD 0.81 +1999-01-27 20:00:00 1999-01-27 19:56:00 KORD 0.01 +1999-01-27 21:00:00 1999-01-27 20:56:00 KORD -0.59 +1999-01-27 21:00:00 1999-01-27 21:18:00 KORD -0.99 +1999-01-27 22:00:00 1999-01-27 21:56:00 KORD -0.59 +1999-01-27 23:00:00 1999-01-27 22:56:00 KORD -0.59 + +``` + +::: tip Note + +If a column or index contains an unparsable date, the entire column or +index will be returned unaltered as an object data type. For non-standard +datetime parsing, use [``to_datetime()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html#pandas.to_datetime) after ``pd.read_csv``. 

:::

::: tip Note

read_csv has a fast path for parsing datetime strings in iso8601 format,
e.g. “2000-01-01T00:01:02+00:00” and similar variations. If you can arrange
for your data to store datetimes in this format, load times will be
significantly faster; speedups of ~20x have been observed.

:::

::: tip Note

When passing a dict as the *parse_dates* argument, the order of
the columns prepended is not guaranteed, because *dict* objects do not impose
an ordering on their keys. On Python 2.7+ you may use *collections.OrderedDict*
instead of a regular *dict* if this matters to you. Because of this, when using a
dict for ‘parse_dates’ in conjunction with the *index_col* argument, it’s best to
specify *index_col* as a column label rather than as an index on the resulting frame.

:::

#### Date parsing functions

Finally, the parser allows you to specify a custom ``date_parser`` function to
take full advantage of the flexibility of the date parsing API:

``` python
In [108]: df = pd.read_csv('tmp.csv', header=None, parse_dates=date_spec,
   .....:                  date_parser=pd.io.date_converters.parse_date_time)
   .....: 

In [109]: df
Out[109]: 
              nominal              actual    0     4
0 1999-01-27 19:00:00 1999-01-27 18:56:00  KORD  0.81
1 1999-01-27 20:00:00 1999-01-27 19:56:00  KORD  0.01
2 1999-01-27 21:00:00 1999-01-27 20:56:00  KORD -0.59
3 1999-01-27 21:00:00 1999-01-27 21:18:00  KORD -0.99
4 1999-01-27 22:00:00 1999-01-27 21:56:00  KORD -0.59
5 1999-01-27 23:00:00 1999-01-27 22:56:00  KORD -0.59

```

Pandas will try to call the ``date_parser`` function in three different ways. If
an exception is raised, the next one is tried:

1. ``date_parser`` is first called with one or more arrays as arguments,
as defined using *parse_dates* (e.g., ``date_parser(['2013', '2013'], ['1', '2'])``).
1. If #1 fails, ``date_parser`` is called with all the columns
concatenated row-wise into a single array (e.g., ``date_parser(['2013 1', '2013 2'])``).
1. If #2 fails, ``date_parser`` is called once for every row with one or more
string arguments from the columns indicated with *parse_dates*
(e.g., ``date_parser('2013', '1')`` for the first row, ``date_parser('2013', '2')``
for the second, etc.).

Note that performance-wise, you should try these methods of parsing dates in order:

1. Try to infer the format using ``infer_datetime_format=True`` (see section below).
1. If you know the format, use ``pd.to_datetime()``:
``date_parser=lambda x: pd.to_datetime(x, format=...)``.
1. If you have a really non-standard format, use a custom ``date_parser`` function.
For optimal performance, this should be vectorized, i.e., it should accept arrays
as arguments.

You can explore the date parsing functionality in
[date_converters.py](https://github.com/pandas-dev/pandas/blob/master/pandas/io/date_converters.py)
and add your own. We would love to turn this module into a community-supported
set of date/time parsers. To get you started, ``date_converters.py`` contains
functions to parse dual date and time columns, year/month/day columns,
and year/month/day/hour/minute/second columns. It also contains a
``generic_parser`` function so you can curry it with a function that deals with
a single date rather than the entire array.

#### Parsing a CSV with mixed timezones

Pandas cannot natively represent a column or index with mixed timezones. If your CSV
file contains columns with a mixture of timezones, the default result will be
an object-dtype column with strings, even with ``parse_dates``.
+ +``` python +In [110]: content = """\ + .....: a + .....: 2000-01-01T00:00:00+05:00 + .....: 2000-01-01T00:00:00+06:00""" + .....: + +In [111]: df = pd.read_csv(StringIO(content), parse_dates=['a']) + +In [112]: df['a'] +Out[112]: +0 2000-01-01 00:00:00+05:00 +1 2000-01-01 00:00:00+06:00 +Name: a, dtype: object + +``` + +To parse the mixed-timezone values as a datetime column, pass a partially-applied +[``to_datetime()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html#pandas.to_datetime) with ``utc=True`` as the ``date_parser``. + +``` python +In [113]: df = pd.read_csv(StringIO(content), parse_dates=['a'], + .....: date_parser=lambda col: pd.to_datetime(col, utc=True)) + .....: + +In [114]: df['a'] +Out[114]: +0 1999-12-31 19:00:00+00:00 +1 1999-12-31 18:00:00+00:00 +Name: a, dtype: datetime64[ns, UTC] + +``` + +#### Inferring datetime format + +If you have ``parse_dates`` enabled for some or all of your columns, and your +datetime strings are all formatted the same way, you may get a large speed +up by setting ``infer_datetime_format=True``. If set, pandas will attempt +to guess the format of your datetime strings, and then use a faster means +of parsing the strings. 5-10x parsing speeds have been observed. pandas +will fallback to the usual parsing if either the format cannot be guessed +or the format that was guessed cannot properly parse the entire column +of strings. So in general, ``infer_datetime_format`` should not have any +negative consequences if enabled. + +Here are some examples of datetime strings that can be guessed (All +representing December 30th, 2011 at 00:00:00): + +- “20111230” +- “2011/12/30” +- “20111230 00:00:00” +- “12/30/2011 00:00:00” +- “30/Dec/2011 00:00:00” +- “30/December/2011 00:00:00” + +Note that ``infer_datetime_format`` is sensitive to ``dayfirst``. With +``dayfirst=True``, it will guess “01/12/2011” to be December 1st. 
With +``dayfirst=False`` (default) it will guess “01/12/2011” to be January 12th. + +``` python +# Try to infer the format for the index column +In [115]: df = pd.read_csv('foo.csv', index_col=0, parse_dates=True, + .....: infer_datetime_format=True) + .....: + +In [116]: df +Out[116]: + A B C +date +2009-01-01 a 1 2 +2009-01-02 b 3 4 +2009-01-03 c 4 5 + +``` + +#### International date formats + +While US date formats tend to be MM/DD/YYYY, many international formats use +DD/MM/YYYY instead. For convenience, a ``dayfirst`` keyword is provided: + +``` python +In [117]: print(open('tmp.csv').read()) +date,value,cat +1/6/2000,5,a +2/6/2000,10,b +3/6/2000,15,c + +In [118]: pd.read_csv('tmp.csv', parse_dates=[0]) +Out[118]: + date value cat +0 2000-01-06 5 a +1 2000-02-06 10 b +2 2000-03-06 15 c + +In [119]: pd.read_csv('tmp.csv', dayfirst=True, parse_dates=[0]) +Out[119]: + date value cat +0 2000-06-01 5 a +1 2000-06-02 10 b +2 2000-06-03 15 c + +``` + +### Specifying method for floating-point conversion + +The parameter ``float_precision`` can be specified in order to use +a specific floating-point converter during parsing with the C engine. +The options are the ordinary converter, the high-precision converter, and +the round-trip converter (which is guaranteed to round-trip values after +writing to a file). 
For example: + +``` python +In [120]: val = '0.3066101993807095471566981359501369297504425048828125' + +In [121]: data = 'a,b,c\n1,2,{0}'.format(val) + +In [122]: abs(pd.read_csv(StringIO(data), engine='c', + .....: float_precision=None)['c'][0] - float(val)) + .....: +Out[122]: 1.1102230246251565e-16 + +In [123]: abs(pd.read_csv(StringIO(data), engine='c', + .....: float_precision='high')['c'][0] - float(val)) + .....: +Out[123]: 5.551115123125783e-17 + +In [124]: abs(pd.read_csv(StringIO(data), engine='c', + .....: float_precision='round_trip')['c'][0] - float(val)) + .....: +Out[124]: 0.0 + +``` + +### Thousand separators + +For large numbers that have been written with a thousands separator, you can +set the ``thousands`` keyword to a string of length 1 so that integers will be parsed +correctly: + +By default, numbers with a thousands separator will be parsed as strings: + +``` python +In [125]: print(open('tmp.csv').read()) +ID|level|category +Patient1|123,000|x +Patient2|23,000|y +Patient3|1,234,018|z + +In [126]: df = pd.read_csv('tmp.csv', sep='|') + +In [127]: df +Out[127]: + ID level category +0 Patient1 123,000 x +1 Patient2 23,000 y +2 Patient3 1,234,018 z + +In [128]: df.level.dtype +Out[128]: dtype('O') + +``` + +The ``thousands`` keyword allows integers to be parsed correctly: + +``` python +In [129]: print(open('tmp.csv').read()) +ID|level|category +Patient1|123,000|x +Patient2|23,000|y +Patient3|1,234,018|z + +In [130]: df = pd.read_csv('tmp.csv', sep='|', thousands=',') + +In [131]: df +Out[131]: + ID level category +0 Patient1 123000 x +1 Patient2 23000 y +2 Patient3 1234018 z + +In [132]: df.level.dtype +Out[132]: dtype('int64') + +``` + +### NA values + +To control which values are parsed as missing values (which are signified by +``NaN``), specify a string in ``na_values``. If you specify a list of strings, +then all values in it are considered to be missing values. 
If you specify a +number (a ``float``, like ``5.0`` or an ``integer`` like ``5``), the +corresponding equivalent values will also imply a missing value (in this case +effectively ``[5.0, 5]`` are recognized as ``NaN``). + +To completely override the default values that are recognized as missing, specify ``keep_default_na=False``. + +The default ``NaN`` recognized values are ``['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A', 'N/A', +'n/a', 'NA', '#NA', 'NULL', 'null', 'NaN', '-NaN', 'nan', '-nan', '']``. + +Let us consider some examples: + +``` python +pd.read_csv('path_to_file.csv', na_values=[5]) + +``` + +In the example above ``5`` and ``5.0`` will be recognized as ``NaN``, in +addition to the defaults. A string will first be interpreted as a numerical +``5``, then as a ``NaN``. + +``` python +pd.read_csv('path_to_file.csv', keep_default_na=False, na_values=[""]) + +``` + +Above, only an empty field will be recognized as ``NaN``. + +``` python +pd.read_csv('path_to_file.csv', keep_default_na=False, na_values=["NA", "0"]) + +``` + +Above, both ``NA`` and ``0`` as strings are ``NaN``. + +``` python +pd.read_csv('path_to_file.csv', na_values=["Nope"]) + +``` + +The default values, in addition to the string ``"Nope"`` are recognized as +``NaN``. + +### Infinity + +``inf`` like values will be parsed as ``np.inf`` (positive infinity), and ``-inf`` as ``-np.inf`` (negative infinity). +These will ignore the case of the value, meaning ``Inf``, will also be parsed as ``np.inf``. 
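As a quick illustration of the case-insensitive ``inf`` handling described above (a minimal sketch with made-up data, not from the original docs):

```python
from io import StringIO

import numpy as np
import pandas as pd

# 'inf' is matched regardless of case, and a leading '-' flips the sign.
data = "a,b\n1,inf\n2,-Inf\n3,INF"
df = pd.read_csv(StringIO(data))

print(df['b'].dtype)            # float64
print(np.isinf(df['b']).all())  # True
print(df['b'].iloc[1] < 0)      # True
```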
+ +### Returning Series + +Using the ``squeeze`` keyword, the parser will return output with a single column +as a ``Series``: + +``` python +In [133]: print(open('tmp.csv').read()) +level +Patient1,123000 +Patient2,23000 +Patient3,1234018 + +In [134]: output = pd.read_csv('tmp.csv', squeeze=True) + +In [135]: output +Out[135]: +Patient1 123000 +Patient2 23000 +Patient3 1234018 +Name: level, dtype: int64 + +In [136]: type(output) +Out[136]: pandas.core.series.Series + +``` + +### Boolean values + +The common values ``True``, ``False``, ``TRUE``, and ``FALSE`` are all +recognized as boolean. Occasionally you might want to recognize other values +as being boolean. To do this, use the ``true_values`` and ``false_values`` +options as follows: + +``` python +In [137]: data = ('a,b,c\n' + .....: '1,Yes,2\n' + .....: '3,No,4') + .....: + +In [138]: print(data) +a,b,c +1,Yes,2 +3,No,4 + +In [139]: pd.read_csv(StringIO(data)) +Out[139]: + a b c +0 1 Yes 2 +1 3 No 4 + +In [140]: pd.read_csv(StringIO(data), true_values=['Yes'], false_values=['No']) +Out[140]: + a b c +0 1 True 2 +1 3 False 4 + +``` + +### Handling “bad” lines + +Some files may have malformed lines with too few fields or too many. Lines with +too few fields will have NA values filled in the trailing fields. 
Lines with +too many fields will raise an error by default: + +``` python +In [141]: data = ('a,b,c\n' + .....: '1,2,3\n' + .....: '4,5,6,7\n' + .....: '8,9,10') + .....: + +In [142]: pd.read_csv(StringIO(data)) +--------------------------------------------------------------------------- +ParserError Traceback (most recent call last) + in +----> 1 pd.read_csv(StringIO(data)) + +/pandas/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision) + 683 ) + 684 +--> 685 return _read(filepath_or_buffer, kwds) + 686 + 687 parser_f.__name__ = name + +/pandas/pandas/io/parsers.py in _read(filepath_or_buffer, kwds) + 461 + 462 try: +--> 463 data = parser.read(nrows) + 464 finally: + 465 parser.close() + +/pandas/pandas/io/parsers.py in read(self, nrows) + 1152 def read(self, nrows=None): + 1153 nrows = _validate_integer("nrows", nrows) +-> 1154 ret = self._engine.read(nrows) + 1155 + 1156 # May alter columns / col_dict + +/pandas/pandas/io/parsers.py in read(self, nrows) + 2046 def read(self, nrows=None): + 2047 try: +-> 2048 data = self._reader.read(nrows) + 2049 except StopIteration: + 2050 if self._first_chunk: + +/pandas/pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read() + +/pandas/pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory() + +/pandas/pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows() + +/pandas/pandas/_libs/parsers.pyx in 
pandas._libs.parsers.TextReader._tokenize_rows() + +/pandas/pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error() + +ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 4 + +``` + +You can elect to skip bad lines: + +``` python +In [29]: pd.read_csv(StringIO(data), error_bad_lines=False) +Skipping line 3: expected 3 fields, saw 4 + +Out[29]: + a b c +0 1 2 3 +1 8 9 10 + +``` + +You can also use the ``usecols`` parameter to eliminate extraneous column +data that appear in some lines but not others: + +``` python +In [30]: pd.read_csv(StringIO(data), usecols=[0, 1, 2]) + + Out[30]: + a b c + 0 1 2 3 + 1 4 5 6 + 2 8 9 10 + +``` + +### Dialect + +The ``dialect`` keyword gives greater flexibility in specifying the file format. +By default it uses the Excel dialect but you can specify either the dialect name +or a [``csv.Dialect``](https://docs.python.org/3/library/csv.html#csv.Dialect) instance. + +Suppose you had data with unenclosed quotes: + +``` python +In [143]: print(data) +label1,label2,label3 +index1,"a,c,e +index2,b,d,f + +``` + +By default, ``read_csv`` uses the Excel dialect and treats the double quote as +the quote character, which causes it to fail when it finds a newline before it +finds the closing double quote. 
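A minimal sketch reproducing that failure on the same unenclosed-quote data (the exact error message varies across pandas versions):

```python
from io import StringIO

import pandas as pd

# An unclosed quote: the parser reaches a newline (and eventually EOF)
# while still inside the quoted field that opened at `"a`.
bad = 'label1,label2,label3\nindex1,"a,c,e\nindex2,b,d,f'

try:
    pd.read_csv(StringIO(bad))
    failed = False
except pd.errors.ParserError:
    failed = True

print(failed)
```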
+ +We can get around this using ``dialect``: + +``` python +In [144]: import csv + +In [145]: dia = csv.excel() + +In [146]: dia.quoting = csv.QUOTE_NONE + +In [147]: pd.read_csv(StringIO(data), dialect=dia) +Out[147]: + label1 label2 label3 +index1 "a c e +index2 b d f + +``` + +All of the dialect options can be specified separately by keyword arguments: + +``` python +In [148]: data = 'a,b,c~1,2,3~4,5,6' + +In [149]: pd.read_csv(StringIO(data), lineterminator='~') +Out[149]: + a b c +0 1 2 3 +1 4 5 6 + +``` + +Another common dialect option is ``skipinitialspace``, to skip any whitespace +after a delimiter: + +``` python +In [150]: data = 'a, b, c\n1, 2, 3\n4, 5, 6' + +In [151]: print(data) +a, b, c +1, 2, 3 +4, 5, 6 + +In [152]: pd.read_csv(StringIO(data), skipinitialspace=True) +Out[152]: + a b c +0 1 2 3 +1 4 5 6 + +``` + +The parsers make every attempt to “do the right thing” and not be fragile. Type +inference is a pretty big deal. If a column can be coerced to integer dtype +without altering the contents, the parser will do so. Any non-numeric +columns will come through as object dtype as with the rest of pandas objects. + +### Quoting and Escape Characters + +Quotes (and other escape characters) in embedded fields can be handled in any +number of ways. 
One way is to use backslashes; to properly parse this data, you +should pass the ``escapechar`` option: + +``` python +In [153]: data = 'a,b\n"hello, \\"Bob\\", nice to see you",5' + +In [154]: print(data) +a,b +"hello, \"Bob\", nice to see you",5 + +In [155]: pd.read_csv(StringIO(data), escapechar='\\') +Out[155]: + a b +0 hello, "Bob", nice to see you 5 + +``` + +### Files with fixed width columns + +While [``read_csv()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv) reads delimited data, the [``read_fwf()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_fwf.html#pandas.read_fwf) function works +with data files that have known and fixed column widths. The function parameters +to ``read_fwf`` are largely the same as *read_csv* with two extra parameters, and +a different usage of the ``delimiter`` parameter: + +- ``colspecs``: A list of pairs (tuples) giving the extents of the +fixed-width fields of each line as half-open intervals (i.e., [from, to[ ). +String value ‘infer’ can be used to instruct the parser to try detecting +the column specifications from the first 100 rows of the data. Default +behavior, if not specified, is to infer. +- ``widths``: A list of field widths which can be used instead of ‘colspecs’ +if the intervals are contiguous. +- ``delimiter``: Characters to consider as filler characters in the fixed-width file. +Can be used to specify the filler character of the fields +if it is not spaces (e.g., ‘~’). 
+ +Consider a typical fixed-width data file: + +``` python +In [156]: print(open('bar.csv').read()) +id8141 360.242940 149.910199 11950.7 +id1594 444.953632 166.985655 11788.4 +id1849 364.136849 183.628767 11806.2 +id1230 413.836124 184.375703 11916.8 +id1948 502.953953 173.237159 12468.3 + +``` + +In order to parse this file into a ``DataFrame``, we simply need to supply the +column specifications to the *read_fwf* function along with the file name: + +``` python +# Column specifications are a list of half-intervals +In [157]: colspecs = [(0, 6), (8, 20), (21, 33), (34, 43)] + +In [158]: df = pd.read_fwf('bar.csv', colspecs=colspecs, header=None, index_col=0) + +In [159]: df +Out[159]: + 1 2 3 +0 +id8141 360.242940 149.910199 11950.7 +id1594 444.953632 166.985655 11788.4 +id1849 364.136849 183.628767 11806.2 +id1230 413.836124 184.375703 11916.8 +id1948 502.953953 173.237159 12468.3 + +``` + +Note how the parser automatically picks column names ``X.`` when +``header=None`` argument is specified. Alternatively, you can supply just the +column widths for contiguous columns: + +``` python +# Widths are a list of integers +In [160]: widths = [6, 14, 13, 10] + +In [161]: df = pd.read_fwf('bar.csv', widths=widths, header=None) + +In [162]: df +Out[162]: + 0 1 2 3 +0 id8141 360.242940 149.910199 11950.7 +1 id1594 444.953632 166.985655 11788.4 +2 id1849 364.136849 183.628767 11806.2 +3 id1230 413.836124 184.375703 11916.8 +4 id1948 502.953953 173.237159 12468.3 + +``` + +The parser will take care of extra white spaces around the columns +so it’s ok to have extra separation between the columns in the file. + +By default, ``read_fwf`` will try to infer the file’s ``colspecs`` by using the +first 100 rows of the file. It can do it only in cases when the columns are +aligned and correctly separated by the provided ``delimiter`` (default delimiter +is whitespace). 
+ +``` python +In [163]: df = pd.read_fwf('bar.csv', header=None, index_col=0) + +In [164]: df +Out[164]: + 1 2 3 +0 +id8141 360.242940 149.910199 11950.7 +id1594 444.953632 166.985655 11788.4 +id1849 364.136849 183.628767 11806.2 +id1230 413.836124 184.375703 11916.8 +id1948 502.953953 173.237159 12468.3 + +``` + +*New in version 0.20.0.* + +``read_fwf`` supports the ``dtype`` parameter for specifying the types of +parsed columns to be different from the inferred type. + +``` python +In [165]: pd.read_fwf('bar.csv', header=None, index_col=0).dtypes +Out[165]: +1 float64 +2 float64 +3 float64 +dtype: object + +In [166]: pd.read_fwf('bar.csv', header=None, dtype={2: 'object'}).dtypes +Out[166]: +0 object +1 float64 +2 object +3 float64 +dtype: object + +``` + +### Indexes + +#### Files with an “implicit” index column + +Consider a file with one less entry in the header than the number of data +column: + +``` python +In [167]: print(open('foo.csv').read()) +A,B,C +20090101,a,1,2 +20090102,b,3,4 +20090103,c,4,5 + +``` + +In this special case, ``read_csv`` assumes that the first column is to be used +as the index of the ``DataFrame``: + +``` python +In [168]: pd.read_csv('foo.csv') +Out[168]: + A B C +20090101 a 1 2 +20090102 b 3 4 +20090103 c 4 5 + +``` + +Note that the dates weren’t automatically parsed. 
In that case you would need +to do as before: + +``` python +In [169]: df = pd.read_csv('foo.csv', parse_dates=True) + +In [170]: df.index +Out[170]: DatetimeIndex(['2009-01-01', '2009-01-02', '2009-01-03'], dtype='datetime64[ns]', freq=None) + +``` + +#### Reading an index with a ``MultiIndex`` + +Suppose you have data indexed by two columns: + +``` python +In [171]: print(open('data/mindex_ex.csv').read()) +year,indiv,zit,xit +1977,"A",1.2,.6 +1977,"B",1.5,.5 +1977,"C",1.7,.8 +1978,"A",.2,.06 +1978,"B",.7,.2 +1978,"C",.8,.3 +1978,"D",.9,.5 +1978,"E",1.4,.9 +1979,"C",.2,.15 +1979,"D",.14,.05 +1979,"E",.5,.15 +1979,"F",1.2,.5 +1979,"G",3.4,1.9 +1979,"H",5.4,2.7 +1979,"I",6.4,1.2 + +``` + +The ``index_col`` argument to ``read_csv`` can take a list of +column numbers to turn multiple columns into a ``MultiIndex`` for the index of the +returned object: + +``` python +In [172]: df = pd.read_csv("data/mindex_ex.csv", index_col=[0, 1]) + +In [173]: df +Out[173]: + zit xit +year indiv +1977 A 1.20 0.60 + B 1.50 0.50 + C 1.70 0.80 +1978 A 0.20 0.06 + B 0.70 0.20 + C 0.80 0.30 + D 0.90 0.50 + E 1.40 0.90 +1979 C 0.20 0.15 + D 0.14 0.05 + E 0.50 0.15 + F 1.20 0.50 + G 3.40 1.90 + H 5.40 2.70 + I 6.40 1.20 + +In [174]: df.loc[1978] +Out[174]: + zit xit +indiv +A 0.2 0.06 +B 0.7 0.20 +C 0.8 0.30 +D 0.9 0.50 +E 1.4 0.90 + +``` + +#### Reading columns with a ``MultiIndex`` + +By specifying list of row locations for the ``header`` argument, you +can read in a ``MultiIndex`` for the columns. Specifying non-consecutive +rows will skip the intervening rows. 
+ +``` python +In [175]: from pandas.util.testing import makeCustomDataframe as mkdf + +In [176]: df = mkdf(5, 3, r_idx_nlevels=2, c_idx_nlevels=4) + +In [177]: df.to_csv('mi.csv') + +In [178]: print(open('mi.csv').read()) +C0,,C_l0_g0,C_l0_g1,C_l0_g2 +C1,,C_l1_g0,C_l1_g1,C_l1_g2 +C2,,C_l2_g0,C_l2_g1,C_l2_g2 +C3,,C_l3_g0,C_l3_g1,C_l3_g2 +R0,R1,,, +R_l0_g0,R_l1_g0,R0C0,R0C1,R0C2 +R_l0_g1,R_l1_g1,R1C0,R1C1,R1C2 +R_l0_g2,R_l1_g2,R2C0,R2C1,R2C2 +R_l0_g3,R_l1_g3,R3C0,R3C1,R3C2 +R_l0_g4,R_l1_g4,R4C0,R4C1,R4C2 + + +In [179]: pd.read_csv('mi.csv', header=[0, 1, 2, 3], index_col=[0, 1]) +Out[179]: +C0 C_l0_g0 C_l0_g1 C_l0_g2 +C1 C_l1_g0 C_l1_g1 C_l1_g2 +C2 C_l2_g0 C_l2_g1 C_l2_g2 +C3 C_l3_g0 C_l3_g1 C_l3_g2 +R0 R1 +R_l0_g0 R_l1_g0 R0C0 R0C1 R0C2 +R_l0_g1 R_l1_g1 R1C0 R1C1 R1C2 +R_l0_g2 R_l1_g2 R2C0 R2C1 R2C2 +R_l0_g3 R_l1_g3 R3C0 R3C1 R3C2 +R_l0_g4 R_l1_g4 R4C0 R4C1 R4C2 + +``` + +``read_csv`` is also able to interpret a more common format +of multi-columns indices. + +``` python +In [180]: print(open('mi2.csv').read()) +,a,a,a,b,c,c +,q,r,s,t,u,v +one,1,2,3,4,5,6 +two,7,8,9,10,11,12 + +In [181]: pd.read_csv('mi2.csv', header=[0, 1], index_col=0) +Out[181]: + a b c + q r s t u v +one 1 2 3 4 5 6 +two 7 8 9 10 11 12 + +``` + +Note: If an ``index_col`` is not specified (e.g. you don’t have an index, or wrote it +with ``df.to_csv(..., index=False)``, then any ``names`` on the columns index will be lost. + +### Automatically “sniffing” the delimiter + +``read_csv`` is capable of inferring delimited (not necessarily +comma-separated) files, as pandas uses the [``csv.Sniffer``](https://docs.python.org/3/library/csv.html#csv.Sniffer) +class of the csv module. For this, you have to specify ``sep=None``. 
+ +``` python +In [182]: print(open('tmp2.sv').read()) +:0:1:2:3 +0:0.4691122999071863:-0.2828633443286633:-1.5090585031735124:-1.1356323710171934 +1:1.2121120250208506:-0.17321464905330858:0.11920871129693428:-1.0442359662799567 +2:-0.8618489633477999:-2.1045692188948086:-0.4949292740687813:1.071803807037338 +3:0.7215551622443669:-0.7067711336300845:-1.0395749851146963:0.27185988554282986 +4:-0.42497232978883753:0.567020349793672:0.27623201927771873:-1.0874006912859915 +5:-0.6736897080883706:0.1136484096888855:-1.4784265524372235:0.5249876671147047 +6:0.4047052186802365:0.5770459859204836:-1.7150020161146375:-1.0392684835147725 +7:-0.3706468582364464:-1.1578922506419993:-1.344311812731667:0.8448851414248841 +8:1.0757697837155533:-0.10904997528022223:1.6435630703622064:-1.4693879595399115 +9:0.35702056413309086:-0.6746001037299882:-1.776903716971867:-0.9689138124473498 + + +In [183]: pd.read_csv('tmp2.sv', sep=None, engine='python') +Out[183]: + Unnamed: 0 0 1 2 3 +0 0 0.469112 -0.282863 -1.509059 -1.135632 +1 1 1.212112 -0.173215 0.119209 -1.044236 +2 2 -0.861849 -2.104569 -0.494929 1.071804 +3 3 0.721555 -0.706771 -1.039575 0.271860 +4 4 -0.424972 0.567020 0.276232 -1.087401 +5 5 -0.673690 0.113648 -1.478427 0.524988 +6 6 0.404705 0.577046 -1.715002 -1.039268 +7 7 -0.370647 -1.157892 -1.344312 0.844885 +8 8 1.075770 -0.109050 1.643563 -1.469388 +9 9 0.357021 -0.674600 -1.776904 -0.968914 + +``` + +### Reading multiple files to create a single DataFrame + +It’s best to use [``concat()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html#pandas.concat) to combine multiple files. +See the [cookbook](cookbook.html#cookbook-csv-multiple-files) for an example. 
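A small sketch of that pattern; the ``part_*.csv`` file names below are made up for the example:

```python
import glob

import pandas as pd

# Write two small stand-in files for a directory of like-structured CSVs.
pd.DataFrame({'a': [1, 2]}).to_csv('part_0.csv', index=False)
pd.DataFrame({'a': [3, 4]}).to_csv('part_1.csv', index=False)

# Read each file and stack the pieces; ignore_index renumbers the rows.
parts = sorted(glob.glob('part_*.csv'))
combined = pd.concat((pd.read_csv(p) for p in parts), ignore_index=True)
print(combined['a'].tolist())  # [1, 2, 3, 4]
```

Sorting the file list first keeps the row order deterministic across platforms.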
+ +### Iterating through files chunk by chunk + +Suppose you wish to iterate through a (potentially very large) file lazily +rather than reading the entire file into memory, such as the following: + +``` python +In [184]: print(open('tmp.sv').read()) +|0|1|2|3 +0|0.4691122999071863|-0.2828633443286633|-1.5090585031735124|-1.1356323710171934 +1|1.2121120250208506|-0.17321464905330858|0.11920871129693428|-1.0442359662799567 +2|-0.8618489633477999|-2.1045692188948086|-0.4949292740687813|1.071803807037338 +3|0.7215551622443669|-0.7067711336300845|-1.0395749851146963|0.27185988554282986 +4|-0.42497232978883753|0.567020349793672|0.27623201927771873|-1.0874006912859915 +5|-0.6736897080883706|0.1136484096888855|-1.4784265524372235|0.5249876671147047 +6|0.4047052186802365|0.5770459859204836|-1.7150020161146375|-1.0392684835147725 +7|-0.3706468582364464|-1.1578922506419993|-1.344311812731667|0.8448851414248841 +8|1.0757697837155533|-0.10904997528022223|1.6435630703622064|-1.4693879595399115 +9|0.35702056413309086|-0.6746001037299882|-1.776903716971867|-0.9689138124473498 + + +In [185]: table = pd.read_csv('tmp.sv', sep='|') + +In [186]: table +Out[186]: + Unnamed: 0 0 1 2 3 +0 0 0.469112 -0.282863 -1.509059 -1.135632 +1 1 1.212112 -0.173215 0.119209 -1.044236 +2 2 -0.861849 -2.104569 -0.494929 1.071804 +3 3 0.721555 -0.706771 -1.039575 0.271860 +4 4 -0.424972 0.567020 0.276232 -1.087401 +5 5 -0.673690 0.113648 -1.478427 0.524988 +6 6 0.404705 0.577046 -1.715002 -1.039268 +7 7 -0.370647 -1.157892 -1.344312 0.844885 +8 8 1.075770 -0.109050 1.643563 -1.469388 +9 9 0.357021 -0.674600 -1.776904 -0.968914 + +``` + +By specifying a ``chunksize`` to ``read_csv``, the return +value will be an iterable object of type ``TextFileReader``: + +``` python +In [187]: reader = pd.read_csv('tmp.sv', sep='|', chunksize=4) + +In [188]: reader +Out[188]: + +In [189]: for chunk in reader: + .....: print(chunk) + .....: + Unnamed: 0 0 1 2 3 +0 0 0.469112 -0.282863 -1.509059 -1.135632 +1 1 1.212112 
-0.173215 0.119209 -1.044236 +2 2 -0.861849 -2.104569 -0.494929 1.071804 +3 3 0.721555 -0.706771 -1.039575 0.271860 + Unnamed: 0 0 1 2 3 +4 4 -0.424972 0.567020 0.276232 -1.087401 +5 5 -0.673690 0.113648 -1.478427 0.524988 +6 6 0.404705 0.577046 -1.715002 -1.039268 +7 7 -0.370647 -1.157892 -1.344312 0.844885 + Unnamed: 0 0 1 2 3 +8 8 1.075770 -0.10905 1.643563 -1.469388 +9 9 0.357021 -0.67460 -1.776904 -0.968914 + +``` + +Specifying ``iterator=True`` will also return the ``TextFileReader`` object: + +``` python +In [190]: reader = pd.read_csv('tmp.sv', sep='|', iterator=True) + +In [191]: reader.get_chunk(5) +Out[191]: + Unnamed: 0 0 1 2 3 +0 0 0.469112 -0.282863 -1.509059 -1.135632 +1 1 1.212112 -0.173215 0.119209 -1.044236 +2 2 -0.861849 -2.104569 -0.494929 1.071804 +3 3 0.721555 -0.706771 -1.039575 0.271860 +4 4 -0.424972 0.567020 0.276232 -1.087401 + +``` + +### Specifying the parser engine + +Under the hood pandas uses a fast and efficient parser implemented in C as well +as a Python implementation which is currently more feature-complete. Where +possible pandas uses the C parser (specified as ``engine='c'``), but may fall +back to Python if C-unsupported options are specified. Currently, C-unsupported +options include: + +- ``sep`` other than a single character (e.g. regex separators) +- ``skipfooter`` +- ``sep=None`` with ``delim_whitespace=False`` + +Specifying any of the above options will produce a ``ParserWarning`` unless the +python engine is selected explicitly using ``engine='python'``. 
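A brief sketch of the fallback: a regex separator is one of the C-unsupported options, so it triggers a ``ParserWarning`` unless the Python engine is named explicitly (the data here is made up for the example):

```python
import warnings
from io import StringIO

import pandas as pd

data = 'a;b;c\n1;2;3'

# A regex separator is unsupported by the C engine, so pandas falls back
# to the Python engine and warns about it...
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    pd.read_csv(StringIO(data), sep=r'[;]')
print(any(issubclass(w.category, pd.errors.ParserWarning) for w in caught))

# ...while requesting the engine explicitly parses silently.
df = pd.read_csv(StringIO(data), sep=r'[;]', engine='python')
print(df.iloc[0].tolist())  # [1, 2, 3]
```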
+ +### Reading remote files + +You can pass in a URL to a CSV file: + +``` python +df = pd.read_csv('https://download.bls.gov/pub/time.series/cu/cu.item', + sep='\t') + +``` + +S3 URLs are handled as well but require installing the [S3Fs](https://pypi.org/project/s3fs/) library: + +``` python +df = pd.read_csv('s3://pandas-test/tips.csv') + +``` + +If your S3 bucket requires credentials you will need to set them as environment +variables or in the ``~/.aws/credentials`` config file, refer to the [S3Fs +documentation on credentials](https://s3fs.readthedocs.io/en/latest/#credentials). + +### Writing out data + +#### Writing to CSV format + +The ``Series`` and ``DataFrame`` objects have an instance method ``to_csv`` which +allows storing the contents of the object as a comma-separated-values file. The +function takes a number of arguments. Only the first is required. + +- ``path_or_buf``: A string path to the file to write or a file object. If a file object it must be opened with *newline=’‘* +- ``sep`` : Field delimiter for the output file (default “,”) +- ``na_rep``: A string representation of a missing value (default ‘’) +- ``float_format``: Format string for floating point numbers +- ``columns``: Columns to write (default None) +- ``header``: Whether to write out the column names (default True) +- ``index``: whether to write row (index) names (default True) +- ``index_label``: Column label(s) for index column(s) if desired. If None +(default), and *header* and *index* are True, then the index names are +used. (A sequence should be given if the ``DataFrame`` uses MultiIndex). +- ``mode`` : Python write mode, default ‘w’ +- ``encoding``: a string representing the encoding to use if the contents are +non-ASCII, for Python versions prior to 3 +- ``line_terminator``: Character sequence denoting line end (default *os.linesep*) +- ``quoting``: Set quoting rules as in csv module (default csv.QUOTE_MINIMAL). 
Note that if you have set a *float_format* then floats are converted to strings and csv.QUOTE_NONNUMERIC will treat them as non-numeric +- ``quotechar``: Character used to quote fields (default ‘”’) +- ``doublequote``: Control quoting of ``quotechar`` in fields (default True) +- ``escapechar``: Character used to escape ``sep`` and ``quotechar`` when +appropriate (default None) +- ``chunksize``: Number of rows to write at a time +- ``date_format``: Format string for datetime objects + +#### Writing a formatted string + +The ``DataFrame`` object has an instance method ``to_string`` which allows control +over the string representation of the object. All arguments are optional: + +- ``buf`` default None, for example a StringIO object +- ``columns`` default None, which columns to write +- ``col_space`` default None, minimum width of each column. +- ``na_rep`` default ``NaN``, representation of NA value +- ``formatters`` default None, a dictionary (by column) of functions each of +which takes a single argument and returns a formatted string +- ``float_format`` default None, a function which takes a single (float) +argument and returns a formatted string; to be applied to floats in the +``DataFrame``. +- ``sparsify`` default True, set to False for a ``DataFrame`` with a hierarchical +index to print every MultiIndex key at each row. +- ``index_names`` default True, will print the names of the indices +- ``index`` default True, will print the index (ie, row labels) +- ``header`` default True, will print the column labels +- ``justify`` default ``left``, will print column headers left- or +right-justified + +The ``Series`` object also has a ``to_string`` method, but with only the ``buf``, +``na_rep``, ``float_format`` arguments. There is also a ``length`` argument +which, if set to ``True``, will additionally output the length of the Series. 
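A short sketch exercising a few of the ``to_csv`` and ``to_string`` options listed above (the frame is made up for the example):

```python
import pandas as pd

df = pd.DataFrame({'a': [1.25, None], 'b': ['x', 'y']})

# With no path, to_csv returns the CSV text; na_rep fills the missing
# value and float_format controls how floats are rendered.
text = df.to_csv(index=False, na_rep='MISSING', float_format='%.1f')
print(text)

# to_string renders a fixed-width, human-readable table instead.
print(df.to_string(na_rep='-', index=False))
```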


## JSON

Read and write text and strings in ``JSON`` format.

### Writing JSON

A `Series` or `DataFrame` can be converted to a valid JSON string. Use `to_json` with the optional parameters:

- `path_or_buf` : the pathname or buffer to write the output to. This can be `None`, in which case a JSON string is returned.
- `orient` :

  `Series` :
  - default is `index`;
  - allowed values are {`split`, `records`, `index`}.

  `DataFrame` :
  - default is `columns`;
  - allowed values are {`split`, `records`, `index`, `columns`, `values`, `table`}.

  The format of the JSON string:

  `orient` | JSON string format
  ------------- | -------------
  `split` | dict like {index -> [index], columns -> [columns], data -> [values]}
  `records` | list like [{column -> value}, … , {column -> value}]
  `index` | dict like {index -> {column -> value}}
  `columns` | dict like {column -> {index -> value}}
  `values` | just the values array

- `date_format` : string, type of date conversion: 'epoch' for timestamp, 'iso' for ISO8601.

- `double_precision` : the number of decimal places to use when encoding floating point values, default 10.

- `force_ascii` : force encoded string to be ASCII, default True.

- `date_unit` : the time unit to encode to; governs timestamp and ISO8601 precision. One of 's', 'ms', 'us' or 'ns' for seconds, milliseconds, microseconds and nanoseconds respectively. Default 'ms'.

- `default_handler` : the handler to call if an object cannot otherwise be converted to a suitable format for JSON. It takes a single argument, the object to convert, and returns a serializable object.

- `lines` : if `records` orient, then write each record per line as JSON.

Note: `NaN`'s, `NaT`'s and `None` will be converted to `null`, and `datetime` objects will be converted based on the `date_format` and `date_unit` parameters.

```python
In [192]: dfj = pd.DataFrame(np.random.randn(5, 2), columns=list('AB'))

In [193]: json = dfj.to_json()

In [194]: json
Out[194]: '{"A":{"0":-1.2945235903,"1":0.2766617129,"2":-0.0139597524,"3":-0.0061535699,"4":0.8957173022},"B":{"0":0.4137381054,
"1":-0.472034511,"2":-0.3625429925,"3":-0.923060654,"4":0.8052440254}}'
```

#### Orient options

There are a number of different options for the format of the resulting JSON file / string. Consider the following `DataFrame` and `Series`:

```python
In [195]: dfjo = pd.DataFrame(dict(A=range(1, 4), B=range(4, 7), C=range(7, 10)),
   ..... : columns=list('ABC'), index=list('xyz'))
   ..... 
: 

In [196]: dfjo
Out[196]: 
   A  B  C
x  1  4  7
y  2  5  8
z  3  6  9

In [197]: sjo = pd.Series(dict(x=15, y=16, z=17), name='D')

In [198]: sjo
Out[198]: 
x    15
y    16
z    17
Name: D, dtype: int64

```

**Column oriented** (the default for `DataFrame`) serializes the data as nested JSON objects, with column labels acting as the primary index:

```python
In [199]: dfjo.to_json(orient="columns")
Out[199]: '{"A":{"x":1,"y":2,"z":3},"B":{"x":4,"y":5,"z":6},"C":{"x":7,"y":8,"z":9}}'

# Not available for Series
```

**Index oriented** (the default for `Series`) is similar to column oriented, but the index labels are now primary:

```python
In [200]: dfjo.to_json(orient="index")
Out[200]: '{"x":{"A":1,"B":4,"C":7},"y":{"A":2,"B":5,"C":8},"z":{"A":3,"B":6,"C":9}}'

In [201]: sjo.to_json(orient="index")
Out[201]: '{"x":15,"y":16,"z":17}'
```

**Record oriented** serializes the data to a JSON array of column -> value records; index labels are not included. This is useful for passing `DataFrame` data to plotting libraries, for example the JavaScript library `d3.js`:

```python
In [202]: dfjo.to_json(orient="records")
Out[202]: '[{"A":1,"B":4,"C":7},{"A":2,"B":5,"C":8},{"A":3,"B":6,"C":9}]'

In [203]: sjo.to_json(orient="records")
Out[203]: '[15,16,17]'
```

**Value oriented** is a bare-bones option which serializes to nested JSON arrays of values only; column and index labels are not included:

```python
In [204]: dfjo.to_json(orient="values")
Out[204]: '[[1,4,7],[2,5,8],[3,6,9]]'

# Not available for Series
```

**Split oriented** serializes to a JSON object containing separate entries for values, index and columns. The name is also included for a `Series`:

```python
In [205]: dfjo.to_json(orient="split")
Out[205]: '{"columns":["A","B","C"],"index":["x","y","z"],"data":[[1,4,7],[2,5,8],[3,6,9]]}'

In [206]: sjo.to_json(orient="split")
Out[206]: '{"name":"D","index":["x","y","z"],"data":[15,16,17]}'
```

**Table oriented** serializes to the JSON [Table Schema](https://specs.frictionlessdata.io/json-table-schema/), allowing for the preservation of metadata including but not limited to dtypes and index names.

::: tip Note

Any orient option that encodes to a JSON object will not preserve the ordering of index and column labels during round-trip serialization. If you wish to preserve label ordering, use the `split` option, as it uses ordered containers.

:::

#### Date handling

Writing in ISO date format:

```python
In [207]: dfd = pd.DataFrame(np.random.randn(5, 2), columns=list('AB'))

In [208]: dfd['date'] = pd.Timestamp('20130101')

In [209]: dfd = dfd.sort_index(1, 
ascending=False)

In [210]: json = dfd.to_json(date_format='iso')

In [211]: json
Out[211]: '{"date":{"0":"2013-01-01T00:00:00.000Z","1":"2013-01-01T00:00:00.000Z","2":"2013-01-01T00:00:00.000Z","3":"2013-01-01T00:00:00.000Z","4":"2013-01-01T00:00:00.000Z"},"B":{"0":2.5656459463,"1":1.3403088498,"2":-0.2261692849,"3":0.8138502857,"4":-0.8273169356},"A":{"0":-1.2064117817,"1":1.4312559863,"2":-1.1702987971,"3":0.4108345112,"4":0.1320031703}}'

```

Writing in ISO date format, with microseconds:

```python
In [212]: json = dfd.to_json(date_format='iso', date_unit='us')

In [213]: json
Out[213]: '{"date":{"0":"2013-01-01T00:00:00.000000Z","1":"2013-01-01T00:00:00.000000Z","2":"2013-01-01T00:00:00.000000Z","3":"2013-01-01T00:00:00.000000Z","4":"2013-01-01T00:00:00.000000Z"},"B":{"0":2.5656459463,"1":1.3403088498,"2":-0.2261692849,"3":0.8138502857,"4":-0.8273169356},"A":{"0":-1.2064117817,"1":1.4312559863,"2":-1.1702987971,"3":0.4108345112,"4":0.1320031703}}'

```

Epoch timestamps, in seconds:

```python
In [214]: json = dfd.to_json(date_format='epoch', date_unit='s')

In [215]: json
Out[215]: '{"date":{"0":1356998400,"1":1356998400,"2":1356998400,"3":1356998400,"4":1356998400},"B":{"0":2.5656459463,"1":1.3403088498,"2":-0.2261692849,"3":0.8138502857,"4":-0.8273169356},"A":{"0":-1.2064117817,"1":1.4312559863,"2":-1.1702987971,"3":0.4108345112,"4":0.1320031703}}'

```

Writing to a file, with a date index and a date column:

```python
In [216]: dfj2 = dfj.copy()

In [217]: dfj2['date'] = pd.Timestamp('20130101')

In [218]: dfj2['ints'] = list(range(5))

In [219]: dfj2['bools'] = True

In [220]: dfj2.index = pd.date_range('20130101', periods=5)

In [221]: dfj2.to_json('test.json')

In [222]: with open('test.json') as fh:
   .....: print(fh.read())
   .....: 
{"A":{"1356998400000":-1.2945235903,"1357084800000":0.2766617129,"1357171200000":-0.0139597524,"1357257600000":-0.0061535699,"1357344000000":0.8957173022},"B":{"1356998400000":0.4137381054,"1357084800000":-0.472034511,"1357171200000":-0.3625429925,"1357257600000":-0.923060654,"1357344000000":0.8052440254},"date":{"1356998400000":1356998400000,"1357084800000":1356998400000,"1357171200000":1356998400000,"1357257600000":1356998400000,"1357344000000":1356998400000},"ints":{"1356998400000":0,"1357084800000":1,"1357171200000":2,"1357257600000":3,"1357344000000":4},"bools":{"1356998400000":true,"1357084800000":true,"1357171200000":true,"1357257600000":true,"1357344000000":true}}

```

#### Fallback behavior

If the JSON serializer cannot handle the container contents directly, it will fall back in the following manner:

- if the dtype is unsupported (e.g. `np.complex`), then the `default_handler`, if provided, will be called for each value; otherwise an exception is raised.

- if an object is unsupported, it will attempt the following:
  - check whether the object has defined a `toDict` method and call it. A `toDict` method should return a `dict`, which will then be JSON serialized.
  - invoke the `default_handler` if one was provided.
  - convert the object to a `dict` by traversing its contents. However, this will often fail with an `OverflowError` or give unexpected results.

In general, the best approach for unsupported objects or dtypes is to provide a `default_handler`. For example:

``` python
>>> DataFrame([1.0, 2.0, complex(1.0, 2.0)]).to_json()  # raises
RuntimeError: Unhandled numpy dtype 15

```

This can be dealt with by specifying a simple `default_handler`:

``` python
In [223]: pd.DataFrame([1.0, 2.0, complex(1.0, 2.0)]).to_json(default_handler=str)
Out[223]: '{"0":{"0":"(1+0j)","1":"(2+0j)","2":"(1+2j)"}}'

```

### Reading JSON

Reading a JSON string into a pandas object can take a number of parameters. The parser will try to parse a `DataFrame` if `typ` is not supplied or is `None`. To explicitly force `Series` parsing, pass `typ=series`.

- `filepath_or_buffer` : a **valid** JSON string or file handle / StringIO. The string could be a URL. Valid URL schemes include http, ftp, S3 and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.json

- `typ` : type of object to recover (series or frame), default 'frame'.

- `orient` :

  Series:
  - default is `index`.
  - allowed values are {`split`, `records`, `index`}.

  DataFrame:
  - default is `columns`.
  - allowed values are {`split`, `records`, `index`, `columns`, `values`, `table`}.

The format of the JSON string:

split | dict like {index -> [index], columns -> [columns], data -> [values]} + ------------- | ------------- + records | list like [{column -> value}, … , {column -> value}] + index | dict like {index -> {column -> value}} + columns | dict like {column -> {index -> value}} + values | just the values array + table | adhering to the JSON [Table Schema](https://specs.frictionlessdata.io/json-table-schema/) + +- ` dtype `: 如果为True,推断dtypes,如果列为dtype的字典,则使用那些;如果为`False`,则根本不推断dtypes,默认为True,仅适用于数据。 + +- `convert_axes` : 布尔值,尝试将轴转换为正确的dtypes,默认为`True`。 + +- `convert_dates` :一列列表要解析为日期; 如果为`True`,则尝试解析类似日期的列,默认为`True`。 + +- `keep_default_dates` :布尔值,默认为`True`。 如果解析日期,则解析默认的类似日期的列。 + +- `numpy` :直接解码为NumPy数组。 默认为`False`; 虽然标签可能是非数字的,但仅支持数字数据。 另请注意,如果`numpy = True`,则每个术语的JSON顺序 **必须** 相同。 + +- `precise_float` :布尔值,默认为`False`。 当解码字符串为双值时,设置为能使用更高精度(strtod)函数。 默认(`False`)快速使用但不精确的内置功能。 + +- `date_unit` :字符串,用于检测转换日期的时间戳单位。 默认无。 默认情况下,将检测时间戳精度,如果不需要,则传递's','ms','us'或'ns'中的一个,以强制时间戳精度分别为秒,毫秒,微秒或纳秒。 + +- `lines` :读取文件每行作为一个JSON对象。 + +- `encoding` :用于解码py3字节的编码。 + +- `chunksize` :当与`lines = True`结合使用时,返回一个Json读取器(JSONReader),每次迭代读取`chunksize`行。 + +如果JSON不能解析,解析器将抛出`ValueError / TypeError / AssertionError `中的一个错误。 + +如果在编码为JSON时使用非默认的`orient`方法,请确保在此处传递相同的选项以便解码产生合理的结果,请参阅 [Orient Options](https://www.pypandas.cn/docs/user_guide/io.html#orient-options)以获取概述。 + +#### 数据转换(Data conversion) + +`convert_axes = True`,`dtype = True`和`convert_dates = True`的默认值将尝试解析轴,并将所有数据解析为适当的类型,包括日期。 如果需要覆盖特定的dtypes,请将字典传递给`dtype`。 如果您需要在轴中保留类似字符串的数字(例如“1”,“2”),则只应将`convert_axes`设置为`False`。 + +::: tip 注意 + +如果`convert_dates = True`并且数据和/或列标签显示为“类似日期('date-like')“,则可以将大的整数值转换为日期。 确切的标准取决于指定的`date_unit`。 'date-like'表示列标签符合以下标准之一: +- 结尾以 `'_at'` +- 结尾以 `'_time'` +- 开头以 `'timestamp'` +- 它是 `'modified'` +- 它是 `'date'` + +::: + +::: danger 警告 + +在读取JSON数据时,自动强制转换为dtypes有一些不同寻常的地方: + +- 索引可以按序列化的不同顺序重建,也就是说,返回的顺序不能保证与序列化之前的顺序相同 + +- 如果可以安全地,那么一列浮动(`float`)数据将被转换为一列整数(`integer`),例如 一列 `1` +- 
布尔列将在重建时转换为整数(`integer `) + +因此,有时你会有那样的时刻可能想通过`dtype`关键字参数指定特定的dtypes。 + +::: + +读取JSON字符串: + +```python +In [224]: pd.read_json(json) +Out[224]: + date B A +0 2013-01-01 2.565646 -1.206412 +1 2013-01-01 1.340309 1.431256 +2 2013-01-01 -0.226169 -1.170299 +3 2013-01-01 0.813850 0.410835 +4 2013-01-01 -0.827317 0.132003 + +``` +读取文件: + +```python +In [225]: pd.read_json('test.json') +Out[225]: + A B date ints bools +2013-01-01 -1.294524 0.413738 2013-01-01 0 True +2013-01-02 0.276662 -0.472035 2013-01-01 1 True +2013-01-03 -0.013960 -0.362543 2013-01-01 2 True +2013-01-04 -0.006154 -0.923061 2013-01-01 3 True +2013-01-05 0.895717 0.805244 2013-01-01 4 True + +``` +不要转换任何数据(但仍然转换轴和日期): + +```python +In [226]: pd.read_json('test.json', dtype=object).dtypes +Out[226]: +A object +B object +date object +ints object +bools object +dtype: object + +``` +指定转换的dtypes: + +```python +In [227]: pd.read_json('test.json', dtype={'A': 'float32', 'bools': 'int8'}).dtypes +Out[227]: +A float32 +B float64 +date datetime64[ns] +ints int64 +bools int8 +dtype: object + +``` +保留字符串索引: + +```python +In [228]: si = pd.DataFrame(np.zeros((4, 4)), columns=list(range(4)), + .....: index=[str(i) for i in range(4)]) + .....: + +In [229]: si +Out[229]: + 0 1 2 3 +0 0.0 0.0 0.0 0.0 +1 0.0 0.0 0.0 0.0 +2 0.0 0.0 0.0 0.0 +3 0.0 0.0 0.0 0.0 + +In [230]: si.index +Out[230]: Index(['0', '1', '2', '3'], dtype='object') + +In [231]: si.columns +Out[231]: Int64Index([0, 1, 2, 3], dtype='int64') + +In [232]: json = si.to_json() + +In [233]: sij = pd.read_json(json, convert_axes=False) + +In [234]: sij +Out[234]: + 0 1 2 3 +0 0 0 0 0 +1 0 0 0 0 +2 0 0 0 0 +3 0 0 0 0 + +In [235]: sij.index +Out[235]: Index(['0', '1', '2', '3'], dtype='object') + +In [236]: sij.columns +Out[236]: Index(['0', '1', '2', '3'], dtype='object') + +``` +以纳秒为单位的日期需要以纳秒为单位读回: + +``` python +In [237]: json = dfj2.to_json(date_unit='ns') + +# Try to parse timestamps as milliseconds -> Won't Work +In [238]: dfju = pd.read_json(json, 
date_unit='ms') + +In [239]: dfju +Out[239]: + A B date ints bools +1356998400000000000 -1.294524 0.413738 1356998400000000000 0 True +1357084800000000000 0.276662 -0.472035 1356998400000000000 1 True +1357171200000000000 -0.013960 -0.362543 1356998400000000000 2 True +1357257600000000000 -0.006154 -0.923061 1356998400000000000 3 True +1357344000000000000 0.895717 0.805244 1356998400000000000 4 True + +# Let pandas detect the correct precision +In [240]: dfju = pd.read_json(json) + +In [241]: dfju +Out[241]: + A B date ints bools +2013-01-01 -1.294524 0.413738 2013-01-01 0 True +2013-01-02 0.276662 -0.472035 2013-01-01 1 True +2013-01-03 -0.013960 -0.362543 2013-01-01 2 True +2013-01-04 -0.006154 -0.923061 2013-01-01 3 True +2013-01-05 0.895717 0.805244 2013-01-01 4 True + +# Or specify that all timestamps are in nanoseconds +In [242]: dfju = pd.read_json(json, date_unit='ns') + +In [243]: dfju +Out[243]: + A B date ints bools +2013-01-01 -1.294524 0.413738 2013-01-01 0 True +2013-01-02 0.276662 -0.472035 2013-01-01 1 True +2013-01-03 -0.013960 -0.362543 2013-01-01 2 True +2013-01-04 -0.006154 -0.923061 2013-01-01 3 True +2013-01-05 0.895717 0.805244 2013-01-01 4 True + +``` + +#### Numpy 参数 + +::: tip 注意 + +这仅支持数值数据。 索引和列标签可以是非数字的,例如 字符串,日期等。 + +::: + +如果将`numpy = True`传递给`read_json`,则会在反序列化期间尝试找到适当的dtype,然后直接解码到NumPy数组,从而绕过对中间Python对象的需求。 + +如果要反序列化大量数值数据,这可以提供加速: + +``` python +In [244]: randfloats = np.random.uniform(-100, 1000, 10000) + +In [245]: randfloats.shape = (1000, 10) + +In [246]: dffloats = pd.DataFrame(randfloats, columns=list('ABCDEFGHIJ')) + +In [247]: jsonfloats = dffloats.to_json() + +``` + +``` python +In [248]: %timeit pd.read_json(jsonfloats) +12.4 ms +- 116 us per loop (mean +- std. dev. of 7 runs, 100 loops each) + +``` + +``` python +In [249]: %timeit pd.read_json(jsonfloats, numpy=True) +9.56 ms +- 82.8 us per loop (mean +- std. dev. 
of 7 runs, 100 loops each) + +``` + +对于较小的数据集,加速不太明显: + +``` python +In [250]: jsonfloats = dffloats.head(100).to_json() + +``` + +``` python +In [251]: %timeit pd.read_json(jsonfloats) +8.05 ms +- 120 us per loop (mean +- std. dev. of 7 runs, 100 loops each) + +``` + +``` python +In [252]: %timeit pd.read_json(jsonfloats, numpy=True) +7 ms +- 162 us per loop (mean +- std. dev. of 7 runs, 100 loops each) + +``` +::: danger 警告 + +直接NumPy解码会产生许多假设并可能导致失败,或如果这些假设不满足,则产生意外地输出: + +- 数据是数值。 +- 数据是统一的。 从解码的第一个值中找到dtype。可能会引发`ValueError`错误,或者如果这个条件不满足可能产生不正确的输出。 + +- 标签是有序的。 标签仅从第一个容器读取,假设每个后续行/列已按相同顺序编码。 如果使用`to_json`编码数据,则应该满足这一要求,但如果JSON来自其他来源,则可能不是这种情况。 + +::: + + +### 标准化(Normalization) + +pandas提供了一个实用程序函数来获取一个字典或字典列表,并将这个半结构化数据规范化为一个平面表。 + +``` python +In [253]: from pandas.io.json import json_normalize + +In [254]: data = [{'id': 1, 'name': {'first': 'Coleen', 'last': 'Volk'}}, + .....: {'name': {'given': 'Mose', 'family': 'Regner'}}, + .....: {'id': 2, 'name': 'Faye Raker'}] + .....: + +In [255]: json_normalize(data) +Out[255]: + id name.first name.last name.given name.family name +0 1.0 Coleen Volk NaN NaN NaN +1 NaN NaN NaN Mose Regner NaN +2 2.0 NaN NaN NaN NaN Faye Raker + +``` + +``` python +In [256]: data = [{'state': 'Florida', + .....: 'shortname': 'FL', + .....: 'info': {'governor': 'Rick Scott'}, + .....: 'counties': [{'name': 'Dade', 'population': 12345}, + .....: {'name': 'Broward', 'population': 40000}, + .....: {'name': 'Palm Beach', 'population': 60000}]}, + .....: {'state': 'Ohio', + .....: 'shortname': 'OH', + .....: 'info': {'governor': 'John Kasich'}, + .....: 'counties': [{'name': 'Summit', 'population': 1234}, + .....: {'name': 'Cuyahoga', 'population': 1337}]}] + .....: + +In [257]: json_normalize(data, 'counties', ['state', 'shortname', ['info', 'governor']]) +Out[257]: + name population state shortname info.governor +0 Dade 12345 Florida FL Rick Scott +1 Broward 40000 Florida FL Rick Scott +2 Palm Beach 60000 Florida FL Rick Scott +3 Summit 
1234 Ohio OH John Kasich +4 Cuyahoga 1337 Ohio OH John Kasich + +``` +max_level 参数提供了对结束规范化的级别的更多控制。 当max_level = 1时,以下代码段会标准化,直到提供了字典的第一个嵌套级别为止。 + +``` python +In [258]: data = [{'CreatedBy': {'Name': 'User001'}, + .....: 'Lookup': {'TextField': 'Some text', + .....: 'UserField': {'Id': 'ID001', + .....: 'Name': 'Name001'}}, + .....: 'Image': {'a': 'b'} + .....: }] + .....: + +In [259]: json_normalize(data, max_level=1) +Out[259]: + CreatedBy.Name Lookup.TextField Lookup.UserField Image.a +0 User001 Some text {'Id': 'ID001', 'Name': 'Name001'} b + +``` + +### json的行分割(Line delimited json) + +*New in version 0.19.0.* + +pandas能够读取和写入行分隔的json文件通常是在用Hadoop或Spark进行数据处理的管道中。 + +*New in version 0.21.0.* + +对于行分隔的json文件,pandas也可以返回一个迭代器,它能一次读取`chunksize`行。 这对于大型文件或从数据流中读取非常有用。 + +``` python +In [260]: jsonl = ''' + .....: {"a": 1, "b": 2} + .....: {"a": 3, "b": 4} + .....: ''' + .....: + +In [261]: df = pd.read_json(jsonl, lines=True) + +In [262]: df +Out[262]: + a b +0 1 2 +1 3 4 + +In [263]: df.to_json(orient='records', lines=True) +Out[263]: '{"a":1,"b":2}\n{"a":3,"b":4}' + +# reader is an iterator that returns `chunksize` lines each iteration +In [264]: reader = pd.read_json(StringIO(jsonl), lines=True, chunksize=1) + +In [265]: reader +Out[265]: + +In [266]: for chunk in reader: + .....: print(chunk) + .....: +Empty DataFrame +Columns: [] +Index: [] + a b +0 1 2 + a b +1 3 4 + +``` + +### 表模式(Table schema) + +*New in version 0.20.0.* + +表模式([Table schema](https://specs.frictionlessdata.io/json-table-schema/))是用于将表格数据集描述为JSON对象的一种规范。 JSON包含有关字段名称,类型和其他属性的信息。 你可以使用面向`table`来构建一个JSON字符串包含两个字段,`schema`和`data`。 + +``` python +In [267]: df = pd.DataFrame({'A': [1, 2, 3], + .....: 'B': ['a', 'b', 'c'], + .....: 'C': pd.date_range('2016-01-01', freq='d', periods=3)}, + .....: index=pd.Index(range(3), name='idx')) + .....: + +In [268]: df +Out[268]: + A B C +idx +0 1 a 2016-01-01 +1 2 b 2016-01-02 +2 3 c 2016-01-03 + +In [269]: df.to_json(orient='table', date_format="iso") 
+Out[269]: '{"schema": {"fields":[{"name":"idx","type":"integer"},{"name":"A","type":"integer"},{"name":"B","type":"string"},{"name":"C","type":"datetime"}],"primaryKey":["idx"],"pandas_version":"0.20.0"}, "data": [{"idx":0,"A":1,"B":"a","C":"2016-01-01T00:00:00.000Z"},{"idx":1,"A":2,"B":"b","C":"2016-01-02T00:00:00.000Z"},{"idx":2,"A":3,"B":"c","C":"2016-01-03T00:00:00.000Z"}]}' + +``` +`schema`字段包含`fields`主键,它本身包含一个列名称到列对的列表,包括`Index`或`MultiIndex`(请参阅下面的类型列表)。 如果(多)索引是唯一的,则`schema`字段也包含一个`primaryKey`字段。 + +第二个字段`data`包含用面向`records`来序列化数据。 索引是包括的,并且任何日期时间都是ISO 8601格式,正如表模式规范所要求的那样。 + +表模式规范中描述了所有支持的全部类型列表。 此表显示了pandas类型的映射: + +Pandas type | Table Schema type +---|--- +int64 | integer +float64 | number +bool | boolean +datetime64[ns] | datetime +timedelta64[ns] | duration +categorical | any +object | str + +关于生成的表模式的一些注意事项: + +- `schema`对象包含`pandas_version`的字段。 它包含模式的pandas方言版本,并将随每个修订增加。 +- 序列化时,所有日期都转换为UTC。 甚至是时区的初始值,也被视为UTC,偏移量为0。 + +``` python +In [270]: from pandas.io.json import build_table_schema + +In [271]: s = pd.Series(pd.date_range('2016', periods=4)) + +In [272]: build_table_schema(s) +Out[272]: +{'fields': [{'name': 'index', 'type': 'integer'}, + {'name': 'values', 'type': 'datetime'}], + 'primaryKey': ['index'], + 'pandas_version': '0.20.0'} + +``` +- 具有时区的日期时间(在序列化之前),包括具有时区名称的附加字段`tz`(例如:`'US / Central'`)。 + +``` python +In [273]: s_tz = pd.Series(pd.date_range('2016', periods=12, + .....: tz='US/Central')) + .....: + +In [274]: build_table_schema(s_tz) +Out[274]: +{'fields': [{'name': 'index', 'type': 'integer'}, + {'name': 'values', 'type': 'datetime', 'tz': 'US/Central'}], + 'primaryKey': ['index'], + 'pandas_version': '0.20.0'} + +``` +- 时间段在序列化之前是转换为时间戳的,因此具有转换为UTC的相同方式。 此外,时间段将包含具有时间段频率的附加字段`freq`,例如:`'A-DEC'`。 + +``` python +In [275]: s_per = pd.Series(1, index=pd.period_range('2016', freq='A-DEC', + .....: periods=4)) + .....: + +In [276]: build_table_schema(s_per) +Out[276]: +{'fields': [{'name': 'index', 'type': 'datetime', 'freq': 
'A-DEC'}, + {'name': 'values', 'type': 'integer'}], + 'primaryKey': ['index'], + 'pandas_version': '0.20.0'} + +``` +- 分类使用`any`类型和`enum`约束来列出可能值的集合。 此外,还包括一个`ordered`字段: + +``` python +In [277]: s_cat = pd.Series(pd.Categorical(['a', 'b', 'a'])) + +In [278]: build_table_schema(s_cat) +Out[278]: +{'fields': [{'name': 'index', 'type': 'integer'}, + {'name': 'values', + 'type': 'any', + 'constraints': {'enum': ['a', 'b']}, + 'ordered': False}], + 'primaryKey': ['index'], + 'pandas_version': '0.20.0'} + +``` +- 如果索引是唯一的,则包含`primaryKey`字段,它包含了标签数组: + +``` python +In [279]: s_dupe = pd.Series([1, 2], index=[1, 1]) + +In [280]: build_table_schema(s_dupe) +Out[280]: +{'fields': [{'name': 'index', 'type': 'integer'}, + {'name': 'values', 'type': 'integer'}], + 'pandas_version': '0.20.0'} + +``` +- `primaryKey `的形式与多索引相同,但在这种情况下,`primaryKey`是一个数组: + +``` python +In [281]: s_multi = pd.Series(1, index=pd.MultiIndex.from_product([('a', 'b'), + .....: (0, 1)])) + .....: + +In [282]: build_table_schema(s_multi) +Out[282]: +{'fields': [{'name': 'level_0', 'type': 'string'}, + {'name': 'level_1', 'type': 'integer'}, + {'name': 'values', 'type': 'integer'}], + 'primaryKey': FrozenList(['level_0', 'level_1']), + 'pandas_version': '0.20.0'} + +``` +- 默认命名大致遵循以下规则: + + - 对于series,使用`object.name`。 如果没有,那么名称就是`values` + - 对于`DataFrames`,使用列名称的字符串化版本 + - 对于`Index`(不是`MultiIndex`),使用`index.name`,如果为None,则使用回退`index`。 + - 对于`MultiIndex`,使用`mi.names`。 如果任何级别没有名称,则使用`level_`。 + +*New in version 0.23.0.* + +`read_json`也接受`orient ='table'`作为参数。 这允许以可循环移动的方式保存诸如dtypes和索引名称之类的元数据。 + +``` python +In [283]: df = pd.DataFrame({'foo': [1, 2, 3, 4], + .....: 'bar': ['a', 'b', 'c', 'd'], + .....: 'baz': pd.date_range('2018-01-01', freq='d', periods=4), + .....: 'qux': pd.Categorical(['a', 'b', 'c', 'c']) + .....: }, index=pd.Index(range(4), name='idx')) + .....: + +In [284]: df +Out[284]: + foo bar baz qux +idx +0 1 a 2018-01-01 a +1 2 b 2018-01-02 b +2 3 c 2018-01-03 c +3 4 d 2018-01-04 c + +In 
[285]: df.dtypes +Out[285]: +foo int64 +bar object +baz datetime64[ns] +qux category +dtype: object + +In [286]: df.to_json('test.json', orient='table') + +In [287]: new_df = pd.read_json('test.json', orient='table') + +In [288]: new_df +Out[288]: + foo bar baz qux +idx +0 1 a 2018-01-01 a +1 2 b 2018-01-02 b +2 3 c 2018-01-03 c +3 4 d 2018-01-04 c + +In [289]: new_df.dtypes +Out[289]: +foo int64 +bar object +baz datetime64[ns] +qux category +dtype: object + +``` +请注意,作为 [Index](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.html#pandas.Index) 名称的文字字符串'index'是不能循环移动的,也不能在 [MultiIndex](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.html#pandas.MultiIndex) 中用以`'level_'`开头的任何名称。 这些默认情况下在 [DataFrame.to_json()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html#pandas.DataFrame.to_json) 中用于指示缺失值和后续读取无法区分的目的。 + +``` python +In [290]: df.index.name = 'index' + +In [291]: df.to_json('test.json', orient='table') + +In [292]: new_df = pd.read_json('test.json', orient='table') + +In [293]: print(new_df.index.name) +None + +``` + +## HTML + +### 读取HTML的内容 + +::: danger 警告: + +我们**强烈建议**你阅读 [HTML Table Parsing gotchas](https://www.pypandas.cn/docs/user_guide/io.html#io-html-gotchas)里面相关的围绕BeautifulSoup4/html5lib/lxml解析器部分的问题。 + +::: + +顶级的`read_html()`函数能接受HTML字符串/文件/URL格式,并且能解析HTML 表格为pandas`DataFrames`的列表,让我们看看下面的几个例子。 + +::: tip 注意: + +`read_html`返回的是一个`DataFrame`对象的`list`,即便在HTML页面里只包含单个表格。 + +::: + +读取一个没有选项的URL: + +```python +In [294]: url = 'https://www.fdic.gov/bank/individual/failed/banklist.html' + +In [295]: dfs = pd.read_html(url) + +In [296]: dfs +Out[296]: +[ Bank Name City ST CERT Acquiring Institution Closing Date Updated Date + 0 The Enloe State Bank Cooper TX 10716 Legend Bank, N. A. 
May 31, 2019 June 18, 2019 + 1 Washington Federal Bank for Savings Chicago IL 30570 Royal Savings Bank December 15, 2017 February 1, 2019 + 2 The Farmers and Merchants State Bank of Argonia Argonia KS 17719 Conway Bank October 13, 2017 February 21, 2018 + 3 Fayette County Bank Saint Elmo IL 1802 United Fidelity Bank, fsb May 26, 2017 January 29, 2019 + 4 Guaranty Bank, (d/b/a BestBank in Georgia & Mi... Milwaukee WI 30003 First-Citizens Bank & Trust Company May 5, 2017 March 22, 2018 + .. ... ... .. ... ... ... ... + 551 Superior Bank, FSB Hinsdale IL 32646 Superior Federal, FSB July 27, 2001 August 19, 2014 + 552 Malta National Bank Malta OH 6629 North Valley Bank May 3, 2001 November 18, 2002 + 553 First Alliance Bank & Trust Co. Manchester NH 34264 Southern New Hampshire Bank & Trust February 2, 2001 February 18, 2003 + 554 National State Bank of Metropolis Metropolis IL 3815 Banterra Bank of Marion December 14, 2000 March 17, 2005 + 555 Bank of Honolulu Honolulu HI 21029 Bank of the Orient October 13, 2000 March 17, 2005 + + [556 rows x 7 columns]] + +``` + +::: tip 注意: + +上面的URL数据修改了每个周一以至于上面的数据结果跟下面的数据结果可能有轻微的不同。 + +::: + +从上面的URL读取文件内容并且传递它给`read_html`作为一个字符串: + +```python +In [297]: with open(file_path, 'r') as f: + .....: dfs = pd.read_html(f.read()) + .....: + +In [298]: dfs +Out[298]: +[ Bank Name City ST CERT Acquiring Institution Closing Date Updated Date + 0 Banks of Wisconsin d/b/a Bank of Kenosha Kenosha WI 35386 North Shore Bank, FSB May 31, 2013 May 31, 2013 + 1 Central Arizona Bank Scottsdale AZ 34527 Western State Bank May 14, 2013 May 20, 2013 + 2 Sunrise Bank Valdosta GA 58185 Synovus Bank May 10, 2013 May 21, 2013 + 3 Pisgah Community Bank Asheville NC 58701 Capital Bank, N.A. May 10, 2013 May 14, 2013 + 4 Douglas County Bank Douglasville GA 21649 Hamilton State Bank April 26, 2013 May 16, 2013 + .. ... ... .. ... ... ... ... 
+ 500 Superior Bank, FSB Hinsdale IL 32646 Superior Federal, FSB July 27, 2001 June 5, 2012 + 501 Malta National Bank Malta OH 6629 North Valley Bank May 3, 2001 November 18, 2002 + 502 First Alliance Bank & Trust Co. Manchester NH 34264 Southern New Hampshire Bank & Trust February 2, 2001 February 18, 2003 + 503 National State Bank of Metropolis Metropolis IL 3815 Banterra Bank of Marion December 14, 2000 March 17, 2005 + 504 Bank of Honolulu Honolulu HI 21029 Bank of the Orient October 13, 2000 March 17, 2005 + + [505 rows x 7 columns]] + +``` + +甚至如果你想,你还可以传递一个`StringIO`的实例: + +```python +In [299]: with open(file_path, 'r') as f: + .....: sio = StringIO(f.read()) + .....: + +In [300]: dfs = pd.read_html(sio) + +In [301]: dfs +Out[301]: +[ Bank Name City ST CERT Acquiring Institution Closing Date Updated Date + 0 Banks of Wisconsin d/b/a Bank of Kenosha Kenosha WI 35386 North Shore Bank, FSB May 31, 2013 May 31, 2013 + 1 Central Arizona Bank Scottsdale AZ 34527 Western State Bank May 14, 2013 May 20, 2013 + 2 Sunrise Bank Valdosta GA 58185 Synovus Bank May 10, 2013 May 21, 2013 + 3 Pisgah Community Bank Asheville NC 58701 Capital Bank, N.A. May 10, 2013 May 14, 2013 + 4 Douglas County Bank Douglasville GA 21649 Hamilton State Bank April 26, 2013 May 16, 2013 + .. ... ... .. ... ... ... ... + 500 Superior Bank, FSB Hinsdale IL 32646 Superior Federal, FSB July 27, 2001 June 5, 2012 + 501 Malta National Bank Malta OH 6629 North Valley Bank May 3, 2001 November 18, 2002 + 502 First Alliance Bank & Trust Co. 
Manchester NH 34264 Southern New Hampshire Bank & Trust February 2, 2001 February 18, 2003 + 503 National State Bank of Metropolis Metropolis IL 3815 Banterra Bank of Marion December 14, 2000 March 17, 2005 + 504 Bank of Honolulu Honolulu HI 21029 Bank of the Orient October 13, 2000 March 17, 2005 + + [505 rows x 7 columns]] + +``` + +::: tip 注意: + +以下的例子在IPython的程序中不会运行,因为有太多的网络接入函数减缓了文档的创建。如果你的程序报错或者例子不运行,请立即向[ pandas GitHub issues page](https://www.github.com/pandas-dev/pandas/issues) 上报。 + +::: + +读取一个URL并匹配表格里面所包含的具体文本内容: + +```python +match = 'Metcalf Bank' +df_list = pd.read_html(url, match=match) + +``` + +指定一个标题行(通过默认的\或者\定位并伴随一个\被用来作为列的索引,如果是多行含有\,则多索引就会被创建);如果已经指定,标题行则从数据减去已解析的标题元素中获取(\元素)。 + +```python +dfs = pd.read_html(url, header=0) + +``` + +指定一个索引列: + +```python +dfs = pd.read_html(url, index_col=0) + +``` + +指定跳过行的数量: + +```python +dfs = pd.read_html(url, skiprows=0) + +``` + +指定使用列表来跳过行的数量(`xrange`(只在Python 2 中)也有效): + +```python +dfs = pd.read_html(url, skiprows=range(2)) + +``` + +指定一个HTML属性: + +```python +dfs1 = pd.read_html(url, attrs={'id': 'table'}) +dfs2 = pd.read_html(url, attrs={'class': 'sortable'}) +print(np.array_equal(dfs1[0], dfs2[0])) # Should be True + +``` +指定值将会被转换为NaN(非数值): + +```python +dfs = pd.read_html(url, na_values=['No Acquirer']) + +``` + +*New in version 0.19.* + +指定是否保持默认的NaN值的设置: + +```python +dfs = pd.read_html(url, keep_default_na=False) + +``` + +*New in version 0.19.* + +指定列的转换器。这对于有前置零的数字文本数据很有用。默认情况下,数值列会转换成数值类型且前置零会丢失。为了避免这种情况,我们能转换这些列为字符串。 + +```python +url_mcc = 'https://en.wikipedia.org/wiki/Mobile_country_code' +dfs = pd.read_html(url_mcc, match='Telekom Albania', header=0, + converters={'MNC': str}) + +``` + +*New in version 0.19.* + +把上面的一些结合使用: + +```python +dfs = pd.read_html(url, match='Metcalf Bank', index_col=0) + +``` + +读取pandas`to_html`输出(同时一些精确的浮点会失去): + +```python +df = pd.DataFrame(np.random.randn(2, 2)) +s = df.to_html(float_format='{0:.40g}'.format) +dfin = pd.read_html(s, index_col=0) + 
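+
+# 补充说明(示例性代码,非原文):'{0:.40g}' 输出的有效数字足够多
+# (float64 无损往返只需约 17 位有效数字),因此格式化后的字符串
+# 解析回 float 时与原值完全相等,例如:
+x = 0.1234567890123456789
+assert float('{0:.40g}'.format(x)) == x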
+```
+
+如果`lxml`后端是你提供的唯一解析器,那么它将在解析失败时报错。如果你只有一个解析器,提供一个字符串即可,但传递只含一个字符串的列表是更好的做法,因为这个函数期望接收的是一个字符串序列。你可以这样使用:
+
+```python
+dfs = pd.read_html(url, 'Metcalf Bank', index_col=0, flavor=['lxml'])
+
+```
+
+或者你可以传递`flavor='lxml'`而不要列表:
+
+```python
+dfs = pd.read_html(url, 'Metcalf Bank', index_col=0, flavor='lxml')
+
+```
+
+然而,如果你已经安装了bs4 和 html5lib并且传递`None`或`['lxml', 'bs4']`,那么极大可能会解析成功。注意*一旦解析成功了,函数将会立即返回*。
+
+```python
+dfs = pd.read_html(url, 'Metcalf Bank', index_col=0, flavor=['lxml', 'bs4'])
+
+```
+
+### 写入HTML文件
+
+`DataFrame`对象具有实例方法`to_html`,它能把`DataFrame`的内容渲染为HTML表格。这个函数的参数同上面的`to_string`方法的一样。
+
+::: tip 注意:
+
+为了简洁起见,这里没有显示`DataFrame.to_html`的所有可选项。全部选项见`to_html()`。
+
+:::
+
+```python
+In [302]: df = pd.DataFrame(np.random.randn(2, 2))
+
+In [303]: df
+Out[303]: 
+          0         1
+0 -0.184744  0.496971
+1 -0.856240  1.857977
+
+In [304]: print(df.to_html())  # raw html
+<table border="1" class="dataframe">
+  <thead>
+    <tr style="text-align: right;">
+      <th></th>
+      <th>0</th>
+      <th>1</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>0</th>
+      <td>-0.184744</td>
+      <td>0.496971</td>
+    </tr>
+    <tr>
+      <th>1</th>
+      <td>-0.856240</td>
+      <td>1.857977</td>
+    </tr>
+  </tbody>
+</table>
+
+```
+
+HTML:
+
+| **-** | **0** | **1** |
+| --- | --- | --- |
+| 0 | -0.184744 | 0.496971 |
+| 1 | -0.856240 | 1.857977 |
+
+`columns`参数将限制显示的列:
+
+```python
+In [305]: print(df.to_html(columns=[0]))
+<table border="1" class="dataframe">
+  <thead>
+    <tr style="text-align: right;">
+      <th></th>
+      <th>0</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>0</th>
+      <td>-0.184744</td>
+    </tr>
+    <tr>
+      <th>1</th>
+      <td>-0.856240</td>
+    </tr>
+  </tbody>
+</table>
+
+```
+
+HTML:
+
+| **-** | **0** |
+| --- | --- |
+| 0 | -0.184744 |
+| 1 | -0.856240 |
+
+`float_format`接受一个可调用的Python对象,用来控制浮点值的精度:
+
+```python
+In [306]: print(df.to_html(float_format='{0:.10f}'.format))
+<table border="1" class="dataframe">
+  <thead>
+    <tr style="text-align: right;">
+      <th></th>
+      <th>0</th>
+      <th>1</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>0</th>
+      <td>-0.1847438576</td>
+      <td>0.4969711327</td>
+    </tr>
+    <tr>
+      <th>1</th>
+      <td>-0.8562396763</td>
+      <td>1.8579766508</td>
+    </tr>
+  </tbody>
+</table>
+
+```
+
+HTML:
+
+| **-** | **0** | **1** |
+| --- | --- | --- |
+| 0 | -0.1847438576 | 0.4969711327 |
+| 1 | -0.8562396763 | 1.8579766508 |
+
+默认情况下,`bold_rows`会加粗行标签,但是你可以关掉它:
+
+```python
+In [307]: print(df.to_html(bold_rows=False))
+<table border="1" class="dataframe">
+  <thead>
+    <tr style="text-align: right;">
+      <th></th>
+      <th>0</th>
+      <th>1</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>0</td>
+      <td>-0.184744</td>
+      <td>0.496971</td>
+    </tr>
+    <tr>
+      <td>1</td>
+      <td>-0.856240</td>
+      <td>1.857977</td>
+    </tr>
+  </tbody>
+</table>
+
+```
+
+| **-** | **0** | **1** |
+| --- | --- | --- |
+| 0 | -0.184744 | 0.496971 |
+| 1 | -0.856240 | 1.857977 |
+
+`classes`参数提供了给生成的HTML表添加CSS类的功能。注意这些类是追加到已有的`'dataframe'`类之后的。
+
+```python
+In [308]: print(df.to_html(classes=['awesome_table_class', 'even_more_awesome_class']))
+<table border="1" class="dataframe awesome_table_class even_more_awesome_class">
+  <thead>
+    <tr style="text-align: right;">
+      <th></th>
+      <th>0</th>
+      <th>1</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>0</th>
+      <td>-0.184744</td>
+      <td>0.496971</td>
+    </tr>
+    <tr>
+      <th>1</th>
+      <td>-0.856240</td>
+      <td>1.857977</td>
+    </tr>
+  </tbody>
+</table>
+
+```
+`render_links`参数提供了向包含URL的单元格添加超链接的功能。
+
+*New in version 0.24.*
+
+```python
+In [309]: url_df = pd.DataFrame({
+   .....:     'name': ['Python', 'Pandas'],
+   .....:     'url': ['https://www.python.org/', 'http://pandas.pydata.org']})
+   .....: 
+
+In [310]: print(url_df.to_html(render_links=True))
+<table border="1" class="dataframe">
+  <thead>
+    <tr style="text-align: right;">
+      <th></th>
+      <th>name</th>
+      <th>url</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>0</th>
+      <td>Python</td>
+      <td><a href="https://www.python.org/" target="_blank">https://www.python.org/</a></td>
+    </tr>
+    <tr>
+      <th>1</th>
+      <td>Pandas</td>
+      <td><a href="http://pandas.pydata.org" target="_blank">http://pandas.pydata.org</a></td>
+    </tr>
+  </tbody>
+</table>
+
+```
+
+HTML:
+
+| **-** | **name** | **url** |
+| --- | --- | --- |
+| 0 | Python | [https://www.python.org/](https://www.python.org/) |
+| 1 | Pandas | [http://pandas.pydata.org](http://pandas.pydata.org) |
+
+最后,`escape`参数允许你控制是否对生成的HTML中的“<”、“>”和“&”字符进行转义(默认是`True`)。想要得到不转义的HTML,就设置`escape=False`。
+
+```python
+In [311]: df = pd.DataFrame({'a': list('&<>'), 'b': np.random.randn(3)})
+
+```
+转义的:
+
+```python
+In [312]: print(df.to_html())
+<table border="1" class="dataframe">
+  <thead>
+    <tr style="text-align: right;">
+      <th></th>
+      <th>a</th>
+      <th>b</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>0</th>
+      <td>&amp;</td>
+      <td>-0.474063</td>
+    </tr>
+    <tr>
+      <th>1</th>
+      <td>&lt;</td>
+      <td>-0.230305</td>
+    </tr>
+    <tr>
+      <th>2</th>
+      <td>&gt;</td>
+      <td>-0.400654</td>
+    </tr>
+  </tbody>
+</table>
+
+```
+| **-** | **a** | **b** |
+| --- | --- | --- |
+| 0 | & | -0.474063 |
+| 1 | < | -0.230305 |
+| 2 | > | -0.400654 |
+
+不转义的:
+
+```python
+In [313]: print(df.to_html(escape=False))
+<table border="1" class="dataframe">
+  <thead>
+    <tr style="text-align: right;">
+      <th></th>
+      <th>a</th>
+      <th>b</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>0</th>
+      <td>&</td>
+      <td>-0.474063</td>
+    </tr>
+    <tr>
+      <th>1</th>
+      <td><</td>
+      <td>-0.230305</td>
+    </tr>
+    <tr>
+      <th>2</th>
+      <td>></td>
+      <td>-0.400654</td>
+    </tr>
+  </tbody>
+</table>
+
+```
+
+| **-** | **a** | **b** |
+| --- | --- | --- |
+| 0 | & | -0.474063 |
+| 1 | < | -0.230305 |
+| 2 | > | -0.400654 |
+
+::: tip 注意:
+
+一些浏览器在渲染上面的两个HTML表格的时候可能看不出区别。
+
+:::
+
+### HTML表格解析陷阱
+
+在使用顶级的pandas io函数`read_html`来解析HTML表格的时候,围绕所用的解析库存在一些版本上的问题。
+
+**[lxml](https://lxml.de/)问题**:
+
+- 优点:
+  - [lxml](https://lxml.de/)是非常快的。
+  - [lxml](https://lxml.de/)要求Cython正确安装。
+
+- 缺点:
+  - 除非输入是[严格有效的标记](https://validator.w3.org/docs/help.html#validation_basics),否则[lxml](https://lxml.de/)不对其解析结果做任何保证。
+  - 鉴于上述情况,我们选择允许用户使用[lxml](https://lxml.de/)作为后端,但是如果[lxml](https://lxml.de/)解析失败,**将回退使用[html5lib](https://github.com/html5lib/html5lib-python)**。
+  - 因此,强烈推荐你安装**[BeautifulSoup4](https://www.crummy.com/software/BeautifulSoup)**和**[html5lib](https://github.com/html5lib/html5lib-python)**这两个库。这样即使[lxml](https://lxml.de/)解析失败,你仍然能够得到一个有效的结果(前提是其他所有内容都有效)。
+
+**[BeautifulSoup4](https://www.crummy.com/software/BeautifulSoup)使用[lxml](https://lxml.de/)作为后端的问题**:
+
+- 以上问题仍然会存在,因为**[BeautifulSoup4](https://www.crummy.com/software/BeautifulSoup)**本质上只是解析器后端的一个包装器。
+
+**[BeautifulSoup4](https://www.crummy.com/software/BeautifulSoup)使用[html5lib](https://github.com/html5lib/html5lib-python)作为后端的问题**:
+
+- 优点:
+  - [html5lib](https://github.com/html5lib/html5lib-python)比[lxml](https://lxml.de/)宽容得多,因此能以一种更合理的方式处理*现实生活中的标记*,而不是比如在不通知你的情况下直接删除元素。
+  - [html5lib](https://github.com/html5lib/html5lib-python)*能自动从无效标记中生成有效的HTML5标记*。这在解析HTML表格的时候相当重要,因为它保证了文档的有效性。然而这不意味着它就是“正确的”,因为修复标记的过程没有统一的定义。
+  - [html5lib](https://github.com/html5lib/html5lib-python)是纯Python实现,除了安装它自身不需要额外的构建步骤。
+
+- 缺点:
+  - 使用[html5lib](https://github.com/html5lib/html5lib-python)最大的缺点是慢。但要考虑到,网络上的许多表格并没有大到解析算法的运行时间会成为问题的程度,瓶颈更可能出现在通过网络从URL读取原始文本的过程中(即IO)。不过对于非常大的表格,情况可能并非如此。
+
+## Excel 文件
+
+[read_excel()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html#pandas.read_excel)方法使用Python的`xlrd`模块来读取Excel 2003(`.xls`)版的文件,而Excel 2007+ 
(`.xlsx`)版本的是用`xlrd`或者`openpyxl`模块来读取的。[to_excel()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html#pandas.DataFrame.to_excel)方法则是用来把`DataFrame`数据存储为Excel格式。一般来说,它的语法同使用[csv](https://www.pypandas.cn/docs/user_guide/io.html#io-read-csv-table)数据是类似的,更多高级的用法可以参考[cookbook](https://www.pypandas.cn/docs/user_guide/cookbook.html#cookbook-excel)。 + +### 读取 Excel 文件 + +在大多数基本的使用案例中,`read_excel`会读取Excel文件通过一个路径,并且`sheet_name `会表明需要解析哪一张表格。 + +```python +# Returns a DataFrame +pd.read_excel('path_to_file.xls', sheet_name='Sheet1') + +``` + +#### `ExcelFile` 类 + +为了更方便地读取同一个文件的多张表格,`ExcelFile`类可用来打包文件并传递给`read_excel`。因为仅需读取一次内存,所以这种方式读取一个文件的多张表格会有性能上的优势。 + +```python +xlsx = pd.ExcelFile('path_to_file.xls') +df = pd.read_excel(xlsx, 'Sheet1') + +``` + +`ExcelFile`类也能用来作为上下文管理器。 + +```python +with pd.ExcelFile('path_to_file.xls') as xls: + df1 = pd.read_excel(xls, 'Sheet1') + df2 = pd.read_excel(xls, 'Sheet2') + +``` + +`sheet_names`属性能将文件中的所有表格名字生成一组列表。 + +`ExcelFile`一个主要的用法就是用来解析多张表格的不同参数: + +```python +data = {} +# For when Sheet1's format differs from Sheet2 +with pd.ExcelFile('path_to_file.xls') as xls: + data['Sheet1'] = pd.read_excel(xls, 'Sheet1', index_col=None, + na_values=['NA']) + data['Sheet2'] = pd.read_excel(xls, 'Sheet2', index_col=1) + +``` + +注意如果所有的表格解析同一个参数,那么这组表格名的列表能轻易地传递给`read_excel`且不会有性能上地损失。 + +```python +# using the ExcelFile class +data = {} +with pd.ExcelFile('path_to_file.xls') as xls: + data['Sheet1'] = pd.read_excel(xls, 'Sheet1', index_col=None, + na_values=['NA']) + data['Sheet2'] = pd.read_excel(xls, 'Sheet2', index_col=None, + na_values=['NA']) + +# equivalent using the read_excel function +data = pd.read_excel('path_to_file.xls', ['Sheet1', 'Sheet2'], + index_col=None, na_values=['NA']) + +``` + +`ExcelFile`也能同`xlrd.book.Book`对象作为一个参数被调用。这种方法让用户可以控制Excel文件被如何读取。例如,表格可以根据需求加载通过调用`xlrd.open_workbook()`伴随`on_demand=True`。 + +```python +import xlrd +xlrd_book = xlrd.open_workbook('path_to_file.xls', 
on_demand=True) +with pd.ExcelFile(xlrd_book) as xls: + df1 = pd.read_excel(xls, 'Sheet1') + df2 = pd.read_excel(xls, 'Sheet2') + +``` + +#### 指定表格 + +::: tip 注意 + +第二个参数是`sheet_name`,不要同`ExcelFile.sheet_names`搞混淆。 + +::: + +::: tip 注意 + +ExcelFile's的属性`sheet_names`提供的是多张表格所生成的列表。 + +::: + +- `sheet_name`参数允许指定单张表格或多张表格被读取。 + +- `sheet_name`的默认值是0,这表明读取的是第一张表格。 + +- 在工作簿里面,使用字符串指向特定的表格名称。 + +- 使用整数指向表格的索引,索引遵守Python的约定是从0开始的。 + +- 无论是使用一组字符串还是整数的列表,返回的都是指定表格的字典。 + +- 使用`None`值则会返回所有可用表格的一组字典。 + +```python +# Returns a DataFrame +pd.read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA']) + +``` + +使用表格索引: + +```python +# Returns a DataFrame +pd.read_excel('path_to_file.xls', 0, index_col=None, na_values=['NA']) + +``` + +使用所有默认值: + +```python +# Returns a DataFrame +pd.read_excel('path_to_file.xls') + +``` + +使用None获取所有表格: + +```python +# Returns a dictionary of DataFrames +pd.read_excel('path_to_file.xls', sheet_name=None) + +``` + +使用列表获取多张表格: + +```python +# Returns the 1st and 4th sheet, as a dictionary of DataFrames. 
+pd.read_excel('path_to_file.xls', sheet_name=['Sheet1', 3]) + +``` + +`read_excel`能读取不止一张表格,通过`sheet_name`能设置为读取表格名称的列表,表格位置的列表,还能设置为`None`来读取所有表格。多张表格能通过表格索引或表格名称分别使用整数或字符串来指定读取。 + +#### `MultiIndex`读取 + +`read_excel`能用`MultiIndex`读取多个索引,通过`index_col`方法来传递列的列表和`header`将行的列表传递给`MultiIndex`的列。无论是`index`还是`columns`,如果已经具有序列化的层级名称,则可以通过指定组成层级的行/列来读取它们。 + +例如,用`MultiIndex`读取没有名称的索引: + +```python +In [314]: df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8]}, + .....: index=pd.MultiIndex.from_product([['a', 'b'], ['c', 'd']])) + .....: + +In [315]: df.to_excel('path_to_file.xlsx') + +In [316]: df = pd.read_excel('path_to_file.xlsx', index_col=[0, 1]) + +In [317]: df +Out[317]: + a b +a c 1 5 + d 2 6 +b c 3 7 + d 4 8 + +``` + +如果索引具有层级名称,它们将使用相同的参数进行解析: + +```python +In [318]: df.index = df.index.set_names(['lvl1', 'lvl2']) + +In [319]: df.to_excel('path_to_file.xlsx') + +In [320]: df = pd.read_excel('path_to_file.xlsx', index_col=[0, 1]) + +In [321]: df +Out[321]: + a b +lvl1 lvl2 +a c 1 5 + d 2 6 +b c 3 7 + d 4 8 + +``` + +如果源文件具有`MultiIndex`索引和多列,那么可以使用`index_col`和`header`指定列表的每个值。 + +```python +In [322]: df.columns = pd.MultiIndex.from_product([['a'], ['b', 'd']], + .....: names=['c1', 'c2']) + .....: + +In [323]: df.to_excel('path_to_file.xlsx') + +In [324]: df = pd.read_excel('path_to_file.xlsx', index_col=[0, 1], header=[0, 1]) + +In [325]: df +Out[325]: +c1 a +c2 b d +lvl1 lvl2 +a c 1 5 + d 2 6 +b c 3 7 + d 4 8 + +``` + +#### 解析特定的列 + +常常会有这样的情况,当用户想要插入几列数据到Excel表格里面作为临时计算,但是你又不想要读取这些列的时候,`read_excel`提供的`usecols`方法就派上用场了,它让你可以解析指定的列。 + +*Deprecated since version 0.24.0.* + +不推荐`usecols`方法使用单个整数值,请在`usecols`中使用包括从0开始的整数列表。 + +如果`usecols`是一个整数,那么它将被认为是暗示解析最后一列。 + +```python +pd.read_excel('path_to_file.xls', 'Sheet1', usecols=2) + +``` + +你也可以将逗号分隔的一组Excel列和范围指定为字符串: + +```python +pd.read_excel('path_to_file.xls', 'Sheet1', usecols='A,C:E') + +``` + +如果`usecols`是一组整数列,那么将认为是解析的文件列索引。 + +```python +pd.read_excel('path_to_file.xls', 'Sheet1', usecols=[0, 2, 3]) 
+ +``` + +元素的顺序是可以忽略的,因此`usecols=[0, 1]`是等价于`[1, 0]`的。 + +*New in version 0.24.* + +如果`usecols`是字符串列表,那么可以认为每个字符串对应的就是表格的每一个列名,列名是由`name`中的用户提供或从文档标题行推断出来。这些字符串定义了那些列将要被解析: + +```python +pd.read_excel('path_to_file.xls', 'Sheet1', usecols=['foo', 'bar']) + +``` + +元素的顺序同样被忽略,因此`usecols=['baz', 'joe']`等同于`['joe', 'baz']`。 + +*New in version 0.24.* + +如果`usecols`是可调用的,那么该调用函数将会根据列名来调用,也会返回根据可调用函数为`True`的列名。 + +```python +pd.read_excel('path_to_file.xls', 'Sheet1', usecols=lambda x: x.isalpha()) + +``` + +#### 解析日期 + +当读取excel文件的时候,像日期时间的值通常会自动转换为恰当的dtype(数据类型)。但是如果你有一列字符串看起来很像日期(实际上并不是excel里面的日期格式),那么你就能使用`parse_dates`方法来解析这些字符串为日期: + +```python +pd.read_excel('path_to_file.xls', 'Sheet1', parse_dates=['date_strings']) + +``` + +#### 单元格转换 + +Excel里面的单元格内容是可以通过`converters`方法来进行转换的。例如,把一列转换为布尔值: + +```python +pd.read_excel('path_to_file.xls', 'Sheet1', converters={'MyBools': bool}) + +``` + +这个方法可以处理缺失值并且能对缺失的数据进行如期的转换。由于转换是在单元格之间发生而不是整列,因此不能保证dtype为数组。例如一列含有缺失值的整数是不能转换为具有整数dtype的数组,因为NaN严格的被认为是浮点数。你能够手动地标记缺失数据为恢复整数dtype: + +```python +def cfun(x): + return int(x) if x else -1 + + +pd.read_excel('path_to_file.xls', 'Sheet1', converters={'MyInts': cfun}) + +``` + +#### 数据类型规范 + +*New in version 0.20.* + +作为另一个种转换器,使用*dtype*能指定整列地类型,它能让字典映射列名为数据类型。使用`str`或`object`来转译不能判断类型的数据: + +```python +pd.read_excel('path_to_file.xls', dtype={'MyInts': 'int64', 'MyText': str}) + +``` + +### 写入Excel文件 + +#### 写入Excel文件到磁盘 + +你可以使用`to_excel`方法把`DataFrame`对象写入到Excel文件的一张表格中。它的参数大部分同前面`to_csv `提到的相同,第一个参数是excel文件的名字,而可选的第二个参数是`DataFrame`应该写入的表格名称,例如: + +```python +df.to_excel('path_to_file.xlsx', sheet_name='Sheet1') + +``` + +文件以`.xls` 结尾的将用`xlwt`写入,而那些以`.xlsx`结尾的则使用`xlsxwriter`(如果可用的话)或`openpyxl`来写入。 + +`DataFrame `将尝试以模拟REPL(“读取-求值-输出" 循环的简写)输出的方式写入。`index_label`将代替第一行放置到第二行,你也能放置它到第一行通过在`to_excel()`里设置`merge_cells`选项为`False`: + +```python +df.to_excel('path_to_file.xlsx', index_label='label', merge_cells=False) + +``` + +为了把`DataFrames`数据分开写入Excel文件的不同表格中,可以使用`ExcelWriter`方法。 + 
+```python
+with pd.ExcelWriter('path_to_file.xlsx') as writer:
+    df1.to_excel(writer, sheet_name='Sheet1')
+    df2.to_excel(writer, sheet_name='Sheet2')
+
+```
+
+::: tip 注意
+
+为了尽可能提升`read_excel`的性能,Excel内部把所有数值型数据都存储为浮点数。这在读取数据时可能造成意外结果:对于没有信息损失的浮点数(`1.0 --> 1`),pandas默认会把它们转换为整数。你可以通过`convert_float=False`禁用这种行为,这可能会带来轻微的性能提升。
+
+:::
+
+#### 写入Excel文件到内存
+
+Pandas支持使用`ExcelWriter`把Excel文件写入类缓冲区对象,如`StringIO`或`BytesIO`。
+
+```python
+# Safe import for either Python 2.x or 3.x
+try:
+    from io import BytesIO
+except ImportError:
+    from cStringIO import StringIO as BytesIO
+
+bio = BytesIO()
+
+# By setting the 'engine' in the ExcelWriter constructor.
+writer = pd.ExcelWriter(bio, engine='xlsxwriter')
+df.to_excel(writer, sheet_name='Sheet1')
+
+# Save the workbook
+writer.save()
+
+# Seek to the beginning and read to copy the workbook to a variable in memory
+bio.seek(0)
+workbook = bio.read()
+
+```
+::: tip 注意
+
+虽然`engine`是可选参数,但推荐显式指定。engine决定了生成工作簿的格式版本:设置`engine='xlwt'`将生成Excel 2003格式的工作簿(xls),而使用`'openpyxl'`或`'xlsxwriter'`将生成Excel 2007格式的工作簿(xlsx)。如果省略,则生成Excel 2007格式的工作簿。
+
+:::
+
+### Excel写入引擎
+
+Pandas通过两种方式选择Excel写入引擎:
+
+1. `engine`关键字参数
+2. 文件扩展名(通过配置选项指定的默认引擎)
+
+默认情况下,pandas对`.xlsx`文件使用[XlsxWriter](https://xlsxwriter.readthedocs.io/),对`.xlsm`文件使用[openpyxl](https://openpyxl.readthedocs.io/),对`.xls`文件使用[xlwt](http://www.python-excel.org/)。如果你安装了多个引擎,可以通过[设置配置选项](https://www.pypandas.cn/docs/user_guide/options.html#options)`io.excel.xlsx.writer`和`io.excel.xls.writer`来指定默认引擎。如果[XlsxWriter](https://xlsxwriter.readthedocs.io/)不可用,pandas将回退到[openpyxl](https://openpyxl.readthedocs.io/)来写入`xlsx`文件。
+
+要指定你想使用的写入器,可以给`to_excel`和`ExcelWriter`传入`engine`关键字参数。内置引擎有:
+
+- `openpyxl`: 要求2.4或者更高的版本。
+- `xlsxwriter`
+- `xlwt`
+
+```python
+# By setting the 'engine' in the DataFrame 'to_excel()' methods.
+df.to_excel('path_to_file.xlsx', sheet_name='Sheet1', engine='xlsxwriter')
+
+# By setting the 'engine' in the ExcelWriter constructor.
+writer = pd.ExcelWriter('path_to_file.xlsx', engine='xlsxwriter') + +# Or via pandas configuration. +from pandas import options # noqa: E402 +options.io.excel.xlsx.writer = 'xlsxwriter' + +df.to_excel('path_to_file.xlsx', sheet_name='Sheet1') + +``` + +### 样式 + +通过pandas产生的Excel工作表的样式可以使用`DataFrame`的`to_excel`方法的以下参数进行修改。 + +- `float_format`:格式化字符串用于浮点数(默认是`None`)。 +- `freeze_panes`:两个整数的元组,表示要固化的最底行和最右列。这些参数中的每个都是以1为底,因此(1, 1)将固化第一行和第一列(默认是`None`)。 + +使用[ XlsxWriter](https://xlsxwriter.readthedocs.io/)引擎提供的多种方法来修改用`to_excel`方法创建的Excel工作表的样式。你能在[ XlsxWriter](https://xlsxwriter.readthedocs.io/)文档里面找到绝佳的例子:[https://xlsxwriter.readthedocs.io/working_with_pandas.html](https://xlsxwriter.readthedocs.io/working_with_pandas.html) + +## OpenDocument 电子表格 + +*New in version 0.25.* + +[`read_excel`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html#pandas.read_excel "`read_excel`")方法也能使用`odfpy`模块来读取OpenDocument电子表格。读取OpenDocument电子表格的语法和方法同使用`engine='odf'`来操作[Excel files](https://www.pypandas.cn/docs/user_guide/io.html#excel-files "Excel files")的方法类似。 + +```python +# Returns a DataFrame +pd.read_excel('path_to_file.ods', engine='odf') +``` +::: tip 注意 + +目前pandas仅支持读取OpenDocument电子表格,写入是不行的。 + +::: + +## 剪贴板 + +使用`read_clipboard()`方法是一种便捷的获取数据的方式,通过把剪贴的内容暂存,然后传递给`read_csv`方法。例如,你可以复制以下文本来剪贴(在许多操作系统上是CTRL-C): + +```python + A B C +x 1 4 p +y 2 5 q +z 3 6 r +``` + +接着直接使用`DataFrame`来导入数据: + +```python +>>> clipdf = pd.read_clipboard() +>>> clipdf + A B C +x 1 4 p +y 2 5 q +z 3 6 r +``` + +`to_clipboard`方法可以把`DataFrame`内容写入到剪贴板。使用下面的方法你可以粘贴剪贴板的内容到其他应用(在许多系统中用的是CTRL-V)。这里我们解释一下如何使用`DataFrame`把内容写入到剪贴板并读回。 + +```python +>>> df = pd.DataFrame({'A': [1, 2, 3], +... 'B': [4, 5, 6], +... 'C': ['p', 'q', 'r']}, +... 
index=['x', 'y', 'z'])
+>>> df
+   A  B  C
+x  1  4  p
+y  2  5  q
+z  3  6  r
+>>> df.to_clipboard()
+>>> pd.read_clipboard()
+   A  B  C
+x  1  4  p
+y  2  5  q
+z  3  6  r
+```
+
+可以看到,我们取回了此前写入剪贴板的相同内容。
+
+::: tip 注意
+
+在Linux上,你可能需要安装xclip或者xsel(以及PyQt5、PyQt4或qtpy)才能使用这些方法。
+
+:::
+
+## 序列化(Pickling)
+
+所有的pandas对象都具有`to_pickle`方法,该方法使用Python的`pickle`模块,以pickle格式把数据结构保存到磁盘上。
+
+```python
+In [326]: df
+Out[326]:
+c1         a
+c2         b  d
+lvl1 lvl2
+a    c     1  5
+     d     2  6
+b    c     3  7
+     d     4  8
+
+In [327]: df.to_pickle('foo.pkl')
+```
+
+pandas命名空间中的`read_pickle`函数可以从文件中加载任何pickle化的pandas对象(或其他任何pickle化的对象):
+
+```python
+In [328]: pd.read_pickle('foo.pkl')
+Out[328]:
+c1         a
+c2         b  d
+lvl1 lvl2
+a    c     1  5
+     d     2  6
+b    c     3  7
+     d     4  8
+```
+
+::: danger 警告
+
+加载来自不信任来源的序列化数据是不安全的。
+参见:https://docs.python.org/3/library/pickle.html
+
+:::
+
+::: danger 警告
+
+[`read_pickle()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_pickle.html#pandas.read_pickle "`read_pickle()`")仅保证向后兼容到pandas的0.20.3版本。
+
+:::
+
+### 压缩序列化文件
+
+*New in version 0.20.0.*
+
+[`read_pickle()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_pickle.html#pandas.read_pickle "`read_pickle()`"),[`DataFrame.to_pickle()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_pickle.html#pandas.DataFrame.to_pickle)和[`Series.to_pickle()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.to_pickle.html#pandas.Series.to_pickle)能够读取和写入压缩的序列化文件。读取和写入支持`gzip`、`bz2`、`xz`压缩类型。`zip`格式仅支持读取,并且压缩包内必须只包含一个要读取的数据文件。
+
+压缩类型可以是显式的参数,也可以从文件扩展名推断。如果文件名以`'.gz'`、`'.bz2'`、`'.zip'`或`'.xz'`结尾,则分别推断为`gzip`、`bz2`、`zip`、`xz`压缩类型。
+
+```python
+In [329]: df = pd.DataFrame({
+   .....:     'A': np.random.randn(1000),
+   .....:     'B': 'foo',
+   .....:     'C': pd.date_range('20130101', periods=1000, freq='s')})
+   .....:
+
+In [330]: df
+Out[330]:
+            A    B                   C
+0   -0.288267  foo 2013-01-01 00:00:00
+1   -0.084905  foo 2013-01-01 00:00:01
+2    0.004772  foo 2013-01-01 00:00:02
+3
1.382989 foo 2013-01-01 00:00:03 +4 0.343635 foo 2013-01-01 00:00:04 +.. ... ... ... +995 -0.220893 foo 2013-01-01 00:16:35 +996 0.492996 foo 2013-01-01 00:16:36 +997 -0.461625 foo 2013-01-01 00:16:37 +998 1.361779 foo 2013-01-01 00:16:38 +999 -1.197988 foo 2013-01-01 00:16:39 + +[1000 rows x 3 columns] +``` +使用显式压缩类型: + +```python +In [331]: df.to_pickle("data.pkl.compress", compression="gzip") + +In [332]: rt = pd.read_pickle("data.pkl.compress", compression="gzip") + +In [333]: rt +Out[333]: + A B C +0 -0.288267 foo 2013-01-01 00:00:00 +1 -0.084905 foo 2013-01-01 00:00:01 +2 0.004772 foo 2013-01-01 00:00:02 +3 1.382989 foo 2013-01-01 00:00:03 +4 0.343635 foo 2013-01-01 00:00:04 +.. ... ... ... +995 -0.220893 foo 2013-01-01 00:16:35 +996 0.492996 foo 2013-01-01 00:16:36 +997 -0.461625 foo 2013-01-01 00:16:37 +998 1.361779 foo 2013-01-01 00:16:38 +999 -1.197988 foo 2013-01-01 00:16:39 + +[1000 rows x 3 columns] +``` +从扩展名推断压缩类型: + +```python +In [334]: df.to_pickle("data.pkl.xz", compression="infer") + +In [335]: rt = pd.read_pickle("data.pkl.xz", compression="infer") + +In [336]: rt +Out[336]: + A B C +0 -0.288267 foo 2013-01-01 00:00:00 +1 -0.084905 foo 2013-01-01 00:00:01 +2 0.004772 foo 2013-01-01 00:00:02 +3 1.382989 foo 2013-01-01 00:00:03 +4 0.343635 foo 2013-01-01 00:00:04 +.. ... ... ... +995 -0.220893 foo 2013-01-01 00:16:35 +996 0.492996 foo 2013-01-01 00:16:36 +997 -0.461625 foo 2013-01-01 00:16:37 +998 1.361779 foo 2013-01-01 00:16:38 +999 -1.197988 foo 2013-01-01 00:16:39 + +[1000 rows x 3 columns] +``` +默认是使用“推断”: + +```python +In [337]: df.to_pickle("data.pkl.gz") + +In [338]: rt = pd.read_pickle("data.pkl.gz") + +In [339]: rt +Out[339]: + A B C +0 -0.288267 foo 2013-01-01 00:00:00 +1 -0.084905 foo 2013-01-01 00:00:01 +2 0.004772 foo 2013-01-01 00:00:02 +3 1.382989 foo 2013-01-01 00:00:03 +4 0.343635 foo 2013-01-01 00:00:04 +.. ... ... ... 
+995 -0.220893  foo 2013-01-01 00:16:35
+996  0.492996  foo 2013-01-01 00:16:36
+997 -0.461625  foo 2013-01-01 00:16:37
+998  1.361779  foo 2013-01-01 00:16:38
+999 -1.197988  foo 2013-01-01 00:16:39
+
+[1000 rows x 3 columns]
+
+In [340]: df["A"].to_pickle("s1.pkl.bz2")
+
+In [341]: rt = pd.read_pickle("s1.pkl.bz2")
+
+In [342]: rt
+Out[342]:
+0     -0.288267
+1     -0.084905
+2      0.004772
+3      1.382989
+4      0.343635
+         ...
+995   -0.220893
+996    0.492996
+997   -0.461625
+998    1.361779
+999   -1.197988
+Name: A, Length: 1000, dtype: float64
+```
+## msgpack(一种二进制格式)
+
+pandas支持以`msgpack`格式序列化pandas对象。它是一种轻量级、可移植的二进制格式,类似于二进制JSON,空间利用率高,并且具有不错的写入(序列化)和读取(反序列化)性能。
+
+::: danger 警告
+
+从0.25版本开始,msgpack格式已被弃用,并将在之后的版本中移除。对于pandas对象的在线(on-the-wire)传输,推荐使用pyarrow。
+
+:::
+
+::: danger 警告
+
+[`read_msgpack()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_msgpack.html#pandas.read_msgpack "`read_msgpack()`")仅保证向后兼容到pandas的0.20.3版本。
+
+:::
+
+```python
+In [343]: df = pd.DataFrame(np.random.rand(5, 2), columns=list('AB'))
+
+In [344]: df.to_msgpack('foo.msg')
+
+In [345]: pd.read_msgpack('foo.msg')
+Out[345]:
+          A         B
+0  0.275432  0.293583
+1  0.842639  0.165381
+2  0.608925  0.778891
+3  0.136543  0.029703
+4  0.318083  0.604870
+
+In [346]: s = pd.Series(np.random.rand(5), index=pd.date_range('20130101', periods=5))
+```
+
+你可以传入一个对象列表,反序列化时会把它们全部取回。
+
+```python
+In [347]: pd.to_msgpack('foo.msg', df, 'foo', np.array([1, 2, 3]), s)
+
+In [348]: pd.read_msgpack('foo.msg')
+Out[348]:
+[          A         B
+ 0  0.275432  0.293583
+ 1  0.842639  0.165381
+ 2  0.608925  0.778891
+ 3  0.136543  0.029703
+ 4  0.318083  0.604870, 'foo', array([1, 2, 3]), 2013-01-01    0.330824
+ 2013-01-02    0.790825
+ 2013-01-03    0.308468
+ 2013-01-04    0.092397
+ 2013-01-05    0.703091
+ Freq: D, dtype: float64]
+```
+你可以传入`iterator=True`来对反序列化的结果进行迭代:
+
+```python
+In [349]: for o in pd.read_msgpack('foo.msg', iterator=True):
+   .....:     print(o)
+   .....:
+          A         B
+0  0.275432  0.293583
+1  0.842639  0.165381
+2  0.608925  0.778891
+3  0.136543  0.029703
+4  0.318083  0.604870
+foo +[1 2 3] +2013-01-01 0.330824 +2013-01-02 0.790825 +2013-01-03 0.308468 +2013-01-04 0.092397 +2013-01-05 0.703091 +Freq: D, dtype: float64 +``` +你也能传递`append=True`参数,给现有的包添加写入: + +```python +In [350]: df.to_msgpack('foo.msg', append=True) + +In [351]: pd.read_msgpack('foo.msg') +Out[351]: +[ A B + 0 0.275432 0.293583 + 1 0.842639 0.165381 + 2 0.608925 0.778891 + 3 0.136543 0.029703 + 4 0.318083 0.604870, 'foo', array([1, 2, 3]), 2013-01-01 0.330824 + 2013-01-02 0.790825 + 2013-01-03 0.308468 + 2013-01-04 0.092397 + 2013-01-05 0.703091 + Freq: D, dtype: float64, A B + 0 0.275432 0.293583 + 1 0.842639 0.165381 + 2 0.608925 0.778891 + 3 0.136543 0.029703 + 4 0.318083 0.604870] +``` +不像其他io方法,`to_msgpack`既可以基于每个对象使用`df.to_msgpack()`方法,也可以在混合pandas对象的时候使用顶层`pd.to_msgpack(...)`方法,该方法可以让你打包任意的Python列表、字典、标量的集合。 + +```python +In [352]: pd.to_msgpack('foo2.msg', {'dict': [{'df': df}, {'string': 'foo'}, + .....: {'scalar': 1.}, {'s': s}]}) + .....: + +In [353]: pd.read_msgpack('foo2.msg') +Out[353]: +{'dict': ({'df': A B + 0 0.275432 0.293583 + 1 0.842639 0.165381 + 2 0.608925 0.778891 + 3 0.136543 0.029703 + 4 0.318083 0.604870}, + {'string': 'foo'}, + {'scalar': 1.0}, + {'s': 2013-01-01 0.330824 + 2013-01-02 0.790825 + 2013-01-03 0.308468 + 2013-01-04 0.092397 + 2013-01-05 0.703091 + Freq: D, dtype: float64})} +``` + +### 读/写API + +Msgpacks也能读写字符串。 + +```python +In [354]: df.to_msgpack() +Out[354]: b'\x84\xa3typ\xadblock_manager\xa5klass\xa9DataFrame\xa4axes\x92\x86\xa3typ\xa5index\xa5klass\xa5Index\xa4name\xc0\xa5dtype\xa6object\xa4data\x92\xa1A\xa1B\xa8compress\xc0\x86\xa3typ\xabrange_index\xa5klass\xaaRangeIndex\xa4name\xc0\xa5start\x00\xa4stop\x05\xa4step\x01\xa6blocks\x91\x86\xa4locs\x86\xa3typ\xa7ndarray\xa5shape\x91\x02\xa4ndim\x01\xa5dtype\xa5int64\xa4data\xd8\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\xa8compress\xc0\xa6values\xc7P\x00\xc84 
\x84\xac\xa0\xd1?\x0f\xa4.\xb5\xe6\xf6\xea?\xb9\x85\x9aLO|\xe3?\xac\xf0\xd7\x81>z\xc1?\\\xca\x97\ty[\xd4?\x9c\x9b\x8a:\x11\xca\xd2?\x14zX\xd01+\xc5?4=\x19b\xad\xec\xe8?\xc0!\xe9\xf4\x8ej\x9e?\xa7>_\xac\x17[\xe3?\xa5shape\x92\x02\x05\xa5dtype\xa7float64\xa5klass\xaaFloatBlock\xa8compress\xc0' +``` +此外你可以连接字符串生成一个原始的对象列表。 + +```python +In [355]: pd.read_msgpack(df.to_msgpack() + s.to_msgpack()) +Out[355]: +[ A B + 0 0.275432 0.293583 + 1 0.842639 0.165381 + 2 0.608925 0.778891 + 3 0.136543 0.029703 + 4 0.318083 0.604870, 2013-01-01 0.330824 + 2013-01-02 0.790825 + 2013-01-03 0.308468 + 2013-01-04 0.092397 + 2013-01-05 0.703091 + Freq: D, dtype: float64] +``` +## HDF5(PyTables) (一种以.h5结尾的分层数据格式) + +`HDFStore`是一个能读写pandas的类似字典的对象,它能使用高性能的HDF5格式,该格式是用优秀的[PyTables](https://www.pytables.org/ "PyTables")库写的。一些更高级的用法参考[cookbook](https://www.pypandas.cn/docs/user_guide/cookbook.html#cookbook-hdf "cookbook")。 + +::: danger 警告 + +pandas要求使用的`PyTables`版本要 > = 3.0.0。当使用索引来检索存储的时候,`PyTables`< 3.2的版本会出现索引bug。如果返回一个结果的子集,那么你就需要升级`PyTables` 的版本 >= 3.2才行。先前创建的存储数据将会使用更新后的版本再次写入。 + +::: + +```python +In [356]: store = pd.HDFStore('store.h5') + +In [357]: print(store) + +File path: store.h5 +``` + +对象能够被写入文件就像成对的键-值添加到字典里面一样: + +```python +In [358]: index = pd.date_range('1/1/2000', periods=8) + +In [359]: s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e']) + +In [360]: df = pd.DataFrame(np.random.randn(8, 3), index=index, + .....: columns=['A', 'B', 'C']) + .....: + +# store.put('s', s) is an equivalent method +In [361]: store['s'] = s + +In [362]: store['df'] = df + +In [363]: store +Out[363]: + +File path: store.h5 +``` +在当前或者之后的Python会话中,你都能检索存储的对象: + +```python +# store.get('df') is an equivalent method +In [364]: store['df'] +Out[364]: + A B C +2000-01-01 -0.426936 -1.780784 0.322691 +2000-01-02 1.638174 -2.184251 0.049673 +2000-01-03 -1.022803 0.889445 2.827717 +2000-01-04 1.767446 -1.305266 -0.378355 +2000-01-05 0.486743 0.954551 0.859671 +2000-01-06 -1.170458 
-1.211386 -0.852728 +2000-01-07 -0.450781 1.064650 1.014927 +2000-01-08 -0.810399 0.254343 -0.875526 + +# dotted (attribute) access provides get as well +In [365]: store.df +Out[365]: + A B C +2000-01-01 -0.426936 -1.780784 0.322691 +2000-01-02 1.638174 -2.184251 0.049673 +2000-01-03 -1.022803 0.889445 2.827717 +2000-01-04 1.767446 -1.305266 -0.378355 +2000-01-05 0.486743 0.954551 0.859671 +2000-01-06 -1.170458 -1.211386 -0.852728 +2000-01-07 -0.450781 1.064650 1.014927 +2000-01-08 -0.810399 0.254343 -0.875526 +``` + +使用键删除指定的对象: + +```python +# store.remove('df') is an equivalent method +In [366]: del store['df'] + +In [367]: store +Out[367]: + +File path: store.h5 +``` + +关闭存储对象并使用环境管理器: + +```python +In [368]: store.close() + +In [369]: store +Out[369]: + +File path: store.h5 + +In [370]: store.is_open +Out[370]: False + +# Working with, and automatically closing the store using a context manager +In [371]: with pd.HDFStore('store.h5') as store: + .....: store.keys() + .....: +``` + +### 读/写 API + +`HDFStore `支持顶层的API,用`read_hdf`来读取,和使用`to_hdf`来写入,类似于`read_csv` 和`to_csv`的用法。 + +```python +In [372]: df_tl = pd.DataFrame({'A': list(range(5)), 'B': list(range(5))}) + +In [373]: df_tl.to_hdf('store_tl.h5', 'table', append=True) + +In [374]: pd.read_hdf('store_tl.h5', 'table', where=['index>2']) +Out[374]: + A B +3 3 3 +4 4 4 +``` +HDFStore默认不会删除全是缺失值的行,但是通过设置`dropna=True`参数就能改变。 + +```python +In [375]: df_with_missing = pd.DataFrame({'col1': [0, np.nan, 2], + .....: 'col2': [1, np.nan, np.nan]}) + .....: + +In [376]: df_with_missing +Out[376]: + col1 col2 +0 0.0 1.0 +1 NaN NaN +2 2.0 NaN + +In [377]: df_with_missing.to_hdf('file.h5', 'df_with_missing', + .....: format='table', mode='w') + .....: + +In [378]: pd.read_hdf('file.h5', 'df_with_missing') +Out[378]: + col1 col2 +0 0.0 1.0 +1 NaN NaN +2 2.0 NaN + +In [379]: df_with_missing.to_hdf('file.h5', 'df_with_missing', + .....: format='table', mode='w', dropna=True) + .....: + +In [380]: pd.read_hdf('file.h5', 
'df_with_missing') +Out[380]: + col1 col2 +0 0.0 1.0 +2 2.0 NaN +``` +### 固定格式 + +上面的例子表明了使用`put`进行存储的情况,该存储将`HDF5`以固定数组格式写入`PyTables`,这就是所谓的`fixed`格式。这些类型的存储一旦被写入后将**不能**再添加数据(虽然你能轻易地删除它们并再次写入),**也不能**查询;必须全部检索它们。它们也不支持没有唯一列名的数据表。`fixed`格式提供了非常快速的写入功能,并且比`table`存储在读取方面更快捷。默认的指定格式是使用`put` 或者`to_hdf` 亦或通过` format='fixed'`或` format='f'`格式。 + +::: danger 警告 + +如果你尝试使用`where`来检索,`fixed`格式将会报错` TypeError`: + +```python +>>> pd.DataFrame(np.random.randn(10, 2)).to_hdf('test_fixed.h5', 'df') +>>> pd.read_hdf('test_fixed.h5', 'df', where='index>5') +TypeError: cannot pass a where specification when reading a fixed format. + this store must be selected in its entirety +``` +::: + +### 表格格式 + +`HDFStore `支持在磁盘上使用另一种`PyTables`格式,即`table`格式。从概念上来讲,`table`在外形上同具有行和列的DataFrame极度相似。`table`也能被添加到同样的或其他的会话中。此外,删除和查询操作也是支持的。通过指定格式为`format='table'`或`format='t'`到`append`方法或`put`或者`to_hdf`。 + +`put/append/to_hdf`方法中使用的格式也可以设置为可选`pd.set_option('io.hdf.default_format','table')`,以默认的`table`格式存储。 + +```python +In [381]: store = pd.HDFStore('store.h5') + +In [382]: df1 = df[0:4] + +In [383]: df2 = df[4:] + +# append data (creates a table automatically) +In [384]: store.append('df', df1) + +In [385]: store.append('df', df2) + +In [386]: store +Out[386]: + +File path: store.h5 + +# select the entire object +In [387]: store.select('df') +Out[387]: + A B C +2000-01-01 -0.426936 -1.780784 0.322691 +2000-01-02 1.638174 -2.184251 0.049673 +2000-01-03 -1.022803 0.889445 2.827717 +2000-01-04 1.767446 -1.305266 -0.378355 +2000-01-05 0.486743 0.954551 0.859671 +2000-01-06 -1.170458 -1.211386 -0.852728 +2000-01-07 -0.450781 1.064650 1.014927 +2000-01-08 -0.810399 0.254343 -0.875526 + +# the type of stored data +In [388]: store.root.df._v_attrs.pandas_type +Out[388]: 'frame_table' +``` +::: tip 注意 + +你也可以通过创建`table`来传递`format='table'`或者` format='t`到`put`操作。 + +::: + +### 分层键 + +存储的键能够指定为字符串,这些分层的路径名就像这样的格式(例如:`foo/bar/bah`)。它将生成子存储的层次结构(或者在PyTables中叫做`Groups` )。键可以不带前面的'/'指定而且**总是**单独的(例如:'foo' 
指的就是'/foo')。删除操作能够删除所有子存储及**之后**的数据,所以要小心该操作。 + +```python +In [389]: store.put('foo/bar/bah', df) + +In [390]: store.append('food/orange', df) + +In [391]: store.append('food/apple', df) + +In [392]: store +Out[392]: + +File path: store.h5 + +# a list of keys are returned +In [393]: store.keys() +Out[393]: ['/df', '/food/apple', '/food/orange', '/foo/bar/bah'] + +# remove all nodes under this level +In [394]: store.remove('food') + +In [395]: store +Out[395]: + +File path: store.h5 +``` + +你能遍历组层次结构使用`walk`方法,该方法将为每个组键及其内容的相对键生成一个元组。 + +*New in version 0.24.0.* + +```python +In [396]: for (path, subgroups, subkeys) in store.walk(): + .....: for subgroup in subgroups: + .....: print('GROUP: {}/{}'.format(path, subgroup)) + .....: for subkey in subkeys: + .....: key = '/'.join([path, subkey]) + .....: print('KEY: {}'.format(key)) + .....: print(store.get(key)) + .....: +GROUP: /foo +KEY: /df + A B C +2000-01-01 -0.426936 -1.780784 0.322691 +2000-01-02 1.638174 -2.184251 0.049673 +2000-01-03 -1.022803 0.889445 2.827717 +2000-01-04 1.767446 -1.305266 -0.378355 +2000-01-05 0.486743 0.954551 0.859671 +2000-01-06 -1.170458 -1.211386 -0.852728 +2000-01-07 -0.450781 1.064650 1.014927 +2000-01-08 -0.810399 0.254343 -0.875526 +GROUP: /foo/bar +KEY: /foo/bar/bah + A B C +2000-01-01 -0.426936 -1.780784 0.322691 +2000-01-02 1.638174 -2.184251 0.049673 +2000-01-03 -1.022803 0.889445 2.827717 +2000-01-04 1.767446 -1.305266 -0.378355 +2000-01-05 0.486743 0.954551 0.859671 +2000-01-06 -1.170458 -1.211386 -0.852728 +2000-01-07 -0.450781 1.064650 1.014927 +2000-01-08 -0.810399 0.254343 -0.875526 +``` +::: danger 警告 + +分层键对于存储在根节点下的项目,无法使用如上的方法将其作为点(属性)进行检索。 + +```python +In [8]: store.foo.bar.bah +AttributeError: 'HDFStore' object has no attribute 'foo' + +# you can directly access the actual PyTables node but using the root node +In [9]: store.root.foo.bar.bah +Out[9]: +/foo/bar/bah (Group) '' + children := ['block0_items' (Array), 'block0_values' (Array), 'axis0' (Array), 'axis1' 
(Array)] +``` + +相反,使用基于显式字符串的键: + +```python +In [397]: store['foo/bar/bah'] +Out[397]: + A B C +2000-01-01 -0.426936 -1.780784 0.322691 +2000-01-02 1.638174 -2.184251 0.049673 +2000-01-03 -1.022803 0.889445 2.827717 +2000-01-04 1.767446 -1.305266 -0.378355 +2000-01-05 0.486743 0.954551 0.859671 +2000-01-06 -1.170458 -1.211386 -0.852728 +2000-01-07 -0.450781 1.064650 1.014927 +2000-01-08 -0.810399 0.254343 -0.875526 +``` +::: + +### 存储类型 + +#### 在表格中存储混合类型 + +支持混合数据类型存储。字符串使用添加列的最大尺寸以固定宽度进行存储。后面尝试添加更长的字符串将会报错``ValueError``。 + +添加参数``min_itemsize={`values`: size}``将给字符串列设置一个更大的最小值。目前支持的存储类型有 ``floats,strings, ints, bools, datetime64`` 。对于字符串列,添加参数 ``nan_rep = 'nan'``将改变磁盘上默认的nan值(转变为*np.nan*),原本默认是*nan*。 + +``` python +In [398]: df_mixed = pd.DataFrame({'A': np.random.randn(8), + .....: 'B': np.random.randn(8), + .....: 'C': np.array(np.random.randn(8), dtype='float32'), + .....: 'string': 'string', + .....: 'int': 1, + .....: 'bool': True, + .....: 'datetime64': pd.Timestamp('20010102')}, + .....: index=list(range(8))) + .....: + +In [399]: df_mixed.loc[df_mixed.index[3:5], + .....: ['A', 'B', 'string', 'datetime64']] = np.nan + .....: + +In [400]: store.append('df_mixed', df_mixed, min_itemsize={'values': 50}) + +In [401]: df_mixed1 = store.select('df_mixed') + +In [402]: df_mixed1 +Out[402]: + A B C string int bool datetime64 +0 -0.980856 0.298656 0.151508 string 1 True 2001-01-02 +1 -0.906920 -1.294022 0.587939 string 1 True 2001-01-02 +2 0.988185 -0.618845 0.043096 string 1 True 2001-01-02 +3 NaN NaN 0.362451 NaN 1 True NaT +4 NaN NaN 1.356269 NaN 1 True NaT +5 -0.772889 -0.340872 1.798994 string 1 True 2001-01-02 +6 -0.043509 -0.303900 0.567265 string 1 True 2001-01-02 +7 0.768606 -0.871948 -0.044348 string 1 True 2001-01-02 + +In [403]: df_mixed1.dtypes.value_counts() +Out[403]: +float64 2 +float32 1 +datetime64[ns] 1 +int64 1 +bool 1 +object 1 +dtype: int64 + +# we have provided a minimum string column size +In [404]: store.root.df_mixed.table +Out[404]: 
+/df_mixed/table (Table(8,)) '' + description := { + "index": Int64Col(shape=(), dflt=0, pos=0), + "values_block_0": Float64Col(shape=(2,), dflt=0.0, pos=1), + "values_block_1": Float32Col(shape=(1,), dflt=0.0, pos=2), + "values_block_2": Int64Col(shape=(1,), dflt=0, pos=3), + "values_block_3": Int64Col(shape=(1,), dflt=0, pos=4), + "values_block_4": BoolCol(shape=(1,), dflt=False, pos=5), + "values_block_5": StringCol(itemsize=50, shape=(1,), dflt=b'', pos=6)} + byteorder := 'little' + chunkshape := (689,) + autoindex := True + colindexes := { + "index": Index(6, medium, shuffle, zlib(1)).is_csi=False} + +``` + +#### 存储多层索引数据表 + +存储多层索引``DataFrames``为表格与从同类索引 ``DataFrames``中存储/选取是非常类似的。 + +``` python +In [405]: index = pd.MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'], + .....: ['one', 'two', 'three']], + .....: codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], + .....: [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]], + .....: names=['foo', 'bar']) + .....: + +In [406]: df_mi = pd.DataFrame(np.random.randn(10, 3), index=index, + .....: columns=['A', 'B', 'C']) + .....: + +In [407]: df_mi +Out[407]: + A B C +foo bar +foo one 0.031885 0.641045 0.479460 + two -0.630652 -0.182400 -0.789979 + three -0.282700 -0.813404 1.252998 +bar one 0.758552 0.384775 -1.133177 + two -1.002973 -1.644393 -0.311536 +baz two -0.615506 -0.084551 -1.318575 + three 0.923929 -0.105981 0.429424 +qux one -1.034590 0.542245 -0.384429 + two 0.170697 -0.200289 1.220322 + three -1.001273 0.162172 0.376816 + +In [408]: store.append('df_mi', df_mi) + +In [409]: store.select('df_mi') +Out[409]: + A B C +foo bar +foo one 0.031885 0.641045 0.479460 + two -0.630652 -0.182400 -0.789979 + three -0.282700 -0.813404 1.252998 +bar one 0.758552 0.384775 -1.133177 + two -1.002973 -1.644393 -0.311536 +baz two -0.615506 -0.084551 -1.318575 + three 0.923929 -0.105981 0.429424 +qux one -1.034590 0.542245 -0.384429 + two 0.170697 -0.200289 1.220322 + three -1.001273 0.162172 0.376816 + +# the levels are automatically included as data 
columns
+In [410]: store.select('df_mi', 'foo=bar')
+Out[410]:
+                A         B         C
+foo bar
+bar one  0.758552  0.384775 -1.133177
+    two -1.002973 -1.644393 -0.311536
+
+```
+
+### 查询
+
+#### 查询表格
+
+``select``和``delete``操作都带有一个可选条件,可以指定只选择/删除数据的一个子集。这使得用户可以把一个很大的表存放在磁盘上,而只检索其中的一部分数据。
+
+在底层,查询条件是用``Term``类以布尔表达式的形式指定的。
+
+- ``index``和``columns``是``DataFrames``支持的索引器(indexer)。
+- 如果指定了``data_columns``,它们可以作为额外的索引器。
+
+有效的比较运算符有:
+
+``=, ==, !=, >, >=, <, <=``
+
+有效的布尔表达式可以结合:
+
+- ``|`` : 或(or)
+- ``&`` : 与(and)
+- ``(`` 和 ``)`` : 用于分组
+
+这些规则与在pandas中用布尔表达式进行索引的规则类似。
+
+::: tip 注意
+
+- ``=`` 将自动扩展为比较运算符 ``==``
+- ``~`` 是非(not)运算符,但只能在非常有限的情况下使用
+- 如果传递的是表达式的列表/元组,它们将通过 ``&`` 组合起来
+
+:::
+
+以下都是有效的表达式:
+- ``'index >= date'``
+- ``"columns = ['A', 'D']"``
+- ``"columns in ['A', 'D']"``
+- ``'columns = A'``
+- ``'columns == A'``
+- ``"~(columns = ['A', 'B'])"``
+- ``'index > df.index[3] & string = "bar"'``
+- ``'(index > df.index[3] & index <= df.index[6]) | string = "bar"'``
+- ``"ts >= Timestamp('2012-02-01')"``
+- ``"major_axis>=20130101"``
+
+位于子表达式左侧的是`indexers`:
+`columns`, `major_axis`, `ts`
+
+子表达式的右侧(在比较运算符之后)可以是:
+- 将被求值的函数,比如: ``Timestamp('2012-02-01')``
+- 字符串,比如: ``"bar"``
+- 类似日期的值,比如: ``20130101`` 或者 ``"20130101"``
+- 列表,比如: ``"['A', 'B']"``
+- 在本地命名空间中定义的变量,比如: ``date``
+
+::: tip 注意
+
+不推荐通过把字符串插值到查询表达式中的方式传递字符串。只需把目标字符串赋给一个变量,然后在表达式中使用该变量即可。例如,应该这样做
+
+``` python
+string = "HolyMoly'"
+store.select('df', 'index == string')
+
+```
+而不是这样
+
+``` python
+string = "HolyMoly'"
+store.select('df', 'index == %s' % string)
+
+```
+
+因为后者将 **不会** 起作用,并且会引起 ``SyntaxError``。注意 ``string`` 变量的双引号里面含有一个单引号。
+
+如果你*一定*要插值,请使用 ``'%r'`` 格式说明符
+
+``` python
+store.select('df', 'index == %r' % string)
+
+```
+
+它将会引用(quote)变量 ``string``。
+ +::: + +这儿有一些例子: + +``` python +In [411]: dfq = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'), + .....: index=pd.date_range('20130101', periods=10)) + .....: + +In [412]: store.append('dfq', dfq, format='table', data_columns=True) + +``` +使用布尔表达式同内联求值函数。 + +``` python +In [413]: store.select('dfq', "index>pd.Timestamp('20130104') & columns=['A', 'B']") +Out[413]: + A B +2013-01-05 0.450263 0.755221 +2013-01-06 0.019915 0.300003 +2013-01-07 1.878479 -0.026513 +2013-01-08 3.272320 0.077044 +2013-01-09 -0.398346 0.507286 +2013-01-10 0.516017 -0.501550 + +``` + +内联列引用 + +``` python +In [414]: store.select('dfq', where="A>0 or C>0") +Out[414]: + A B C D +2013-01-01 -0.161614 -1.636805 0.835417 0.864817 +2013-01-02 0.843452 -0.122918 -0.026122 -1.507533 +2013-01-03 0.335303 -1.340566 -1.024989 1.125351 +2013-01-05 0.450263 0.755221 -1.506656 0.808794 +2013-01-06 0.019915 0.300003 -0.727093 -1.119363 +2013-01-07 1.878479 -0.026513 0.573793 0.154237 +2013-01-08 3.272320 0.077044 0.397034 -0.613983 +2013-01-10 0.516017 -0.501550 0.138212 0.218366 + +``` +关键词``columns`` 能用来筛选列字段并返回为列表,这等价于传递``'columns=list_of_columns_to_filter'``: + +``` python +In [415]: store.select('df', "columns=['A', 'B']") +Out[415]: + A B +2000-01-01 -0.426936 -1.780784 +2000-01-02 1.638174 -2.184251 +2000-01-03 -1.022803 0.889445 +2000-01-04 1.767446 -1.305266 +2000-01-05 0.486743 0.954551 +2000-01-06 -1.170458 -1.211386 +2000-01-07 -0.450781 1.064650 +2000-01-08 -0.810399 0.254343 + +``` + +``start`` and ``stop`` 参数能指定总的搜索范围。这些是根据表中的总行数得出来的。 + +::: tip 注意 + +如果查询表达式有未知的引用变量,那么``select`` 将会报错 ``ValueError`` 。通常这就意味着你正在尝试选取的一列并**不在**当前数据列中。 + +如果查询表达式无效,那么``select``将会报错``SyntaxError`` 。 + +::: + +#### timedelta64[ns]的用法 + +你能使用``timedelta64[ns]``进行存储和查询。使用``()``来指定查询的条目,浮点数可以带符号(和小数),timedelta的单位可以是``D,s,ms,us,ns``。看示例: + +```python +In [416]: from datetime import timedelta + +In [417]: dftd = pd.DataFrame({'A': pd.Timestamp('20130101'), + .....: 'B': [pd.Timestamp('20130101') + 
timedelta(days=i, + .....: seconds=10) + .....: for i in range(10)]}) + .....: + +In [418]: dftd['C'] = dftd['A'] - dftd['B'] + +In [419]: dftd +Out[419]: + A B C +0 2013-01-01 2013-01-01 00:00:10 -1 days +23:59:50 +1 2013-01-01 2013-01-02 00:00:10 -2 days +23:59:50 +2 2013-01-01 2013-01-03 00:00:10 -3 days +23:59:50 +3 2013-01-01 2013-01-04 00:00:10 -4 days +23:59:50 +4 2013-01-01 2013-01-05 00:00:10 -5 days +23:59:50 +5 2013-01-01 2013-01-06 00:00:10 -6 days +23:59:50 +6 2013-01-01 2013-01-07 00:00:10 -7 days +23:59:50 +7 2013-01-01 2013-01-08 00:00:10 -8 days +23:59:50 +8 2013-01-01 2013-01-09 00:00:10 -9 days +23:59:50 +9 2013-01-01 2013-01-10 00:00:10 -10 days +23:59:50 + +In [420]: store.append('dftd', dftd, data_columns=True) + +In [421]: store.select('dftd', "C<'-3.5D'") +Out[421]: + A B C +4 2013-01-01 2013-01-05 00:00:10 -5 days +23:59:50 +5 2013-01-01 2013-01-06 00:00:10 -6 days +23:59:50 +6 2013-01-01 2013-01-07 00:00:10 -7 days +23:59:50 +7 2013-01-01 2013-01-08 00:00:10 -8 days +23:59:50 +8 2013-01-01 2013-01-09 00:00:10 -9 days +23:59:50 +9 2013-01-01 2013-01-10 00:00:10 -10 days +23:59:50 +``` + +#### 索引 + +你能在表格中已经有数据的情况下(在``append/put``操作之后)使用``create_table_index``创建/修改表格的索引。给表格创建索引是**强**推荐的操作。当你使用带有索引的``select``当作``where``查询条件的时候,这将极大的加快你的查询速度。 + +::: tip 注意 + +索引会自动创建在可索引对象和任意你指定的数据列。你可以传递``index=False`` 到``append``来关闭这个操作。 + +::: + +```python +# we have automagically already created an index (in the first section) +In [422]: i = store.root.df.table.cols.index.index + +In [423]: i.optlevel, i.kind +Out[423]: (6, 'medium') + +# change an index by passing new parameters +In [424]: store.create_table_index('df', optlevel=9, kind='full') + +In [425]: i = store.root.df.table.cols.index.index + +In [426]: i.optlevel, i.kind +Out[426]: (9, 'full') +``` + +通常当有大量数据添加保存的时候,关闭添加列的索引创建,等结束后再创建是非常有效的。 + +```python +In [427]: df_1 = pd.DataFrame(np.random.randn(10, 2), columns=list('AB')) + +In [428]: df_2 = pd.DataFrame(np.random.randn(10, 2), 
columns=list('AB')) + +In [429]: st = pd.HDFStore('appends.h5', mode='w') + +In [430]: st.append('df', df_1, data_columns=['B'], index=False) + +In [431]: st.append('df', df_2, data_columns=['B'], index=False) + +In [432]: st.get_storer('df').table +Out[432]: +/df/table (Table(20,)) '' + description := { + "index": Int64Col(shape=(), dflt=0, pos=0), + "values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1), + "B": Float64Col(shape=(), dflt=0.0, pos=2)} + byteorder := 'little' + chunkshape := (2730,) +``` + +当完成添加后再创建索引。 + +```python +In [433]: st.create_table_index('df', columns=['B'], optlevel=9, kind='full') + +In [434]: st.get_storer('df').table +Out[434]: +/df/table (Table(20,)) '' + description := { + "index": Int64Col(shape=(), dflt=0, pos=0), + "values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1), + "B": Float64Col(shape=(), dflt=0.0, pos=2)} + byteorder := 'little' + chunkshape := (2730,) + autoindex := True + colindexes := { + "B": Index(9, full, shuffle, zlib(1)).is_csi=True} + +In [435]: st.close() +``` +看[这里](https://stackoverflow.com/questions/17893370/ptrepack-sortby-needs-full-index "这里")关于如何在现存的表格中创建完全分类索引(CSI)。 + +#### 通过数据列查询 + +你可以指定(并建立索引)某些你希望能够执行查询的列(除了可索引的列,你始终可以查询这些列)。例如,假设你要在磁盘上执行此常规操作,仅返回与该查询匹配的帧。你可以指定``data_columns = True``来强制所有列为``data_columns``。 + +```python +In [436]: df_dc = df.copy() + +In [437]: df_dc['string'] = 'foo' + +In [438]: df_dc.loc[df_dc.index[4:6], 'string'] = np.nan + +In [439]: df_dc.loc[df_dc.index[7:9], 'string'] = 'bar' + +In [440]: df_dc['string2'] = 'cool' + +In [441]: df_dc.loc[df_dc.index[1:3], ['B', 'C']] = 1.0 + +In [442]: df_dc +Out[442]: + A B C string string2 +2000-01-01 -0.426936 -1.780784 0.322691 foo cool +2000-01-02 1.638174 1.000000 1.000000 foo cool +2000-01-03 -1.022803 1.000000 1.000000 foo cool +2000-01-04 1.767446 -1.305266 -0.378355 foo cool +2000-01-05 0.486743 0.954551 0.859671 NaN cool +2000-01-06 -1.170458 -1.211386 -0.852728 NaN cool +2000-01-07 -0.450781 1.064650 1.014927 foo cool 
+2000-01-08 -0.810399 0.254343 -0.875526 bar cool + +# on-disk operations +In [443]: store.append('df_dc', df_dc, data_columns=['B', 'C', 'string', 'string2']) + +In [444]: store.select('df_dc', where='B > 0') +Out[444]: + A B C string string2 +2000-01-02 1.638174 1.000000 1.000000 foo cool +2000-01-03 -1.022803 1.000000 1.000000 foo cool +2000-01-05 0.486743 0.954551 0.859671 NaN cool +2000-01-07 -0.450781 1.064650 1.014927 foo cool +2000-01-08 -0.810399 0.254343 -0.875526 bar cool + +# getting creative +In [445]: store.select('df_dc', 'B > 0 & C > 0 & string == foo') +Out[445]: + A B C string string2 +2000-01-02 1.638174 1.00000 1.000000 foo cool +2000-01-03 -1.022803 1.00000 1.000000 foo cool +2000-01-07 -0.450781 1.06465 1.014927 foo cool + +# this is in-memory version of this type of selection +In [446]: df_dc[(df_dc.B > 0) & (df_dc.C > 0) & (df_dc.string == 'foo')] +Out[446]: + A B C string string2 +2000-01-02 1.638174 1.00000 1.000000 foo cool +2000-01-03 -1.022803 1.00000 1.000000 foo cool +2000-01-07 -0.450781 1.06465 1.014927 foo cool + +# we have automagically created this index and the B/C/string/string2 +# columns are stored separately as ``PyTables`` columns +In [447]: store.root.df_dc.table +Out[447]: +/df_dc/table (Table(8,)) '' + description := { + "index": Int64Col(shape=(), dflt=0, pos=0), + "values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1), + "B": Float64Col(shape=(), dflt=0.0, pos=2), + "C": Float64Col(shape=(), dflt=0.0, pos=3), + "string": StringCol(itemsize=3, shape=(), dflt=b'', pos=4), + "string2": StringCol(itemsize=4, shape=(), dflt=b'', pos=5)} + byteorder := 'little' + chunkshape := (1680,) + autoindex := True + colindexes := { + "index": Index(6, medium, shuffle, zlib(1)).is_csi=False, + "B": Index(6, medium, shuffle, zlib(1)).is_csi=False, + "C": Index(6, medium, shuffle, zlib(1)).is_csi=False, + "string": Index(6, medium, shuffle, zlib(1)).is_csi=False, + "string2": Index(6, medium, shuffle, zlib(1)).is_csi=False} +``` 
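+上文提到可以用``data_columns=True``把所有列都指定为数据列。下面是一个最小的可运行示意(假设已安装PyTables;文件名``all_dc.h5``与列名``x``、``y``只是示例):
+
+```python
+import os
+import tempfile
+
+import numpy as np
+import pandas as pd
+
+# data_columns=True 会把所有列都存为独立的 PyTables 列,
+# 使它们都能出现在 where 查询条件中
+df_all = pd.DataFrame({'x': np.arange(5), 'y': list('abcde')})
+path = os.path.join(tempfile.mkdtemp(), 'all_dc.h5')
+with pd.HDFStore(path, mode='w') as store:
+    store.append('df_all', df_all, data_columns=True)
+    res = store.select('df_all', where='x > 2 & y == "d"')  # 两列均可参与条件
+```
+
+与显式传入``data_columns=['B', 'C']``这样的列表相比,这种写法更省事,但为每一列建立数据列也有代价(见下文关于性能的说明)。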
+There is some performance degradation by making lots of columns into *data columns*, so it is up to the user to designate these. In addition, you cannot change data columns (nor ``indexables``) after the first append/put operation (of course you can simply read in the data and create a new table!).
+
+#### Iterator
+
+You can pass ``iterator=True`` or ``chunksize=number_in_a_chunk`` to ``select`` and ``select_as_multiple`` to return an iterator on the results. The default is 50,000 rows returned in a chunk.
+
+```python
+In [448]: for df in store.select('df', chunksize=3):
+   .....:     print(df)
+   .....:
+                   A         B         C
+2000-01-01 -0.426936 -1.780784  0.322691
+2000-01-02  1.638174 -2.184251  0.049673
+2000-01-03 -1.022803  0.889445  2.827717
+                   A         B         C
+2000-01-04  1.767446 -1.305266 -0.378355
+2000-01-05  0.486743  0.954551  0.859671
+2000-01-06 -1.170458 -1.211386 -0.852728
+                   A         B         C
+2000-01-07 -0.450781  1.064650  1.014927
+2000-01-08 -0.810399  0.254343 -0.875526
+```
+
+::: tip Note
+
+You can also use the iterator with ``read_hdf``, which will open, then automatically close the store when finished iterating.
+
+```python
+for df in pd.read_hdf('store.h5', 'df', chunksize=3):
+    print(df)
+```
+
+:::
+
+Note that the chunksize keyword applies to the **source** rows. So if you are doing a query, the chunksize will subdivide the total rows in the table with the query applied, returning an iterator on potentially unequal sized chunks.
+
+Here is a recipe for generating a query and using it to create equal sized return chunks.
+
+```python
+In [449]: dfeq = pd.DataFrame({'number': np.arange(1, 11)})
+
+In [450]: dfeq
+Out[450]:
+   number
+0       1
+1       2
+2       3
+3       4
+4       5
+5       6
+6       7
+7       8
+8       9
+9      10
+
+In [451]: store.append('dfeq', dfeq, data_columns=['number'])
+
+In [452]: def chunks(l, n):
+   .....:     return [l[i:i + n] for i in range(0, len(l), n)]
+   .....:
+
+In [453]: evens = [2, 4, 6, 8, 10]
+
+In [454]: coordinates = store.select_as_coordinates('dfeq', 'number=evens')
+
+In [455]: for c in chunks(coordinates, 2):
+   .....:     print(store.select('dfeq', where=c))
+   .....:
+   number
+1       2
+3       4
+   number
+5       6
+7       8
+   number
+9      10
+```
+
+#### Advanced queries
+
+##### Select a single column
+
+To retrieve a single indexable or data column, use the method ``select_column``. This will, for example, enable you to get the index very quickly. This returns a ``Series`` of the result, indexed by the row number. These do not currently accept the ``where`` selector.
+
+```python
+In [456]: store.select_column('df_dc', 'index')
+Out[456]:
+0   2000-01-01
+1   2000-01-02
+2   2000-01-03
+3   2000-01-04
+4   2000-01-05
+5   2000-01-06
+6   2000-01-07
+7   2000-01-08
+Name: index, dtype: datetime64[ns]
+
+In [457]: store.select_column('df_dc', 'string')
+Out[457]:
+0    foo
+1    foo
+2    foo
+3    foo
+4    NaN
+5    NaN
+6    foo
+7    bar
+Name: string, dtype: object
+```
+
+##### Selecting coordinates
+
+Sometimes you want to get the coordinates (a.k.a the index locations) of your query. This returns an ``Int64Index`` of the resulting locations. These coordinates can also be passed to subsequent ``where`` operations.
+
+```python
+In [458]: df_coord = pd.DataFrame(np.random.randn(1000, 2),
+   .....:                         index=pd.date_range('20000101', periods=1000))
+   .....:
+
+In [459]: store.append('df_coord', df_coord)
+
+In [460]: c = store.select_as_coordinates('df_coord', 'index > 20020101')
+
+In [461]: c
+Out[461]:
+Int64Index([732, 733, 734, 735, 736, 737, 738, 739, 740, 741,
+            ...
+            990, 991, 992, 993, 994, 995, 996, 997, 998, 999],
+           dtype='int64', length=268)
+
+In [462]: store.select('df_coord', where=c)
+Out[462]:
+                   0         1
+2002-01-02  0.440865 -0.151651
+2002-01-03 -1.195089  0.285093
+2002-01-04 -0.925046  0.386081
+2002-01-05 -1.942756  0.277699
+2002-01-06  0.811776  0.528965
+...              ...       ...
+2002-09-22  1.061729  0.618085
+2002-09-23 -0.209744  0.677197
+2002-09-24 -1.808184  0.185667
+2002-09-25 -0.208629  0.928603
+2002-09-26  1.579717 -1.259530
+
+[268 rows x 2 columns]
+```
+
+##### Selecting using a where mask
+
+Sometimes your query can involve creating a list of rows to select. Usually this ``mask`` would be a resulting ``index`` from an indexing operation. This example selects the rows of a datetimeindex whose month is 5.
+
+```python
+In [463]: df_mask = pd.DataFrame(np.random.randn(1000, 2),
+   .....:                        index=pd.date_range('20000101', periods=1000))
+   .....:
+
+In [464]: store.append('df_mask', df_mask)
+
+In [465]: c = store.select_column('df_mask', 'index')
+
+In [466]: where = c[pd.DatetimeIndex(c).month == 5].index
+
+In [467]: store.select('df_mask', where=where)
+Out[467]:
+                   0         1
+2000-05-01 -1.199892  1.073701
+2000-05-02 -1.058552  0.658487
+2000-05-03 -0.015418  0.452879
+2000-05-04  1.737818  0.426356
+2000-05-05 -0.711668 -0.021266
+...              ...       ...
+2002-05-27  0.656196  0.993383
+2002-05-28 -0.035399 -0.269286
+2002-05-29  0.704503  2.574402
+2002-05-30 -1.301443  2.770770
+2002-05-31 -0.807599  0.420431
+
+[93 rows x 2 columns]
+```
+
+##### Storer object
+
+If you want to inspect the stored object, retrieve it via ``get_storer``. You could use this programmatically to, say, get the number of rows in an object.
+
+```python
+In [468]: store.get_storer('df_dc').nrows
+Out[468]: 8
+```
+
+#### Multiple table queries
+
+The methods ``append_to_multiple`` and ``select_as_multiple`` can perform appending/selecting from multiple tables at once. The idea is to have one table (call it the selector table) that indexes most/all of the columns, and to perform your queries against it. The other table(s) are data tables with an index matching the selector table's index. You can then perform a very fast query on the selector table, yet get lots of data back. This method is similar to having a very wide table, but enables more efficient queries.
+
+The ``append_to_multiple`` method splits a given single DataFrame into multiple tables according to ``d``, a dictionary that maps the table names to a list of 'columns' you want in that table. If *None* is used in place of a list, that table will have the remaining unspecified columns of the given DataFrame. The argument ``selector`` defines which table is the selector table (i.e. the one you can make queries from). The argument ``dropna`` will drop rows from the input ``DataFrame`` to ensure the tables are synchronized. This means that if a row for one of the tables being written to is entirely ``np.NaN``, that row will be dropped from all tables.
+
+If ``dropna`` is False, **the user is responsible for synchronizing the tables**. Remember that entirely ``np.Nan`` rows are not written to the HDFStore, so if you choose to call ``dropna=False``, some tables may have more rows than others, and therefore ``select_as_multiple`` may not work or it may return unexpected results.
+
+```python
+In [469]: df_mt = pd.DataFrame(np.random.randn(8, 6),
+   .....:                      index=pd.date_range('1/1/2000', periods=8),
+   .....:                      columns=['A', 'B', 'C', 'D', 'E', 'F'])
+   .....:
+
+In [470]: df_mt['foo'] = 'bar'
+
+In [471]: df_mt.loc[df_mt.index[1], ('A', 'B')] = np.nan
+
+# you can also create the tables individually
+In [472]: store.append_to_multiple({'df1_mt': ['A', 'B'], 'df2_mt': None},
+   .....:                          df_mt, selector='df1_mt')
+   .....:
+
+In [473]: store
+Out[473]:
+<class 'pandas.io.pytables.HDFStore'>
+File path: store.h5
+
+# individual tables were created
+In [474]: store.select('df1_mt')
+Out[474]:
+                   A         B
+2000-01-01  0.475158  0.427905
+2000-01-02       NaN       NaN
+2000-01-03 -0.201829  0.651656
+2000-01-04 -0.766427 -1.852010
+2000-01-05  1.642910 -0.055583
+2000-01-06  0.187880  1.536245
+2000-01-07 -1.801014  0.244721
+2000-01-08  3.055033 -0.683085
+
+In [475]: store.select('df2_mt')
+Out[475]:
+                   C         D         E         F  foo
+2000-01-01  1.846285 -0.044826  0.074867  0.156213  bar
+2000-01-02  0.446978 -0.323516  0.311549 -0.661368  bar
+2000-01-03 -2.657254  0.649636  1.520717  1.604905  bar
+2000-01-04 -0.201100 -2.107934 -0.450691 -0.748581
bar
+2000-01-05  0.543779  0.111444  0.616259 -0.679614  bar
+2000-01-06  0.831475 -0.566063  1.130163 -1.004539  bar
+2000-01-07  0.745984  1.532560  0.229376  0.526671  bar
+2000-01-08 -0.922301  2.760888  0.515474 -0.129319  bar
+
+# as a multiple
+In [476]: store.select_as_multiple(['df1_mt', 'df2_mt'], where=['A>0', 'B>0'],
+   .....:                          selector='df1_mt')
+   .....:
+Out[476]:
+                   A         B         C         D         E         F  foo
+2000-01-01  0.475158  0.427905  1.846285 -0.044826  0.074867  0.156213  bar
+2000-01-06  0.187880  1.536245  0.831475 -0.566063  1.130163 -1.004539  bar
+```
+
+### Delete from a table
+
+You can delete from a table selectively by specifying a ``where``. When deleting rows, it is important to understand that ``PyTables`` deletes rows by erasing them, then **moving** the following data. Thus deleting can potentially be a very expensive operation depending on the orientation of your data. To get optimal performance, it's worthwhile to have the dimension you are deleting be the first of the ``indexables``.
+
+Data is ordered (on the disk) in terms of the ``indexables``. Here's a simple use case. You store panel-type data (a.k.a. time series cross-sectional data), with dates in the ``major_axis`` and ids in the ``minor_axis``. The data is then interleaved like this:
+
+- date_1
+  - id_1
+  - id_2
+  - .
+  - id_n
+- date_2
+  - id_1
+  - .
+  - id_n
+
+It should be clear that a delete operation on the ``major_axis`` will be fairly quick, as one chunk is removed, then the following data moved. On the other hand, a delete operation on the ``minor_axis`` will be very expensive. In this case it would almost certainly be faster to rewrite the table using a ``where`` that selects all but the missing data.
+
+::: danger Warning
+
+Please note that HDF5 **does not reclaim space** in the h5 file automatically. Thus, repeatedly deleting (or removing nodes) and adding again **will tend to increase the file size**.
+
+To *repack and clean* the file, use [ptrepack](https://www.pypandas.cn/docs/user_guide/io.html#io-hdf5-ptrepack "ptrepack").
+
+:::
+
+### Notes & caveats
+
+#### Compression
+
+``PyTables`` allows the stored data to be compressed. This applies to all kinds of stores, not just tables. Two parameters are used to control compression: ``complevel`` and ``complib``.
+
+``complevel`` specifies if and how hard data is to be compressed.
+
+``complib`` specifies which compression library to use. If nothing is specified, the default ``zlib`` library is used. A compression library usually optimizes for either good compression rates or speed, and the results will depend on the type of data. Which type of compression to choose depends on your specific needs and data. The list of supported compression libraries:
+
+- [zlib](https://zlib.net/): The default compression library. A classic in terms of compression, achieves good compression rates but is somewhat slow.
+- [lzo](https://www.oberhumer.com/opensource/lzo/): Fast compression and decompression.
+- [bzip2](http://bzip.org/): Good compression rates.
+- [blosc](http://www.blosc.org/): Fast compression and decompression.
+
+*New in version 0.20.2:* Support for alternative blosc compressors:
+
+- [blosc:blosclz](http://www.blosc.org/): This is the default compressor for ``blosc``.
+- [blosc:lz4](https://fastcompression.blogspot.dk/p/lz4.html): A compact, very popular and fast compressor.
+- [blosc:lz4hc](https://fastcompression.blogspot.dk/p/lz4.html): A tweaked version of LZ4, produces better compression ratios at the expense of speed.
+- [blosc:snappy](https://google.github.io/snappy/): A popular compressor used in many places.
+- [blosc:zlib](https://zlib.net/): A classic; somewhat slower than the previous ones, but achieving better compression ratios.
+- [blosc:zstd](https://facebook.github.io/zstd/): An extremely well balanced codec; it provides the best compression ratios among the others above, at reasonably fast speed.
+
+If ``complib`` is defined as something other than the listed libraries, a ``ValueError`` exception is raised.
+
+::: tip Note
+
+If the library specified with the ``complib`` option is missing on your platform, compression defaults to ``zlib`` without further ado.
+
+:::
+
+Enable compression for all objects within the file:
+
+``` python
+store_compressed = pd.HDFStore('store_compressed.h5', complevel=9,
+                               complib='blosc:blosclz')
+
+```
+Or on-the-fly compression (this only applies to tables) in stores where compression is not enabled:
+
+``` python
+store.append('df', df, complib='zlib', complevel=5)
+
+```
+
+#### ptrepack
+
+``PyTables`` offers better write performance when tables are compressed after they are written, as opposed to turning on compression at the very beginning. You can use the supplied ``PyTables`` utility ``ptrepack`` for this. In addition, ``ptrepack`` can change compression levels after the fact.
+
+```
+ptrepack --chunkshape=auto --propindexes --complevel=9 --complib=blosc in.h5 out.h5
+
+```
+
+Furthermore, ``ptrepack in.h5 out.h5`` will *repack* the file to allow you to reuse previously deleted space. Alternatively, one can simply remove the file and write again, or use the ``copy`` method.
+
+#### Caveats
+
+::: danger Warning
+
+``HDFStore`` is **not thread-safe for writing**. The underlying ``PyTables`` only supports concurrent reads (via threading or processes). If you need reading and writing *at the same time*, you need to serialize these operations in a single thread in a single process. You will corrupt your data otherwise. See ([GH2397](https://github.com/pandas-dev/pandas/issues/2397)) for more information.
+
+:::
+
+- If you use locks to manage write access between multiple processes, you may want to use [``fsync()``](https://docs.python.org/3/library/os.html#os.fsync) before releasing write locks. For convenience you can use ``store.flush(fsync=True)`` to do this for you.
+- Once a ``table`` is created, its columns (DataFrame) are fixed; only exactly the same columns can be appended.
+- Be aware that timezones (e.g., ``pytz.timezone('US/Eastern')``) are not necessarily equal across timezone versions. So if data is localized to a specific timezone in the HDFStore using one version of a timezone library, and that data is updated with another version, the data will be converted to UTC since these timezones are not considered equal. Either use the same version of the timezone library, or use ``tz_convert`` with the updated timezone definition.
+
+::: danger Warning
+
+``PyTables`` will show a ``NaturalNameWarning`` if a column name cannot be used as an attribute selector. *Natural* identifiers contain only letters, numbers, and underscores, and may not begin with a number. Other identifiers cannot be used in a ``where`` clause and are generally a bad idea.
+
+:::
+
+### Data types
+
+``HDFStore`` will map an object dtype to the ``PyTables`` underlying dtype. This means the following types are known to work:
+
+Type | Represents missing values
+---|---
+floating : float64, float32, float16 | np.nan
+integer : int64, int32, int8, uint64, uint32, uint8 | 
+boolean | 
+datetime64[ns] | NaT
+timedelta64[ns] | NaT
+categorical : see the section below | 
+object : strings | np.nan
+
+``unicode`` columns are not supported, and **WILL FAIL**.
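As a quick, pandas-only illustration of the missing-value sentinels paired with each dtype in the table above (no HDF5 store is involved here; the column names are just examples):

```python
import numpy as np
import pandas as pd

# Each supported dtype pairs with a specific missing-value marker.
df = pd.DataFrame({
    'f': [1.5, np.nan],                         # floating   -> np.nan
    't': pd.to_datetime(['2013-01-01', None]),  # datetime64 -> NaT
    's': ['foo', np.nan],                       # object/str -> np.nan
})

print(df.dtypes)
print(df.isna().sum())  # one missing value per column
```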
+
+#### Categorical data
+
+You can write data that contains ``category`` dtypes to a ``HDFStore``. Queries work the same as if it were an object array, but the ``category`` dtyped data is stored in a more efficient manner.
+
+``` python
+In [477]: dfcat = pd.DataFrame({'A': pd.Series(list('aabbcdba')).astype('category'),
+   .....:                       'B': np.random.randn(8)})
+   .....:
+
+In [478]: dfcat
+Out[478]:
+   A         B
+0  a  1.706605
+1  a  1.373485
+2  b -0.758424
+3  b -0.116984
+4  c -0.959461
+5  d -1.517439
+6  b -0.453150
+7  a -0.827739
+
+In [479]: dfcat.dtypes
+Out[479]:
+A    category
+B     float64
+dtype: object
+
+In [480]: cstore = pd.HDFStore('cats.h5', mode='w')
+
+In [481]: cstore.append('dfcat', dfcat, format='table', data_columns=['A'])
+
+In [482]: result = cstore.select('dfcat', where="A in ['b', 'c']")
+
+In [483]: result
+Out[483]:
+   A         B
+2  b -0.758424
+3  b -0.116984
+4  c -0.959461
+6  b -0.453150
+
+In [484]: result.dtypes
+Out[484]:
+A    category
+B     float64
+dtype: object
+
+```
+
+#### String columns
+
+**min_itemsize**
+
+The underlying implementation of ``HDFStore`` uses a fixed column width (itemsize) for string columns. A string column's itemsize is calculated as the maximum length of the data (for that column) passed to the ``HDFStore`` **in the first append**. Subsequent appends may introduce a string for a column **larger** than the column can hold; this will raise an exception (otherwise you could have silent truncation of these columns, leading to loss of information). In the future this may be relaxed to allow a user-specified truncation.
+
+Pass ``min_itemsize`` on the first table creation to a-priori specify the minimum length of a particular string column. ``min_itemsize`` can be an integer, or a dict mapping a column name to an integer. You can pass ``values`` as a key to allow all *indexables* or *data_columns* to have this min_itemsize.
+
+Passing a ``min_itemsize`` dict will cause all passed columns to be created as *data_columns* automatically.
+
+::: tip Note
+
+If you are not passing any ``data_columns``, then the ``min_itemsize`` will be the maximum of the length of any string passed.
+
+:::
+
+``` python
+In [485]: dfs = pd.DataFrame({'A': 'foo', 'B': 'bar'}, index=list(range(5)))
+
+In [486]: dfs
+Out[486]:
+     A    B
+0  foo  bar
+1  foo  bar
+2  foo  bar
+3  foo  bar
+4  foo  bar
+
+# A and B have a size of 30
+In [487]: store.append('dfs', dfs, min_itemsize=30)
+
+In [488]: store.get_storer('dfs').table
+Out[488]:
+/dfs/table (Table(5,)) ''
+  description := {
+  "index": Int64Col(shape=(), dflt=0, pos=0),
+  "values_block_0": StringCol(itemsize=30, shape=(2,), dflt=b'', pos=1)}
+  byteorder := 'little'
+  chunkshape := (963,)
+  autoindex := True
+  colindexes := {
+    "index": Index(6, medium, shuffle, zlib(1)).is_csi=False}
+
+# A is
created as a data_column with a size of 30
+# B is size is calculated
+In [489]: store.append('dfs2', dfs, min_itemsize={'A': 30})
+
+In [490]: store.get_storer('dfs2').table
+Out[490]:
+/dfs2/table (Table(5,)) ''
+  description := {
+  "index": Int64Col(shape=(), dflt=0, pos=0),
+  "values_block_0": StringCol(itemsize=3, shape=(1,), dflt=b'', pos=1),
+  "A": StringCol(itemsize=30, shape=(), dflt=b'', pos=2)}
+  byteorder := 'little'
+  chunkshape := (1598,)
+  autoindex := True
+  colindexes := {
+    "index": Index(6, medium, shuffle, zlib(1)).is_csi=False,
+    "A": Index(6, medium, shuffle, zlib(1)).is_csi=False}
+
+```
+
+**nan_rep**
+
+String columns will serialize a ``np.nan`` (a missing value) with the ``nan_rep`` string representation. This defaults to the string value ``nan``. You could inadvertently turn an actual ``nan`` value into a missing value.
+
+``` python
+In [491]: dfss = pd.DataFrame({'A': ['foo', 'bar', 'nan']})
+
+In [492]: dfss
+Out[492]:
+     A
+0  foo
+1  bar
+2  nan
+
+In [493]: store.append('dfss', dfss)
+
+In [494]: store.select('dfss')
+Out[494]:
+     A
+0  foo
+1  bar
+2  NaN
+
+# here you need to specify a different nan rep
+In [495]: store.append('dfss2', dfss, nan_rep='_nan_')
+
+In [496]: store.select('dfss2')
+Out[496]:
+     A
+0  foo
+1  bar
+2  nan
+
+```
+
+### External compatibility
+
+``HDFStore`` writes ``table`` format objects in specific formats suitable for producing loss-less round trips to pandas objects. For external compatibility, ``HDFStore`` can read native ``PyTables`` format tables.
+
+It is possible to write an ``HDFStore`` object that can easily be imported into ``R`` using the ``rhdf5`` library ([Package website](https://www.bioconductor.org/packages/release/bioc/html/rhdf5.html)). Create a table format store like this:
+
+``` python
+In [497]: df_for_r = pd.DataFrame({"first": np.random.rand(100),
+   .....:                          "second": np.random.rand(100),
+   .....:                          "class": np.random.randint(0, 2, (100, ))},
+   .....:                         index=range(100))
+   .....:
+
+In [498]: df_for_r.head()
+Out[498]:
+      first    second  class
+0  0.366979  0.794525      0
+1  0.296639  0.635178      1
+2  0.395751  0.359693      0
+3  0.484648  0.970016      1
+4  0.810047  0.332303      0
+
+In [499]: store_export = pd.HDFStore('export.h5')
+
+In [500]: store_export.append('df_for_r', df_for_r, data_columns=df_dc.columns)
+
+In [501]: store_export
+Out[501]:
+<class 'pandas.io.pytables.HDFStore'>
+File path:
+export.h5
+
+```
+In R this file can be read into a ``data.frame`` object using the ``rhdf5`` library. The following example function reads the corresponding column names and data values from the values nodes, and assembles them into a ``data.frame``:
+
+``` R
+# Load values and column names for all datasets from corresponding nodes and
+# insert them into one data.frame object.
+
+library(rhdf5)
+
+loadhdf5data <- function(h5File) {
+
+listing <- h5ls(h5File)
+# Find all data nodes, values are stored in *_values and corresponding column
+# titles in *_items
+data_nodes <- grep("_values", listing$name)
+name_nodes <- grep("_items", listing$name)
+data_paths = paste(listing$group[data_nodes], listing$name[data_nodes], sep = "/")
+name_paths = paste(listing$group[name_nodes], listing$name[name_nodes], sep = "/")
+columns = list()
+for (idx in seq(data_paths)) {
+  # NOTE: matrices returned by h5read have to be transposed to obtain
+  # required Fortran order!
+  data <- data.frame(t(h5read(h5File, data_paths[idx])))
+  names <- t(h5read(h5File, name_paths[idx]))
+  entry <- data.frame(data)
+  colnames(entry) <- names
+  columns <- append(columns, entry)
+}
+
+data <- data.frame(columns)
+
+return(data)
+}
+
+```
+
+Now you can import the ``DataFrame`` into R:
+
+``` R
+> data = loadhdf5data("transfer.hdf5")
+> head(data)
+         first    second class
+1 0.4170220047 0.3266449     0
+2 0.7203244934 0.5270581     0
+3 0.0001143748 0.8859421     1
+4 0.3023325726 0.3572698     1
+5 0.1467558908 0.9085352     1
+6 0.0923385948 0.6233601     1
+
+```
+
+::: tip Note
+
+The R function lists the entire HDF5 file's contents and assembles the ``data.frame`` object from all matching nodes, so use this only as a starting point if you have stored multiple ``DataFrame`` objects to a single HDF5 file.
+
+:::
+
+### Performance
+
+- ``tables`` format comes with a writing performance penalty as compared to ``fixed`` stores. The benefit is the ability to append/delete and query (potentially very large amounts of data). Write times are generally longer as compared with regular stores. Query times can be quite fast, especially on an indexed axis.
+- You can pass ``chunksize=<int>`` to ``append``, specifying the write chunksize (default is 50000). This will significantly lower your memory usage on writing.
+- You can pass ``expectedrows=<int>`` to the first ``append`` to set the TOTAL number of rows that ``PyTables`` will expect. This will optimize read/write performance.
+- Duplicate rows can be written to tables, but are filtered out in selection (with the last items being selected; thus a table is unique on major, minor pairs).
+- A ``PerformanceWarning`` will be raised if you are attempting to store types that will be pickled by PyTables (rather than stored as endemic types). See [Here](https://stackoverflow.com/questions/14355151/how-to-make-pandas-hdfstore-put-operation-faster/14370190#14370190) for more information and some solutions.
+
+## Feather
+
+*New in version 0.20.0.*
+
+Feather provides binary columnar serialization for data frames. It is designed to make reading and writing data
+frames efficient, and to make sharing data across data analysis languages easy.
+
+Feather is designed to faithfully serialize and de-serialize DataFrames, supporting all of the pandas
+dtypes, including extension dtypes such as categorical and datetime with tz.
+
+Several caveats.
+
+- This is a newer library, and the format, though stable, is not guaranteed to be backward compatible
+with earlier versions.
+- The format will NOT write an ``Index``, or ``MultiIndex`` for the
+``DataFrame`` and will raise an error if a non-default one is provided. You
+can ``.reset_index()`` to store the index or ``.reset_index(drop=True)`` to
+ignore it.
+- Duplicate column names and non-string column names are not supported.
+- Unsupported types include ``Period`` and actual Python object types. These will raise a helpful error message
+on an attempt at serialization.
+
+See the [Full Documentation](https://github.com/wesm/feather).
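The index caveat above is easy to handle before writing. A minimal, pandas-only sketch (the ``to_feather`` call itself is left commented out, since it requires ``pyarrow``; the file name is just an example):

```python
import pandas as pd

# A custom index would make to_feather() raise, so flatten it first.
df = pd.DataFrame({'a': [1, 2]}, index=pd.Index([10, 20], name='k'))

kept = df.reset_index()              # index 'k' becomes an ordinary column
dropped = df.reset_index(drop=True)  # index discarded, default RangeIndex

# kept.to_feather('example.feather')  # would now serialize without error

print(list(kept.columns), list(dropped.index))
```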
+ +``` python +In [502]: df = pd.DataFrame({'a': list('abc'), + .....: 'b': list(range(1, 4)), + .....: 'c': np.arange(3, 6).astype('u1'), + .....: 'd': np.arange(4.0, 7.0, dtype='float64'), + .....: 'e': [True, False, True], + .....: 'f': pd.Categorical(list('abc')), + .....: 'g': pd.date_range('20130101', periods=3), + .....: 'h': pd.date_range('20130101', periods=3, tz='US/Eastern'), + .....: 'i': pd.date_range('20130101', periods=3, freq='ns')}) + .....: + +In [503]: df +Out[503]: + a b c d e f g h i +0 a 1 3 4.0 True a 2013-01-01 2013-01-01 00:00:00-05:00 2013-01-01 00:00:00.000000000 +1 b 2 4 5.0 False b 2013-01-02 2013-01-02 00:00:00-05:00 2013-01-01 00:00:00.000000001 +2 c 3 5 6.0 True c 2013-01-03 2013-01-03 00:00:00-05:00 2013-01-01 00:00:00.000000002 + +In [504]: df.dtypes +Out[504]: +a object +b int64 +c uint8 +d float64 +e bool +f category +g datetime64[ns] +h datetime64[ns, US/Eastern] +i datetime64[ns] +dtype: object + +``` + +Write to a feather file. + +``` python +In [505]: df.to_feather('example.feather') + +``` + +Read from a feather file. + +``` python +In [506]: result = pd.read_feather('example.feather') + +In [507]: result +Out[507]: + a b c d e f g h i +0 a 1 3 4.0 True a 2013-01-01 2013-01-01 00:00:00-05:00 2013-01-01 00:00:00.000000000 +1 b 2 4 5.0 False b 2013-01-02 2013-01-02 00:00:00-05:00 2013-01-01 00:00:00.000000001 +2 c 3 5 6.0 True c 2013-01-03 2013-01-03 00:00:00-05:00 2013-01-01 00:00:00.000000002 + +# we preserve dtypes +In [508]: result.dtypes +Out[508]: +a object +b int64 +c uint8 +d float64 +e bool +f category +g datetime64[ns] +h datetime64[ns, US/Eastern] +i datetime64[ns] +dtype: object + +``` + +## Parquet + +*New in version 0.21.0.* + +[Apache Parquet](https://parquet.apache.org/) provides a partitioned binary columnar serialization for data frames. It is designed to +make reading and writing data frames efficient, and to make sharing data across data analysis +languages easy. 
Parquet can use a variety of compression techniques to shrink the file size as much as possible
+while still maintaining good read performance.
+
+Parquet is designed to faithfully serialize and de-serialize ``DataFrame`` s, supporting all of the pandas
+dtypes, including extension dtypes such as datetime with tz.
+
+Several caveats.
+
+- Duplicate column names and non-string column names are not supported.
+- The ``pyarrow`` engine always writes the index to the output, but ``fastparquet`` only writes non-default
+indexes. This extra column can cause problems for non-Pandas consumers that are not expecting it. You can
+force including or omitting indexes with the ``index`` argument, regardless of the underlying engine.
+- Index level names, if specified, must be strings.
+- Categorical dtypes can be serialized to parquet, but will de-serialize as ``object`` dtype.
+- Unsupported types include ``Period`` and actual Python object types. These will raise a helpful error message
+on an attempt at serialization.
+
+You can specify an ``engine`` to direct the serialization. This can be one of ``pyarrow``, ``fastparquet``, or ``auto``.
+If the engine is NOT specified, then the ``pd.options.io.parquet.engine`` option is checked; if this is also ``auto``,
+then ``pyarrow`` is tried, falling back to ``fastparquet``.
+
+See the documentation for [pyarrow](https://arrow.apache.org/docs/python/) and [fastparquet](https://fastparquet.readthedocs.io/en/latest/).
+
+::: tip Note
+
+These engines are very similar and should read/write nearly identical parquet format files.
+Currently ``pyarrow`` does not support timedelta data, while ``fastparquet>=0.1.4`` supports timezone aware datetimes.
+These libraries differ in their underlying dependencies (``fastparquet`` uses ``numba``, while ``pyarrow`` uses a C library).
+ +::: + +``` python +In [509]: df = pd.DataFrame({'a': list('abc'), + .....: 'b': list(range(1, 4)), + .....: 'c': np.arange(3, 6).astype('u1'), + .....: 'd': np.arange(4.0, 7.0, dtype='float64'), + .....: 'e': [True, False, True], + .....: 'f': pd.date_range('20130101', periods=3), + .....: 'g': pd.date_range('20130101', periods=3, tz='US/Eastern')}) + .....: + +In [510]: df +Out[510]: + a b c d e f g +0 a 1 3 4.0 True 2013-01-01 2013-01-01 00:00:00-05:00 +1 b 2 4 5.0 False 2013-01-02 2013-01-02 00:00:00-05:00 +2 c 3 5 6.0 True 2013-01-03 2013-01-03 00:00:00-05:00 + +In [511]: df.dtypes +Out[511]: +a object +b int64 +c uint8 +d float64 +e bool +f datetime64[ns] +g datetime64[ns, US/Eastern] +dtype: object + +``` + +Write to a parquet file. + +``` python +In [512]: df.to_parquet('example_pa.parquet', engine='pyarrow') + +In [513]: df.to_parquet('example_fp.parquet', engine='fastparquet') + +``` + +Read from a parquet file. + +``` python +In [514]: result = pd.read_parquet('example_fp.parquet', engine='fastparquet') + +In [515]: result = pd.read_parquet('example_pa.parquet', engine='pyarrow') + +In [516]: result.dtypes +Out[516]: +a object +b int64 +c uint8 +d float64 +e bool +f datetime64[ns] +g datetime64[ns, US/Eastern] +dtype: object + +``` + +Read only certain columns of a parquet file. + +``` python +In [517]: result = pd.read_parquet('example_fp.parquet', + .....: engine='fastparquet', columns=['a', 'b']) + .....: + +In [518]: result = pd.read_parquet('example_pa.parquet', + .....: engine='pyarrow', columns=['a', 'b']) + .....: + +In [519]: result.dtypes +Out[519]: +a object +b int64 +dtype: object + +``` + +### Handling indexes + +Serializing a ``DataFrame`` to parquet may include the implicit index as one or +more columns in the output file. 
Thus, this code: + +``` python +In [520]: df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}) + +In [521]: df.to_parquet('test.parquet', engine='pyarrow') + +``` + +creates a parquet file with three columns if you use ``pyarrow`` for serialization: +``a``, ``b``, and ``__index_level_0__``. If you’re using ``fastparquet``, the +index [may or may not](https://fastparquet.readthedocs.io/en/latest/api.html#fastparquet.write) +be written to the file. + +This unexpected extra column causes some databases like Amazon Redshift to reject +the file, because that column doesn’t exist in the target table. + +If you want to omit a dataframe’s indexes when writing, pass ``index=False`` to +[``to_parquet()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_parquet.html#pandas.DataFrame.to_parquet): + +``` python +In [522]: df.to_parquet('test.parquet', index=False) + +``` + +This creates a parquet file with just the two expected columns, ``a`` and ``b``. +If your ``DataFrame`` has a custom index, you won’t get it back when you load +this file into a ``DataFrame``. + +Passing ``index=True`` will always write the index, even if that’s not the +underlying engine’s default behavior. + +### Partitioning Parquet files + +*New in version 0.24.0.* + +Parquet supports partitioning of data based on the values of one or more columns. + +``` python +In [523]: df = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [0, 1, 0, 1]}) + +In [524]: df.to_parquet(fname='test', engine='pyarrow', + .....: partition_cols=['a'], compression=None) + .....: + +``` + +The *fname* specifies the parent directory to which data will be saved. +The *partition_cols* are the column names by which the dataset will be partitioned. +Columns are partitioned in the order they are given. The partition splits are +determined by the unique values in the partition columns. 
+The above example creates a partitioned dataset that may look like:
+
+```
+test
+├── a=0
+│   ├── 0bac803e32dc42ae83fddfd029cbdebc.parquet
+│   └── ...
+└── a=1
+    ├── e6ab24a4f45147b49b54a662f0c412a3.parquet
+    └── ...
+
+```
+
+## SQL queries
+
+The ``pandas.io.sql`` module provides a collection of query wrappers to both
+facilitate data retrieval and to reduce dependency on DB-specific APIs. Database abstraction
+is provided by SQLAlchemy if installed. In addition you will need a driver library for
+your database. Examples of such drivers are [psycopg2](http://initd.org/psycopg/)
+for PostgreSQL or [pymysql](https://github.com/PyMySQL/PyMySQL) for MySQL.
+For [SQLite](https://docs.python.org/3/library/sqlite3.html) this is
+included in Python’s standard library by default.
+You can find an overview of supported drivers for each SQL dialect in the
+[SQLAlchemy docs](https://docs.sqlalchemy.org/en/latest/dialects/index.html).
+
+If SQLAlchemy is not installed, a fallback is only provided for sqlite (and
+for mysql for backwards compatibility, but this is deprecated and will be
+removed in a future version).
+This mode requires a Python database adapter which respects the [Python
+DB-API](https://www.python.org/dev/peps/pep-0249/).
+
+See also some [cookbook examples](cookbook.html#cookbook-sql) for some advanced strategies.
+
+The key functions are:
+
+Method | Description
+---|---
+[read_sql_table](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_table.html#pandas.read_sql_table)(table_name, con[, schema, …]) | Read SQL database table into a DataFrame.
+[read_sql_query](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_query.html#pandas.read_sql_query)(sql, con[, index_col, …]) | Read SQL query into a DataFrame.
+[read_sql](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html#pandas.read_sql)(sql, con[, index_col, …]) | Read SQL query or database table into a DataFrame.
+[DataFrame.to_sql](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html#pandas.DataFrame.to_sql)(self, name, con[, schema, …]) | Write records stored in a DataFrame to a SQL database.
+
+::: tip Note
+
+The function [``read_sql()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html#pandas.read_sql) is a convenience wrapper around
+[``read_sql_table()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_table.html#pandas.read_sql_table) and [``read_sql_query()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_query.html#pandas.read_sql_query) (and for
+backward compatibility) and will delegate to the specific function depending on
+the provided input (database table name or SQL query).
+Table names do not need to be quoted if they have special characters.
+
+:::
+
+In the following example, we use the [SQLite](https://www.sqlite.org/) SQL database
+engine. You can use a temporary SQLite database where data are stored in
+“memory”.
+
+To connect with SQLAlchemy you use the ``create_engine()`` function to create an engine
+object from a database URI. You only need to create the engine once per database you are
+connecting to.
+For more information on ``create_engine()`` and the URI formatting, see the examples
+below and the SQLAlchemy [documentation](https://docs.sqlalchemy.org/en/latest/core/engines.html).
+
+``` python
+In [525]: from sqlalchemy import create_engine
+
+# Create your engine.
+In [526]: engine = create_engine('sqlite:///:memory:') + +``` + +If you want to manage your own connections you can pass one of those instead: + +``` python +with engine.connect() as conn, conn.begin(): + data = pd.read_sql_table('data', conn) + +``` + +### Writing DataFrames + +Assuming the following data is in a ``DataFrame`` ``data``, we can insert it into +the database using [``to_sql()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html#pandas.DataFrame.to_sql). + +id | Date | Col_1 | Col_2 | Col_3 +---|---|---|---|--- +26 | 2012-10-18 | X | 25.7 | True +42 | 2012-10-19 | Y | -12.4 | False +63 | 2012-10-20 | Z | 5.73 | True + +``` python +In [527]: data +Out[527]: + id Date Col_1 Col_2 Col_3 +0 26 2010-10-18 X 27.50 True +1 42 2010-10-19 Y -12.50 False +2 63 2010-10-20 Z 5.73 True + +In [528]: data.to_sql('data', engine) + +``` + +With some databases, writing large DataFrames can result in errors due to +packet size limitations being exceeded. This can be avoided by setting the +``chunksize`` parameter when calling ``to_sql``. For example, the following +writes ``data`` to the database in batches of 1000 rows at a time: + +``` python +In [529]: data.to_sql('data_chunked', engine, chunksize=1000) + +``` + +#### SQL data types + +[``to_sql()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html#pandas.DataFrame.to_sql) will try to map your data to an appropriate +SQL data type based on the dtype of the data. When you have columns of dtype +``object``, pandas will try to infer the data type. + +You can always override the default type by specifying the desired SQL type of +any of the columns by using the ``dtype`` argument. This argument needs a +dictionary mapping column names to SQLAlchemy types (or strings for the sqlite3 +fallback mode). 
+For example, specifying to use the sqlalchemy ``String`` type instead of the +default ``Text`` type for string columns: + +``` python +In [530]: from sqlalchemy.types import String + +In [531]: data.to_sql('data_dtype', engine, dtype={'Col_1': String}) + +``` + +::: tip Note + +Due to the limited support for timedelta’s in the different database +flavors, columns with type ``timedelta64`` will be written as integer +values as nanoseconds to the database and a warning will be raised. + +::: + +::: tip Note + +Columns of ``category`` dtype will be converted to the dense representation +as you would get with ``np.asarray(categorical)`` (e.g. for string categories +this gives an array of strings). +Because of this, reading the database table back in does **not** generate +a categorical. + +::: + +### Datetime data types + +Using SQLAlchemy, [``to_sql()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html#pandas.DataFrame.to_sql) is capable of writing +datetime data that is timezone naive or timezone aware. However, the resulting +data stored in the database ultimately depends on the supported data type +for datetime data of the database system being used. + +The following table lists supported data types for datetime data for some +common databases. Other database dialects may have different data types for +datetime data. + +Database | SQL Datetime Types | Timezone Support +---|---|--- +SQLite | TEXT | No +MySQL | TIMESTAMP or DATETIME | No +PostgreSQL | TIMESTAMP or TIMESTAMP WITH TIME ZONE | Yes + +When writing timezone aware data to databases that do not support timezones, +the data will be written as timezone naive timestamps that are in local time +with respect to the timezone. + +[``read_sql_table()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_table.html#pandas.read_sql_table) is also capable of reading datetime data that is +timezone aware or naive. 
When reading ``TIMESTAMP WITH TIME ZONE`` types, pandas
+will convert the data to UTC.
+
+#### Insertion method
+
+*New in version 0.24.0.*
+
+The parameter ``method`` controls the SQL insertion clause used.
+Possible values are:
+
+- ``None``: Uses standard SQL ``INSERT`` clause (one per row).
+- ``'multi'``: Pass multiple values in a single ``INSERT`` clause.
+It uses a special SQL syntax not supported by all backends.
+This usually provides better performance for analytic databases
+like Presto and Redshift, but has worse performance for
+traditional SQL backends if the table contains many columns.
+For more information check the SQLAlchemy [documentation](http://docs.sqlalchemy.org/en/latest/core/dml.html#sqlalchemy.sql.expression.Insert.values.params.*args).
+- callable with signature ``(pd_table, conn, keys, data_iter)``:
+This can be used to implement a more performant insertion method based on
+specific backend dialect features.
+
+Example of a callable using PostgreSQL [COPY clause](https://www.postgresql.org/docs/current/static/sql-copy.html):
+
+``` python
+# Alternative to_sql() *method* for DBs that support COPY FROM
+import csv
+from io import StringIO
+
+def psql_insert_copy(table, conn, keys, data_iter):
+    # gets a DBAPI connection that can provide a cursor
+    dbapi_conn = conn.connection
+    with dbapi_conn.cursor() as cur:
+        s_buf = StringIO()
+        writer = csv.writer(s_buf)
+        writer.writerows(data_iter)
+        s_buf.seek(0)
+
+        columns = ', '.join('"{}"'.format(k) for k in keys)
+        if table.schema:
+            table_name = '{}.{}'.format(table.schema, table.name)
+        else:
+            table_name = table.name
+
+        sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(
+            table_name, columns)
+        cur.copy_expert(sql=sql, file=s_buf)
+
+```
+
+### Reading tables
+
+[``read_sql_table()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_table.html#pandas.read_sql_table) will read a database table given the
+table name and optionally a subset of columns to read.
::: tip Note

In order to use [``read_sql_table()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_table.html#pandas.read_sql_table), you **must** have the
SQLAlchemy optional dependency installed.

:::

``` python
In [532]: pd.read_sql_table('data', engine)
Out[532]: 
   index  id       Date Col_1  Col_2  Col_3
0      0  26 2010-10-18     X  27.50   True
1      1  42 2010-10-19     Y -12.50  False
2      2  63 2010-10-20     Z   5.73   True

```

You can also specify the name of the column as the ``DataFrame`` index,
and specify a subset of columns to be read.

``` python
In [533]: pd.read_sql_table('data', engine, index_col='id')
Out[533]: 
    index       Date Col_1  Col_2  Col_3
id                                      
26      0 2010-10-18     X  27.50   True
42      1 2010-10-19     Y -12.50  False
63      2 2010-10-20     Z   5.73   True

In [534]: pd.read_sql_table('data', engine, columns=['Col_1', 'Col_2'])
Out[534]: 
  Col_1  Col_2
0     X  27.50
1     Y -12.50
2     Z   5.73

```

And you can explicitly force columns to be parsed as dates:

``` python
In [535]: pd.read_sql_table('data', engine, parse_dates=['Date'])
Out[535]: 
   index  id       Date Col_1  Col_2  Col_3
0      0  26 2010-10-18     X  27.50   True
1      1  42 2010-10-19     Y -12.50  False
2      2  63 2010-10-20     Z   5.73   True

```

If needed you can explicitly specify a format string, or a dict of arguments
to pass to [``pandas.to_datetime()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html#pandas.to_datetime):

``` python
pd.read_sql_table('data', engine, parse_dates={'Date': '%Y-%m-%d'})
pd.read_sql_table('data', engine,
                  parse_dates={'Date': {'format': '%Y-%m-%d %H:%M:%S'}})

```

You can check if a table exists using ``has_table()``.

### Schema support

Reading from and writing to different schemas is supported through the ``schema``
keyword in the [``read_sql_table()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_table.html#pandas.read_sql_table) and
[``to_sql()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html#pandas.DataFrame.to_sql)
functions. Note however that this depends on the database flavor (sqlite does not
have schemas). For example:

``` python
df.to_sql('table', engine, schema='other_schema')
pd.read_sql_table('table', engine, schema='other_schema')

```

### Querying

You can query using raw SQL in the [``read_sql_query()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_query.html#pandas.read_sql_query) function.
In this case you must use the SQL variant appropriate for your database.
When using SQLAlchemy, you can also pass SQLAlchemy Expression language constructs,
which are database-agnostic.

``` python
In [536]: pd.read_sql_query('SELECT * FROM data', engine)
Out[536]: 
   index  id                        Date Col_1  Col_2  Col_3
0      0  26  2010-10-18 00:00:00.000000     X  27.50      1
1      1  42  2010-10-19 00:00:00.000000     Y -12.50      0
2      2  63  2010-10-20 00:00:00.000000     Z   5.73      1

```

Of course, you can specify a more “complex” query.

``` python
In [537]: pd.read_sql_query("SELECT id, Col_1, Col_2 FROM data WHERE id = 42;", engine)
Out[537]: 
   id Col_1  Col_2
0  42     Y  -12.5

```

The [``read_sql_query()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_query.html#pandas.read_sql_query) function supports a ``chunksize`` argument.
Specifying this will return an iterator through chunks of the query result:

``` python
In [538]: df = pd.DataFrame(np.random.randn(20, 3), columns=list('abc'))

In [539]: df.to_sql('data_chunks', engine, index=False)

```

``` python
In [540]: for chunk in pd.read_sql_query("SELECT * FROM data_chunks",
   .....:                                engine, chunksize=5):
   .....:     print(chunk)
   .....: 
          a         b         c
0 -0.900850 -0.323746  0.037100
1  0.057533 -0.032842  0.550902
2  1.026623  1.035455 -0.965140
3 -0.252405 -1.255987  0.639156
4  1.076701 -0.309155 -0.800182
          a         b         c
0 -0.206623  0.496077 -0.219935
1  0.631362 -1.166743  1.808368
2  0.023531  0.987573  0.471400
3 -0.982250 -0.192482  1.195452
4 -1.758855  0.477551  1.412567
          a         b         c
0 -1.120570  1.232764  0.417814
1  1.688089 -0.037645 -0.269582
2  0.646823 -0.603366  1.592966
3  0.724019 -0.515606 -0.180920
4  0.038244 -2.292866 -0.114634
          a         b         c
0 -0.970230 -0.963257 -0.128304
1  0.498621 -1.496506  0.701471
2 -0.272608 -0.119424 -0.882023
3 -0.253477  0.714395  0.664179
4  0.897140  0.455791  1.549590

```

You can also run a plain query without creating a ``DataFrame`` with
``execute()``. This is useful for queries that don’t return values,
such as INSERT. This is functionally equivalent to calling ``execute`` on the
SQLAlchemy engine or db connection object. Again, you must use the SQL syntax
variant appropriate for your database.

``` python
from pandas.io import sql
sql.execute('SELECT * FROM table_name', engine)
sql.execute('INSERT INTO table_name VALUES(?, ?, ?, ?)', engine,
            params=[('id', 1, 12.2, True)])

```

### Engine connection examples

To connect with SQLAlchemy you use the ``create_engine()`` function to create an engine
object from a database URI. You only need to create the engine once per database you are
connecting to.
``` python
from sqlalchemy import create_engine

engine = create_engine('postgresql://scott:tiger@localhost:5432/mydatabase')

engine = create_engine('mysql+mysqldb://scott:tiger@localhost/foo')

engine = create_engine('oracle://scott:tiger@127.0.0.1:1521/sidname')

engine = create_engine('mssql+pyodbc://mydsn')

# sqlite://<path>/foo.db
# where <path> is relative:
engine = create_engine('sqlite:///foo.db')

# or absolute, starting with a slash:
engine = create_engine('sqlite:////absolute/path/to/foo.db')

```

For more information see the examples in the SQLAlchemy [documentation](https://docs.sqlalchemy.org/en/latest/core/engines.html).

### Advanced SQLAlchemy queries

You can use SQLAlchemy constructs to describe your query.

Use ``sqlalchemy.text()`` to specify query parameters in a backend-neutral way:

``` python
In [541]: import sqlalchemy as sa

In [542]: pd.read_sql(sa.text('SELECT * FROM data where Col_1=:col1'),
   .....:             engine, params={'col1': 'X'})
   .....: 
Out[542]: 
   index  id                        Date Col_1  Col_2  Col_3
0      0  26  2010-10-18 00:00:00.000000     X   27.5      1

```

If you have an SQLAlchemy description of your database you can express where
conditions using SQLAlchemy expressions:

``` python
In [543]: metadata = sa.MetaData()

In [544]: data_table = sa.Table('data', metadata,
   .....:                       sa.Column('index', sa.Integer),
   .....:                       sa.Column('Date', sa.DateTime),
   .....:                       sa.Column('Col_1', sa.String),
   .....:                       sa.Column('Col_2', sa.Float),
   .....:                       sa.Column('Col_3', sa.Boolean),
   .....:                       )
   .....: 

In [545]: pd.read_sql(sa.select([data_table]).where(data_table.c.Col_3 is True), engine)
Out[545]: 
Empty DataFrame
Columns: [index, Date, Col_1, Col_2, Col_3]
Index: []

```

(The result above is empty because ``data_table.c.Col_3 is True`` is evaluated
by Python before SQLAlchemy ever sees it; ``is`` cannot be overloaded, so the
where clause is the plain Python value ``False``. Use
``data_table.c.Col_3 == True`` or ``data_table.c.Col_3.is_(True)`` to generate
the intended SQL condition.)

You can combine SQLAlchemy expressions with parameters passed to [``read_sql()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html#pandas.read_sql) using ``sqlalchemy.bindparam()``:

``` python
In [546]: import datetime as dt

In [547]: expr
= sa.select([data_table]).where(data_table.c.Date > sa.bindparam('date'))

In [548]: pd.read_sql(expr, engine, params={'date': dt.datetime(2010, 10, 18)})
Out[548]: 
   index       Date Col_1  Col_2  Col_3
0      1 2010-10-19     Y -12.50  False
1      2 2010-10-20     Z   5.73   True

```

### Sqlite fallback

The use of sqlite is supported without using SQLAlchemy.
This mode requires a Python database adapter which respects the [Python
DB-API](https://www.python.org/dev/peps/pep-0249/).

You can create connections like so:

``` python
import sqlite3
con = sqlite3.connect(':memory:')

```

And then issue the following queries:

``` python
data.to_sql('data', con)
pd.read_sql_query("SELECT * FROM data", con)

```

## Google BigQuery

::: danger Warning

Starting in 0.20.0, pandas has split off Google BigQuery support into the
separate package ``pandas-gbq``. You can ``pip install pandas-gbq`` to get it.

:::

The ``pandas-gbq`` package provides functionality to read/write from Google BigQuery.

pandas integrates with this external package. If ``pandas-gbq`` is installed, you can
use the pandas methods ``pd.read_gbq`` and ``DataFrame.to_gbq``, which will call the
respective functions from ``pandas-gbq``.

Full documentation can be found [here](https://pandas-gbq.readthedocs.io/).

## Stata format

### Writing to stata format

The method ``to_stata()`` will write a DataFrame
into a .dta file. The format version of this file is always 115 (Stata 12).

``` python
In [549]: df = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))

In [550]: df.to_stata('stata.dta')

```

Stata data files have limited data type support; only strings with
244 or fewer characters, ``int8``, ``int16``, ``int32``, ``float32``
and ``float64`` can be stored in ``.dta`` files. Additionally,
Stata reserves certain values to represent missing data.
Exporting a
non-missing value that is outside of the permitted range in Stata for
a particular data type will retype the variable to the next larger
size. For example, ``int8`` values are restricted to lie between -127
and 100 in Stata, and so variables with values above 100 will trigger
a conversion to ``int16``. ``nan`` values in floating point data
types are stored as the basic missing data type (``.`` in Stata).

::: tip Note

It is not possible to export missing data values for integer data types.

:::

The Stata writer gracefully handles other data types including ``int64``,
``bool``, ``uint8``, ``uint16``, ``uint32`` by casting to
the smallest supported type that can represent the data. For example, data
with a type of ``uint8`` will be cast to ``int8`` if all values are less than
100 (the upper bound for non-missing ``int8`` data in Stata), or, if values are
outside of this range, the variable is cast to ``int16``.

::: danger Warning

Conversion from ``int64`` to ``float64`` may result in a loss of precision
if ``int64`` values are larger than 2**53.

:::

::: danger Warning

``StataWriter`` and
``to_stata()`` only support fixed width
strings containing up to 244 characters, a limitation imposed by the version
115 dta file format. Attempting to write Stata dta files with strings
longer than 244 characters raises a ``ValueError``.

:::

### Reading from Stata format

The top-level function ``read_stata`` will read a dta file and return
either a ``DataFrame`` or a ``StataReader`` that can
be used to read the file incrementally.
+ +``` python +In [551]: pd.read_stata('stata.dta') +Out[551]: + index A B +0 0 1.031231 0.196447 +1 1 0.190188 0.619078 +2 2 0.036658 -0.100501 +3 3 0.201772 1.763002 +4 4 0.454977 -1.958922 +5 5 -0.628529 0.133171 +6 6 -1.274374 2.518925 +7 7 -0.517547 -0.360773 +8 8 0.877961 -1.881598 +9 9 -0.699067 -1.566913 + +``` + +Specifying a ``chunksize`` yields a +``StataReader`` instance that can be used to +read ``chunksize`` lines from the file at a time. The ``StataReader`` +object can be used as an iterator. + +``` python +In [552]: reader = pd.read_stata('stata.dta', chunksize=3) + +In [553]: for df in reader: + .....: print(df.shape) + .....: +(3, 3) +(3, 3) +(3, 3) +(1, 3) + +``` + +For more fine-grained control, use ``iterator=True`` and specify +``chunksize`` with each call to +``read()``. + +``` python +In [554]: reader = pd.read_stata('stata.dta', iterator=True) + +In [555]: chunk1 = reader.read(5) + +In [556]: chunk2 = reader.read(5) + +``` + +Currently the ``index`` is retrieved as a column. + +The parameter ``convert_categoricals`` indicates whether value labels should be +read and used to create a ``Categorical`` variable from them. Value labels can +also be retrieved by the function ``value_labels``, which requires ``read()`` +to be called before use. + +The parameter ``convert_missing`` indicates whether missing value +representations in Stata should be preserved. If ``False`` (the default), +missing values are represented as ``np.nan``. If ``True``, missing values are +represented using ``StataMissingValue`` objects, and columns containing missing +values will have ``object`` data type. + +::: tip Note + +[``read_stata()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_stata.html#pandas.read_stata) and +``StataReader`` support .dta formats 113-115 +(Stata 10-12), 117 (Stata 13), and 118 (Stata 14). 
:::

::: tip Note

Setting ``preserve_dtypes=False`` will upcast to the standard pandas data types:
``int64`` for all integer types and ``float64`` for floating point data. By default,
the Stata data types are preserved when importing.

:::

#### Categorical data

``Categorical`` data can be exported to Stata data files as value labeled data.
The exported data consists of the underlying category codes as integer data values
and the categories as value labels. Stata does not have an explicit equivalent
to a ``Categorical`` and information about whether the variable is ordered
is lost when exporting.

::: danger Warning

Stata only supports string value labels, and so ``str`` is called on the
categories when exporting data. Exporting ``Categorical`` variables with
non-string categories produces a warning, and can result in a loss of
information if the ``str`` representations of the categories are not unique.

:::

Labeled data can similarly be imported from Stata data files as ``Categorical``
variables using the keyword argument ``convert_categoricals`` (``True`` by default).
The keyword argument ``order_categoricals`` (``True`` by default) determines
whether imported ``Categorical`` variables are ordered.

::: tip Note

When importing categorical data, the values of the variables in the Stata
data file are not preserved since ``Categorical`` variables always
use integer data types between ``-1`` and ``n-1`` where ``n`` is the number
of categories. If the original values in the Stata data file are required,
these can be imported by setting ``convert_categoricals=False``, which will
import original data (but not the variable labels).
The original values can +be matched to the imported categorical data since there is a simple mapping +between the original Stata data values and the category codes of imported +Categorical variables: missing values are assigned code ``-1``, and the +smallest original value is assigned ``0``, the second smallest is assigned +``1`` and so on until the largest original value is assigned the code ``n-1``. + +::: + +::: tip Note + +Stata supports partially labeled series. These series have value labels for +some but not all data values. Importing a partially labeled series will produce +a ``Categorical`` with string categories for the values that are labeled and +numeric categories for values with no label. + +::: + +## SAS formats + +The top-level function [``read_sas()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sas.html#pandas.read_sas) can read (but not write) SAS +*xport* (.XPT) and (since v0.18.0) *SAS7BDAT* (.sas7bdat) format files. + +SAS files only contain two value types: ASCII text and floating point +values (usually 8 bytes but sometimes truncated). For xport files, +there is no automatic type conversion to integers, dates, or +categoricals. For SAS7BDAT files, the format codes may allow date +variables to be automatically converted to dates. By default the +whole file is read and returned as a ``DataFrame``. + +Specify a ``chunksize`` or use ``iterator=True`` to obtain reader +objects (``XportReader`` or ``SAS7BDATReader``) for incrementally +reading the file. The reader objects also have attributes that +contain additional information about the file and its variables. 
Read a SAS7BDAT file:

``` python
df = pd.read_sas('sas_data.sas7bdat')

```

Obtain an iterator and read an XPORT file 100,000 lines at a time:

``` python
def do_something(chunk):
    pass

rdr = pd.read_sas('sas_xport.xpt', chunksize=100000)
for chunk in rdr:
    do_something(chunk)

```

The [specification](https://support.sas.com/techsup/technote/ts140.pdf) for the xport file format is available from the SAS
web site.

No official documentation is available for the SAS7BDAT format.

## Other file formats

pandas itself only supports IO with a limited set of file formats that map
cleanly to its tabular data model. For reading and writing other file formats
into and from pandas, we recommend these packages from the broader community.

### netCDF

[xarray](https://xarray.pydata.org/) provides data structures inspired by the pandas ``DataFrame`` for working
with multi-dimensional datasets, with a focus on the netCDF file format and
easy conversion to and from pandas.

## Performance considerations

This is an informal comparison of various IO methods, using pandas
0.20.3. Timings are machine dependent and small differences should be
ignored.
``` python
In [1]: sz = 1000000
In [2]: df = pd.DataFrame({'A': np.random.randn(sz), 'B': [1] * sz})

In [3]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 2 columns):
A    1000000 non-null float64
B    1000000 non-null int64
dtypes: float64(1), int64(1)
memory usage: 15.3 MB

```

Given the following set of test functions:

``` python
import os
import sqlite3

from numpy.random import randn

sz = 1000000
df = pd.DataFrame({'A': randn(sz), 'B': [1] * sz})


def test_sql_write(df):
    if os.path.exists('test.sql'):
        os.remove('test.sql')
    sql_db = sqlite3.connect('test.sql')
    df.to_sql(name='test_table', con=sql_db)
    sql_db.close()


def test_sql_read():
    sql_db = sqlite3.connect('test.sql')
    pd.read_sql_query("select * from test_table", sql_db)
    sql_db.close()


def test_hdf_fixed_write(df):
    df.to_hdf('test_fixed.hdf', 'test', mode='w')


def test_hdf_fixed_read():
    pd.read_hdf('test_fixed.hdf', 'test')


def test_hdf_fixed_write_compress(df):
    df.to_hdf('test_fixed_compress.hdf', 'test', mode='w', complib='blosc')


def test_hdf_fixed_read_compress():
    pd.read_hdf('test_fixed_compress.hdf', 'test')


def test_hdf_table_write(df):
    df.to_hdf('test_table.hdf', 'test', mode='w', format='table')


def test_hdf_table_read():
    pd.read_hdf('test_table.hdf', 'test')


def test_hdf_table_write_compress(df):
    df.to_hdf('test_table_compress.hdf', 'test', mode='w',
              complib='blosc', format='table')


def test_hdf_table_read_compress():
    pd.read_hdf('test_table_compress.hdf', 'test')


def test_csv_write(df):
    df.to_csv('test.csv', mode='w')


def test_csv_read():
    pd.read_csv('test.csv', index_col=0)


def test_feather_write(df):
    df.to_feather('test.feather')


def test_feather_read():
    pd.read_feather('test.feather')


def test_pickle_write(df):
    df.to_pickle('test.pkl')


def test_pickle_read():
    pd.read_pickle('test.pkl')


def test_pickle_write_compress(df):
    df.to_pickle('test.pkl.compress',
                 compression='xz')


def test_pickle_read_compress():
    pd.read_pickle('test.pkl.compress', compression='xz')

```

When writing, the top three functions in terms of speed are
``test_pickle_write``, ``test_feather_write`` and ``test_hdf_fixed_write_compress``.

``` python
In [14]: %timeit test_sql_write(df)
2.37 s ± 36.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [15]: %timeit test_hdf_fixed_write(df)
194 ms ± 65.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [26]: %timeit test_hdf_fixed_write_compress(df)
119 ms ± 2.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [16]: %timeit test_hdf_table_write(df)
623 ms ± 125 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [27]: %timeit test_hdf_table_write_compress(df)
563 ms ± 23.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [17]: %timeit test_csv_write(df)
3.13 s ± 49.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [30]: %timeit test_feather_write(df)
103 ms ± 5.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [31]: %timeit test_pickle_write(df)
109 ms ± 3.72 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [32]: %timeit test_pickle_write_compress(df)
3.33 s ± 55.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

```

When reading, the top three are ``test_feather_read``, ``test_pickle_read`` and
``test_hdf_fixed_read``.

``` python
In [18]: %timeit test_sql_read()
1.35 s ± 14.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [19]: %timeit test_hdf_fixed_read()
14.3 ms ± 438 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [28]: %timeit test_hdf_fixed_read_compress()
23.5 ms ± 672 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [20]: %timeit test_hdf_table_read()
35.4 ms ± 314 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [29]: %timeit test_hdf_table_read_compress()
42.6 ms ± 2.1 ms per loop (mean ± std.
dev. of 7 runs, 10 loops each) + +In [22]: %timeit test_csv_read() +516 ms ± 27.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) + +In [33]: %timeit test_feather_read() +4.06 ms ± 115 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) + +In [34]: %timeit test_pickle_read() +6.5 ms ± 172 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) + +In [35]: %timeit test_pickle_read_compress() +588 ms ± 3.57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) + +``` + +Space on disk (in bytes) + +``` +34816000 Aug 21 18:00 test.sql +24009240 Aug 21 18:00 test_fixed.hdf + 7919610 Aug 21 18:00 test_fixed_compress.hdf +24458892 Aug 21 18:00 test_table.hdf + 8657116 Aug 21 18:00 test_table_compress.hdf +28520770 Aug 21 18:00 test.csv +16000248 Aug 21 18:00 test.feather +16000848 Aug 21 18:00 test.pkl + 7554108 Aug 21 18:00 test.pkl.compress + +``` diff --git a/Python/pandas/user_guide/merging.md b/Python/pandas/user_guide/merging.md new file mode 100644 index 00000000..4a87d98f --- /dev/null +++ b/Python/pandas/user_guide/merging.md @@ -0,0 +1,1415 @@ +# Merge, join, and concatenate + +pandas provides various facilities for easily combining together Series or +DataFrame with various kinds of set logic for the indexes +and relational algebra functionality in the case of join / merge-type +operations. + +## Concatenating objects + +The [``concat()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html#pandas.concat) function (in the main pandas namespace) does all of +the heavy lifting of performing concatenation operations along an axis while +performing optional set logic (union or intersection) of the indexes (if any) on +the other axes. Note that I say “if any” because there is only a single possible +axis of concatenation for Series. 
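To make the ``Series`` case concrete, a small illustrative sketch (the series and labels are made up here): a ``Series`` has only its index, so concatenation can only grow that one axis:

``` python
import pandas as pd

s_a = pd.Series([1, 2], index=['a', 'b'])
s_b = pd.Series([3, 4], index=['c', 'd'])

# A Series has a single axis, so this is the only direction concat can go
combined = pd.concat([s_a, s_b])
```

The result is a single longer ``Series`` whose index is the two input indexes stacked end to end.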
+ +Before diving into all of the details of ``concat`` and what it can do, here is +a simple example: + +``` python +In [1]: df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], + ...: 'B': ['B0', 'B1', 'B2', 'B3'], + ...: 'C': ['C0', 'C1', 'C2', 'C3'], + ...: 'D': ['D0', 'D1', 'D2', 'D3']}, + ...: index=[0, 1, 2, 3]) + ...: + +In [2]: df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'], + ...: 'B': ['B4', 'B5', 'B6', 'B7'], + ...: 'C': ['C4', 'C5', 'C6', 'C7'], + ...: 'D': ['D4', 'D5', 'D6', 'D7']}, + ...: index=[4, 5, 6, 7]) + ...: + +In [3]: df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'], + ...: 'B': ['B8', 'B9', 'B10', 'B11'], + ...: 'C': ['C8', 'C9', 'C10', 'C11'], + ...: 'D': ['D8', 'D9', 'D10', 'D11']}, + ...: index=[8, 9, 10, 11]) + ...: + +In [4]: frames = [df1, df2, df3] + +In [5]: result = pd.concat(frames) +``` + +![merging_concat_basic](https://static.pypandas.cn/public/static/images/merging_concat_basic.png) + +Like its sibling function on ndarrays, ``numpy.concatenate``, ``pandas.concat`` +takes a list or dict of homogeneously-typed objects and concatenates them with +some configurable handling of “what to do with the other axes”: + +``` python +pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, + levels=None, names=None, verify_integrity=False, copy=True) +``` + +- ``objs`` : a sequence or mapping of Series or DataFrame objects. If a +dict is passed, the sorted keys will be used as the *keys* argument, unless +it is passed, in which case the values will be selected (see below). Any None +objects will be dropped silently unless they are all None in which case a +ValueError will be raised. +- ``axis`` : {0, 1, …}, default 0. The axis to concatenate along. +- ``join`` : {‘inner’, ‘outer’}, default ‘outer’. How to handle indexes on +other axis(es). Outer for union and inner for intersection. +- ``ignore_index`` : boolean, default False. If True, do not use the index +values on the concatenation axis. 
The resulting axis will be labeled 0, …, +n - 1. This is useful if you are concatenating objects where the +concatenation axis does not have meaningful indexing information. Note +the index values on the other axes are still respected in the join. +- ``keys`` : sequence, default None. Construct hierarchical index using the +passed keys as the outermost level. If multiple levels passed, should +contain tuples. +- ``levels`` : list of sequences, default None. Specific levels (unique values) +to use for constructing a MultiIndex. Otherwise they will be inferred from the +keys. +- ``names`` : list, default None. Names for the levels in the resulting +hierarchical index. +- ``verify_integrity`` : boolean, default False. Check whether the new +concatenated axis contains duplicates. This can be very expensive relative +to the actual data concatenation. +- ``copy`` : boolean, default True. If False, do not copy data unnecessarily. + +Without a little bit of context many of these arguments don’t make much sense. +Let’s revisit the above example. Suppose we wanted to associate specific keys +with each of the pieces of the chopped up DataFrame. We can do this using the +``keys`` argument: + +``` python +In [6]: result = pd.concat(frames, keys=['x', 'y', 'z']) +``` + +![merging_concat_keys](https://static.pypandas.cn/public/static/images/merging_concat_keys.png) + +As you can see (if you’ve read the rest of the documentation), the resulting +object’s index has a [hierarchical index](advanced.html#advanced-hierarchical). This +means that we can now select out each chunk by key: + +``` python +In [7]: result.loc['y'] +Out[7]: + A B C D +4 A4 B4 C4 D4 +5 A5 B5 C5 D5 +6 A6 B6 C6 D6 +7 A7 B7 C7 D7 +``` + +It’s not a stretch to see how this can be very useful. More detail on this +functionality below. 
+ +::: tip Note + +It is worth noting that [``concat()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html#pandas.concat) (and therefore +``append()``) makes a full copy of the data, and that constantly +reusing this function can create a significant performance hit. If you need +to use the operation over several datasets, use a list comprehension. + +::: + +``` python +frames = [ process_your_file(f) for f in files ] +result = pd.concat(frames) +``` + +### Set logic on the other axes + +When gluing together multiple DataFrames, you have a choice of how to handle +the other axes (other than the one being concatenated). This can be done in +the following two ways: + +- Take the union of them all, ``join='outer'``. This is the default +option as it results in zero information loss. +- Take the intersection, ``join='inner'``. + +Here is an example of each of these methods. First, the default ``join='outer'`` +behavior: + +``` python +In [8]: df4 = pd.DataFrame({'B': ['B2', 'B3', 'B6', 'B7'], + ...: 'D': ['D2', 'D3', 'D6', 'D7'], + ...: 'F': ['F2', 'F3', 'F6', 'F7']}, + ...: index=[2, 3, 6, 7]) + ...: + +In [9]: result = pd.concat([df1, df4], axis=1, sort=False) +``` + +![merging_concat_axis1](https://static.pypandas.cn/public/static/images/merging_concat_axis1.png) + +::: danger Warning + +*Changed in version 0.23.0.* + +The default behavior with ``join='outer'`` is to sort the other axis +(columns in this case). In a future version of pandas, the default will +be to not sort. We specified ``sort=False`` to opt in to the new +behavior now. 
+ +::: + +Here is the same thing with ``join='inner'``: + +``` python +In [10]: result = pd.concat([df1, df4], axis=1, join='inner') +``` + +![merging_concat_axis1_inner](https://static.pypandas.cn/public/static/images/merging_concat_axis1_inner.png) + +Lastly, suppose we just wanted to reuse the *exact index* from the original +DataFrame: + +``` python +In [11]: result = pd.concat([df1, df4], axis=1).reindex(df1.index) +``` + +Similarly, we could index before the concatenation: + +``` python +In [12]: pd.concat([df1, df4.reindex(df1.index)], axis=1) +Out[12]: + A B C D B D F +0 A0 B0 C0 D0 NaN NaN NaN +1 A1 B1 C1 D1 NaN NaN NaN +2 A2 B2 C2 D2 B2 D2 F2 +3 A3 B3 C3 D3 B3 D3 F3 +``` + +![merging_concat_axis1_join_axes](https://static.pypandas.cn/public/static/images/merging_concat_axis1_join_axes.png) + +### Concatenating using ``append`` + +A useful shortcut to [``concat()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html#pandas.concat) are the [``append()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html#pandas.DataFrame.append) +instance methods on ``Series`` and ``DataFrame``. These methods actually predated +``concat``. 
They concatenate along ``axis=0``, namely the index:

``` python
In [13]: result = df1.append(df2)
```

![merging_append1](https://static.pypandas.cn/public/static/images/merging_append1.png)

In the case of ``DataFrame``, the indexes must be disjoint but the columns do not
need to be:

``` python
In [14]: result = df1.append(df4, sort=False)
```

![merging_append2](https://static.pypandas.cn/public/static/images/merging_append2.png)

``append`` may take multiple objects to concatenate:

``` python
In [15]: result = df1.append([df2, df3])
```

![merging_append3](https://static.pypandas.cn/public/static/images/merging_append3.png)

::: tip Note

Unlike the ``append()`` method of Python lists, which appends in place and
returns ``None``, [``append()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html#pandas.DataFrame.append) here **does not** modify
``df1``; it returns a new copy with ``df2`` appended.

:::

### Ignoring indexes on the concatenation axis

For ``DataFrame`` objects which don’t have a meaningful index, you may wish
to append them and ignore the fact that they may have overlapping indexes. To
do this, use the ``ignore_index`` argument:

``` python
In [16]: result = pd.concat([df1, df4], ignore_index=True, sort=False)
```

![merging_concat_ignore_index](https://static.pypandas.cn/public/static/images/merging_concat_ignore_index.png)

This is also a valid argument to [``DataFrame.append()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html#pandas.DataFrame.append):

``` python
In [17]: result = df1.append(df4, ignore_index=True, sort=False)
```

![merging_append_ignore_index](https://static.pypandas.cn/public/static/images/merging_append_ignore_index.png)

### Concatenating with mixed ndims

You can concatenate a mix of ``Series`` and ``DataFrame`` objects.
The
``Series`` will be transformed to ``DataFrame`` with the column name as
the name of the ``Series``.

``` python
In [18]: s1 = pd.Series(['X0', 'X1', 'X2', 'X3'], name='X')

In [19]: result = pd.concat([df1, s1], axis=1)
```

![merging_concat_mixed_ndim](https://static.pypandas.cn/public/static/images/merging_concat_mixed_ndim.png)

::: tip Note

Since we’re concatenating a ``Series`` to a ``DataFrame``, we could have
achieved the same result with [``DataFrame.assign()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.assign.html#pandas.DataFrame.assign). To concatenate an
arbitrary number of pandas objects (``DataFrame`` or ``Series``), use
``concat``.

:::

If unnamed ``Series`` are passed they will be numbered consecutively.

``` python
In [20]: s2 = pd.Series(['_0', '_1', '_2', '_3'])

In [21]: result = pd.concat([df1, s2, s2, s2], axis=1)
```

![merging_concat_unnamed_series](https://static.pypandas.cn/public/static/images/merging_concat_unnamed_series.png)

Passing ``ignore_index=True`` will drop all name references.

``` python
In [22]: result = pd.concat([df1, s1], axis=1, ignore_index=True)
```

![merging_concat_series_ignore_index](https://static.pypandas.cn/public/static/images/merging_concat_series_ignore_index.png)

### More concatenating with group keys

A fairly common use of the ``keys`` argument is to override the column names
when creating a new ``DataFrame`` based on existing ``Series``.
Notice how the default behaviour consists of letting the resulting ``DataFrame``
inherit the parent ``Series``’ name, when these existed.

``` python
In [23]: s3 = pd.Series([0, 1, 2, 3], name='foo')

In [24]: s4 = pd.Series([0, 1, 2, 3])

In [25]: s5 = pd.Series([0, 1, 4, 5])

In [26]: pd.concat([s3, s4, s5], axis=1)
Out[26]: 
   foo  0  1
0    0  0  0
1    1  1  1
2    2  2  4
3    3  3  5
```

Through the ``keys`` argument we can override the existing column names.
+ +``` python +In [27]: pd.concat([s3, s4, s5], axis=1, keys=['red', 'blue', 'yellow']) +Out[27]: + red blue yellow +0 0 0 0 +1 1 1 1 +2 2 2 4 +3 3 3 5 +``` + +Let’s consider a variation of the very first example presented: + +``` python +In [28]: result = pd.concat(frames, keys=['x', 'y', 'z']) +``` + +![merging_concat_group_keys2](https://static.pypandas.cn/public/static/images/merging_concat_group_keys2.png) + +You can also pass a dict to ``concat`` in which case the dict keys will be used +for the ``keys`` argument (unless other keys are specified): + +``` python +In [29]: pieces = {'x': df1, 'y': df2, 'z': df3} + +In [30]: result = pd.concat(pieces) +``` + +![merging_concat_dict](https://static.pypandas.cn/public/static/images/merging_concat_dict.png) + +``` python +In [31]: result = pd.concat(pieces, keys=['z', 'y']) +``` + +![merging_concat_dict_keys](https://static.pypandas.cn/public/static/images/merging_concat_dict_keys.png) + +The MultiIndex created has levels that are constructed from the passed keys and +the index of the ``DataFrame`` pieces: + +``` python +In [32]: result.index.levels +Out[32]: FrozenList([['z', 'y'], [4, 5, 6, 7, 8, 9, 10, 11]]) +``` + +If you wish to specify other levels (as will occasionally be the case), you can +do so using the ``levels`` argument: + +``` python +In [33]: result = pd.concat(pieces, keys=['x', 'y', 'z'], + ....: levels=[['z', 'y', 'x', 'w']], + ....: names=['group_key']) + ....: +``` + +![merging_concat_dict_keys_names](https://static.pypandas.cn/public/static/images/merging_concat_dict_keys_names.png) + +``` python +In [34]: result.index.levels +Out[34]: FrozenList([['z', 'y', 'x', 'w'], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]]) +``` + +This is fairly esoteric, but it is actually necessary for implementing things +like GroupBy where the order of a categorical variable is meaningful. 
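
The group keys also make the original pieces easy to recover afterwards, since they become the outer level of the resulting ``MultiIndex``. A minimal sketch (the small frames ``a`` and ``b`` here are invented for illustration, not the ``df1``/``df2``/``df3`` frames used elsewhere on this page):

``` python
import pandas as pd

# Two small illustrative frames
a = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 1])
b = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']}, index=[2, 3])

# The keys become the outer level of the resulting MultiIndex
result = pd.concat([a, b], keys=['x', 'y'])

# Selecting on the outer level recovers an original piece
piece = result.loc['y']
print(piece)
```

Selecting with ``.loc`` on the outer level drops that level, so ``piece`` has the same index and columns as the original ``b``.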
+ +### Appending rows to a DataFrame + +While not especially efficient (since a new object must be created), you can +append a single row to a ``DataFrame`` by passing a ``Series`` or dict to +``append``, which returns a new ``DataFrame`` as above. + +``` python +In [35]: s2 = pd.Series(['X0', 'X1', 'X2', 'X3'], index=['A', 'B', 'C', 'D']) + +In [36]: result = df1.append(s2, ignore_index=True) +``` + +![merging_append_series_as_row](https://static.pypandas.cn/public/static/images/merging_append_series_as_row.png) + +You should use ``ignore_index`` with this method to instruct DataFrame to +discard its index. If you wish to preserve the index, you should construct an +appropriately-indexed DataFrame and append or concatenate those objects. + +You can also pass a list of dicts or Series: + +``` python +In [37]: dicts = [{'A': 1, 'B': 2, 'C': 3, 'X': 4}, + ....: {'A': 5, 'B': 6, 'C': 7, 'Y': 8}] + ....: + +In [38]: result = df1.append(dicts, ignore_index=True, sort=False) +``` + +![merging_append_dits](https://static.pypandas.cn/public/static/images/merging_append_dits.png) + +## Database-style DataFrame or named Series joining/merging + +pandas has full-featured, **high performance** in-memory join operations +idiomatically very similar to relational databases like SQL. These methods +perform significantly better (in some cases well over an order of magnitude +better) than other open source implementations (like ``base::merge.data.frame`` +in R). The reason for this is careful algorithmic design and the internal layout +of the data in ``DataFrame``. + +See the [cookbook](cookbook.html#cookbook-merge) for some advanced strategies. + +Users who are familiar with SQL but new to pandas might be interested in a +[comparison with SQL](https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_sql.html#compare-with-sql-join). 
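
To ground the SQL analogy, here is a minimal, self-contained sketch of the correspondence; the table and column names (``employees``, ``departments``, ``dept_id``) are invented for illustration:

``` python
import pandas as pd

# The SQL equivalent would be roughly:
#   SELECT * FROM employees e JOIN departments d ON e.dept_id = d.dept_id;
employees = pd.DataFrame({'name': ['Ann', 'Bob', 'Cyd'],
                          'dept_id': [1, 2, 2]})
departments = pd.DataFrame({'dept_id': [1, 2],
                            'dept': ['Sales', 'Engineering']})

# how='inner' is the default, matching a plain SQL JOIN
result = pd.merge(employees, departments, on='dept_id')
print(result)
```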
+
+pandas provides a single function, [``merge()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html#pandas.merge), as the entry point for
+all standard database join operations between ``DataFrame`` or named ``Series`` objects:
+
+``` python
+pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
+         left_index=False, right_index=False, sort=True,
+         suffixes=('_x', '_y'), copy=True, indicator=False,
+         validate=None)
+```
+
+- ``left``: A DataFrame or named Series object.
+- ``right``: Another DataFrame or named Series object.
+- ``on``: Column or index level names to join on. Must be found in both the left
+and right DataFrame and/or Series objects. If not passed and ``left_index`` and
+``right_index`` are ``False``, the intersection of the columns in the
+DataFrames and/or Series will be inferred to be the join keys.
+- ``left_on``: Columns or index levels from the left DataFrame or Series to use as
+keys. Can either be column names, index level names, or arrays with length
+equal to the length of the DataFrame or Series.
+- ``right_on``: Columns or index levels from the right DataFrame or Series to use as
+keys. Can either be column names, index level names, or arrays with length
+equal to the length of the DataFrame or Series.
+- ``left_index``: If ``True``, use the index (row labels) from the left
+DataFrame or Series as its join key(s). In the case of a DataFrame or Series with a MultiIndex
+(hierarchical), the number of levels must match the number of join keys
+from the right DataFrame or Series.
+- ``right_index``: Same usage as ``left_index`` for the right DataFrame or Series.
+- ``how``: One of ``'left'``, ``'right'``, ``'outer'``, ``'inner'``. Defaults
+to ``inner``. See below for more detailed description of each method.
+- ``sort``: Sort the result DataFrame by the join keys in lexicographical
+order. Defaults to ``True``, setting to ``False`` will improve performance
+substantially in many cases. 
+- ``suffixes``: A tuple of string suffixes to apply to overlapping +columns. Defaults to ``('_x', '_y')``. +- ``copy``: Always copy data (default ``True``) from the passed DataFrame or named Series +objects, even when reindexing is not necessary. Cannot be avoided in many +cases but may improve performance / memory usage. The cases where copying +can be avoided are somewhat pathological but this option is provided +nonetheless. +- ``indicator``: Add a column to the output DataFrame called ``_merge`` +with information on the source of each row. ``_merge`` is Categorical-type +and takes on a value of ``left_only`` for observations whose merge key +only appears in ``'left'`` DataFrame or Series, ``right_only`` for observations whose +merge key only appears in ``'right'`` DataFrame or Series, and ``both`` if the +observation’s merge key is found in both. +- ``validate`` : string, default None. +If specified, checks if merge is of specified type. + + - “one_to_one” or “1:1”: checks if merge keys are unique in both + left and right datasets. + - “one_to_many” or “1:m”: checks if merge keys are unique in left + dataset. + - “many_to_one” or “m:1”: checks if merge keys are unique in right + dataset. + - “many_to_many” or “m:m”: allowed, but does not result in checks. + +*New in version 0.21.0.* + +::: tip Note + +Support for specifying index levels as the ``on``, ``left_on``, and +``right_on`` parameters was added in version 0.23.0. +Support for merging named ``Series`` objects was added in version 0.24.0. + +::: + +The return type will be the same as ``left``. If ``left`` is a ``DataFrame`` or named ``Series`` +and ``right`` is a subclass of ``DataFrame``, the return type will still be ``DataFrame``. 
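
The ``left_on``/``right_on`` parameters above come into play when the key columns have different names in the two frames; since both names differ, both key columns are kept in the result. A minimal sketch (the column names ``lkey``/``rkey`` are invented for illustration):

``` python
import pandas as pd

# Key column is named 'lkey' on the left and 'rkey' on the right
left = pd.DataFrame({'lkey': ['K0', 'K1', 'K2'], 'v': [1, 2, 3]})
right = pd.DataFrame({'rkey': ['K0', 'K2', 'K3'], 'w': [4, 5, 6]})

# Both key columns survive in the result because their names differ
result = pd.merge(left, right, left_on='lkey', right_on='rkey', how='inner')
print(result)
```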
+ +``merge`` is a function in the pandas namespace, and it is also available as a +``DataFrame`` instance method [``merge()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html#pandas.DataFrame.merge), with the calling +``DataFrame`` being implicitly considered the left object in the join. + +The related [``join()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html#pandas.DataFrame.join) method, uses ``merge`` internally for the +index-on-index (by default) and column(s)-on-index join. If you are joining on +index only, you may wish to use ``DataFrame.join`` to save yourself some typing. + +### Brief primer on merge methods (relational algebra) + +Experienced users of relational databases like SQL will be familiar with the +terminology used to describe join operations between two SQL-table like +structures (``DataFrame`` objects). There are several cases to consider which +are very important to understand: + +- **one-to-one** joins: for example when joining two ``DataFrame`` objects on +their indexes (which must contain unique values). +- **many-to-one** joins: for example when joining an index (unique) to one or +more columns in a different ``DataFrame``. +- **many-to-many** joins: joining columns on columns. + +::: tip Note + +When joining columns on columns (potentially a many-to-many join), any +indexes on the passed ``DataFrame`` objects **will be discarded**. + +::: + +It is worth spending some time understanding the result of the **many-to-many** +join case. In SQL / standard relational algebra, if a key combination appears +more than once in both tables, the resulting table will have the **Cartesian +product** of the associated data. 
Here is a very basic example with one unique +key combination: + +``` python +In [39]: left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'], + ....: 'A': ['A0', 'A1', 'A2', 'A3'], + ....: 'B': ['B0', 'B1', 'B2', 'B3']}) + ....: + +In [40]: right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'], + ....: 'C': ['C0', 'C1', 'C2', 'C3'], + ....: 'D': ['D0', 'D1', 'D2', 'D3']}) + ....: + +In [41]: result = pd.merge(left, right, on='key') +``` + +![merging_merge_on_key](https://static.pypandas.cn/public/static/images/merging_merge_on_key.png) + +Here is a more complicated example with multiple join keys. Only the keys +appearing in ``left`` and ``right`` are present (the intersection), since +``how='inner'`` by default. + +``` python +In [42]: left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'], + ....: 'key2': ['K0', 'K1', 'K0', 'K1'], + ....: 'A': ['A0', 'A1', 'A2', 'A3'], + ....: 'B': ['B0', 'B1', 'B2', 'B3']}) + ....: + +In [43]: right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'], + ....: 'key2': ['K0', 'K0', 'K0', 'K0'], + ....: 'C': ['C0', 'C1', 'C2', 'C3'], + ....: 'D': ['D0', 'D1', 'D2', 'D3']}) + ....: + +In [44]: result = pd.merge(left, right, on=['key1', 'key2']) +``` + +![merging_merge_on_key_multiple](https://static.pypandas.cn/public/static/images/merging_merge_on_key_multiple.png) + +The ``how`` argument to ``merge`` specifies how to determine which keys are to +be included in the resulting table. If a key combination **does not appear** in +either the left or right tables, the values in the joined table will be +``NA``. 
Here is a summary of the ``how`` options and their SQL equivalent names:
+
+Merge method | SQL Join Name | Description
+---|---|---
+left | LEFT OUTER JOIN | Use keys from left frame only
+right | RIGHT OUTER JOIN | Use keys from right frame only
+outer | FULL OUTER JOIN | Use union of keys from both frames
+inner | INNER JOIN | Use intersection of keys from both frames
+
+``` python
+In [45]: result = pd.merge(left, right, how='left', on=['key1', 'key2'])
+```
+
+![merging_merge_on_key_left](https://static.pypandas.cn/public/static/images/merging_merge_on_key_left.png)
+
+``` python
+In [46]: result = pd.merge(left, right, how='right', on=['key1', 'key2'])
+```
+
+![merging_merge_on_key_right](https://static.pypandas.cn/public/static/images/merging_merge_on_key_right.png)
+
+``` python
+In [47]: result = pd.merge(left, right, how='outer', on=['key1', 'key2'])
+```
+
+![merging_merge_on_key_outer](https://static.pypandas.cn/public/static/images/merging_merge_on_key_outer.png)
+
+``` python
+In [48]: result = pd.merge(left, right, how='inner', on=['key1', 'key2'])
+```
+
+![merging_merge_on_key_inner](https://static.pypandas.cn/public/static/images/merging_merge_on_key_inner.png)
+
+Here is another example with duplicate join keys in DataFrames:
+
+``` python
+In [49]: left = pd.DataFrame({'A': [1, 2], 'B': [2, 2]})
+
+In [50]: right = pd.DataFrame({'A': [4, 5, 6], 'B': [2, 2, 2]})
+
+In [51]: result = pd.merge(left, right, on='B', how='outer')
+```
+
+![merging_merge_on_key_dup](https://static.pypandas.cn/public/static/images/merging_merge_on_key_dup.png)
+
+::: danger Warning
+
+Joining / merging on duplicate keys can cause a returned frame that is the multiplication of the row dimensions, which may result in memory overflow. It is the user’s responsibility to manage duplicate values in keys before joining large DataFrames. 
+ +::: + +### Checking for duplicate keys + +*New in version 0.21.0.* + +Users can use the ``validate`` argument to automatically check whether there +are unexpected duplicates in their merge keys. Key uniqueness is checked before +merge operations and so should protect against memory overflows. Checking key +uniqueness is also a good way to ensure user data structures are as expected. + +In the following example, there are duplicate values of ``B`` in the right +``DataFrame``. As this is not a one-to-one merge – as specified in the +``validate`` argument – an exception will be raised. + +``` python +In [52]: left = pd.DataFrame({'A' : [1,2], 'B' : [1, 2]}) + +In [53]: right = pd.DataFrame({'A' : [4,5,6], 'B': [2, 2, 2]}) +``` + +``` python +In [53]: result = pd.merge(left, right, on='B', how='outer', validate="one_to_one") +... +MergeError: Merge keys are not unique in right dataset; not a one-to-one merge +``` + +If the user is aware of the duplicates in the right ``DataFrame`` but wants to +ensure there are no duplicates in the left DataFrame, one can use the +``validate='one_to_many'`` argument instead, which will not raise an exception. + +``` python +In [54]: pd.merge(left, right, on='B', how='outer', validate="one_to_many") +Out[54]: + A_x B A_y +0 1 1 NaN +1 2 2 4.0 +2 2 2 5.0 +3 2 2 6.0 +``` + +### The merge indicator + +[``merge()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html#pandas.merge) accepts the argument ``indicator``. 
If ``True``, a +Categorical-type column called ``_merge`` will be added to the output object +that takes on values: + +Observation Origin | _merge value +---|--- +Merge key only in 'left' frame | left_only +Merge key only in 'right' frame | right_only +Merge key in both frames | both + +``` python +In [55]: df1 = pd.DataFrame({'col1': [0, 1], 'col_left': ['a', 'b']}) + +In [56]: df2 = pd.DataFrame({'col1': [1, 2, 2], 'col_right': [2, 2, 2]}) + +In [57]: pd.merge(df1, df2, on='col1', how='outer', indicator=True) +Out[57]: + col1 col_left col_right _merge +0 0 a NaN left_only +1 1 b 2.0 both +2 2 NaN 2.0 right_only +3 2 NaN 2.0 right_only +``` + +The ``indicator`` argument will also accept string arguments, in which case the indicator function will use the value of the passed string as the name for the indicator column. + +``` python +In [58]: pd.merge(df1, df2, on='col1', how='outer', indicator='indicator_column') +Out[58]: + col1 col_left col_right indicator_column +0 0 a NaN left_only +1 1 b 2.0 both +2 2 NaN 2.0 right_only +3 2 NaN 2.0 right_only +``` + +### Merge dtypes + +*New in version 0.19.0.* + +Merging will preserve the dtype of the join keys. + +``` python +In [59]: left = pd.DataFrame({'key': [1], 'v1': [10]}) + +In [60]: left +Out[60]: + key v1 +0 1 10 + +In [61]: right = pd.DataFrame({'key': [1, 2], 'v1': [20, 30]}) + +In [62]: right +Out[62]: + key v1 +0 1 20 +1 2 30 +``` + +We are able to preserve the join keys: + +``` python +In [63]: pd.merge(left, right, how='outer') +Out[63]: + key v1 +0 1 10 +1 1 20 +2 2 30 + +In [64]: pd.merge(left, right, how='outer').dtypes +Out[64]: +key int64 +v1 int64 +dtype: object +``` + +Of course if you have missing values that are introduced, then the +resulting dtype will be upcast. 
+ +``` python +In [65]: pd.merge(left, right, how='outer', on='key') +Out[65]: + key v1_x v1_y +0 1 10.0 20 +1 2 NaN 30 + +In [66]: pd.merge(left, right, how='outer', on='key').dtypes +Out[66]: +key int64 +v1_x float64 +v1_y int64 +dtype: object +``` + +*New in version 0.20.0.* + +Merging will preserve ``category`` dtypes of the mergands. See also the section on [categoricals](categorical.html#categorical-merge). + +The left frame. + +``` python +In [67]: from pandas.api.types import CategoricalDtype + +In [68]: X = pd.Series(np.random.choice(['foo', 'bar'], size=(10,))) + +In [69]: X = X.astype(CategoricalDtype(categories=['foo', 'bar'])) + +In [70]: left = pd.DataFrame({'X': X, + ....: 'Y': np.random.choice(['one', 'two', 'three'], + ....: size=(10,))}) + ....: + +In [71]: left +Out[71]: + X Y +0 bar one +1 foo one +2 foo three +3 bar three +4 foo one +5 bar one +6 bar three +7 bar three +8 bar three +9 foo three + +In [72]: left.dtypes +Out[72]: +X category +Y object +dtype: object +``` + +The right frame. + +``` python +In [73]: right = pd.DataFrame({'X': pd.Series(['foo', 'bar'], + ....: dtype=CategoricalDtype(['foo', 'bar'])), + ....: 'Z': [1, 2]}) + ....: + +In [74]: right +Out[74]: + X Z +0 foo 1 +1 bar 2 + +In [75]: right.dtypes +Out[75]: +X category +Z int64 +dtype: object +``` + +The merged result: + +``` python +In [76]: result = pd.merge(left, right, how='outer') + +In [77]: result +Out[77]: + X Y Z +0 bar one 2 +1 bar three 2 +2 bar one 2 +3 bar three 2 +4 bar three 2 +5 bar three 2 +6 foo one 1 +7 foo three 1 +8 foo one 1 +9 foo three 1 + +In [78]: result.dtypes +Out[78]: +X category +Y object +Z int64 +dtype: object +``` + +::: tip Note + +The category dtypes must be *exactly* the same, meaning the same categories and the ordered attribute. +Otherwise the result will coerce to ``object`` dtype. + +::: + +::: tip Note + +Merging on ``category`` dtypes that are the same can be quite performant compared to ``object`` dtype merging. 
+ +::: + +### Joining on index + +[``DataFrame.join()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html#pandas.DataFrame.join) is a convenient method for combining the columns of two +potentially differently-indexed ``DataFrames`` into a single result +``DataFrame``. Here is a very basic example: + +``` python +In [79]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2'], + ....: 'B': ['B0', 'B1', 'B2']}, + ....: index=['K0', 'K1', 'K2']) + ....: + +In [80]: right = pd.DataFrame({'C': ['C0', 'C2', 'C3'], + ....: 'D': ['D0', 'D2', 'D3']}, + ....: index=['K0', 'K2', 'K3']) + ....: + +In [81]: result = left.join(right) +``` + +![merging_join](https://static.pypandas.cn/public/static/images/merging_join.png) + +``` python +In [82]: result = left.join(right, how='outer') +``` + +![merging_join_outer](https://static.pypandas.cn/public/static/images/merging_join_outer.png) + +The same as above, but with ``how='inner'``. + +``` python +In [83]: result = left.join(right, how='inner') +``` + +![merging_join_inner](https://static.pypandas.cn/public/static/images/merging_join_inner.png) + +The data alignment here is on the indexes (row labels). 
This same behavior can +be achieved using ``merge`` plus additional arguments instructing it to use the +indexes: + +``` python +In [84]: result = pd.merge(left, right, left_index=True, right_index=True, how='outer') +``` + +![merging_merge_index_outer](https://static.pypandas.cn/public/static/images/merging_merge_index_outer.png) + +``` python +In [85]: result = pd.merge(left, right, left_index=True, right_index=True, how='inner'); +``` + +![merging_merge_index_inner](https://static.pypandas.cn/public/static/images/merging_merge_index_inner.png) + +### Joining key columns on an index + +[``join()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html#pandas.DataFrame.join) takes an optional ``on`` argument which may be a column +or multiple column names, which specifies that the passed ``DataFrame`` is to be +aligned on that column in the ``DataFrame``. These two function calls are +completely equivalent: + +``` python +left.join(right, on=key_or_keys) +pd.merge(left, right, left_on=key_or_keys, right_index=True, + how='left', sort=False) +``` + +Obviously you can choose whichever form you find more convenient. For +many-to-one joins (where one of the ``DataFrame``’s is already indexed by the +join key), using ``join`` may be more convenient. 
Here is a simple example: + +``` python +In [86]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], + ....: 'B': ['B0', 'B1', 'B2', 'B3'], + ....: 'key': ['K0', 'K1', 'K0', 'K1']}) + ....: + +In [87]: right = pd.DataFrame({'C': ['C0', 'C1'], + ....: 'D': ['D0', 'D1']}, + ....: index=['K0', 'K1']) + ....: + +In [88]: result = left.join(right, on='key') +``` + +![merging_join_key_columns](https://static.pypandas.cn/public/static/images/merging_join_key_columns.png) + +``` python +In [89]: result = pd.merge(left, right, left_on='key', right_index=True, + ....: how='left', sort=False); + ....: +``` + +![merging_merge_key_columns](https://static.pypandas.cn/public/static/images/merging_merge_key_columns.png) + +To join on multiple keys, the passed DataFrame must have a ``MultiIndex``: + +``` python +In [90]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], + ....: 'B': ['B0', 'B1', 'B2', 'B3'], + ....: 'key1': ['K0', 'K0', 'K1', 'K2'], + ....: 'key2': ['K0', 'K1', 'K0', 'K1']}) + ....: + +In [91]: index = pd.MultiIndex.from_tuples([('K0', 'K0'), ('K1', 'K0'), + ....: ('K2', 'K0'), ('K2', 'K1')]) + ....: + +In [92]: right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'], + ....: 'D': ['D0', 'D1', 'D2', 'D3']}, + ....: index=index) + ....: +``` + +Now this can be joined by passing the two key column names: + +``` python +In [93]: result = left.join(right, on=['key1', 'key2']) +``` + +![merging_join_multikeys](https://static.pypandas.cn/public/static/images/merging_join_multikeys.png) + +The default for ``DataFrame.join`` is to perform a left join (essentially a +“VLOOKUP” operation, for Excel users), which uses only the keys found in the +calling DataFrame. 
Other join types, for example inner join, can be just as
+easily performed:
+
+``` python
+In [94]: result = left.join(right, on=['key1', 'key2'], how='inner')
+```
+
+![merging_join_multikeys_inner](https://static.pypandas.cn/public/static/images/merging_join_multikeys_inner.png)
+
+As you can see, this drops any rows where there was no match.
+
+### Joining a single Index to a MultiIndex
+
+You can join a singly-indexed ``DataFrame`` with a level of a MultiIndexed ``DataFrame``.
+The level will match on the name of the index of the singly-indexed frame against
+a level name of the MultiIndexed frame.
+
+``` python
+In [95]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
+   ....:                      'B': ['B0', 'B1', 'B2']},
+   ....:                     index=pd.Index(['K0', 'K1', 'K2'], name='key'))
+   ....: 
+
+In [96]: index = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
+   ....:                                    ('K2', 'Y2'), ('K2', 'Y3')],
+   ....:                                   names=['key', 'Y'])
+   ....: 
+
+In [97]: right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
+   ....:                       'D': ['D0', 'D1', 'D2', 'D3']},
+   ....:                      index=index)
+   ....: 
+
+In [98]: result = left.join(right, how='inner')
+```
+
+![merging_join_multiindex_inner](https://static.pypandas.cn/public/static/images/merging_join_multiindex_inner.png)
+
+This is equivalent to, but less verbose and more memory efficient / faster
+than, the following: 
+ +``` python +In [99]: result = pd.merge(left.reset_index(), right.reset_index(), + ....: on=['key'], how='inner').set_index(['key','Y']) + ....: +``` + +![merging_merge_multiindex_alternative](https://static.pypandas.cn/public/static/images/merging_merge_multiindex_alternative.png) + +### Joining with two MultiIndexes + +This is supported in a limited way, provided that the index for the right +argument is completely used in the join, and is a subset of the indices in +the left argument, as in this example: + +``` python +In [100]: leftindex = pd.MultiIndex.from_product([list('abc'), list('xy'), [1, 2]], + .....: names=['abc', 'xy', 'num']) + .....: + +In [101]: left = pd.DataFrame({'v1': range(12)}, index=leftindex) + +In [102]: left +Out[102]: + v1 +abc xy num +a x 1 0 + 2 1 + y 1 2 + 2 3 +b x 1 4 + 2 5 + y 1 6 + 2 7 +c x 1 8 + 2 9 + y 1 10 + 2 11 + +In [103]: rightindex = pd.MultiIndex.from_product([list('abc'), list('xy')], + .....: names=['abc', 'xy']) + .....: + +In [104]: right = pd.DataFrame({'v2': [100 * i for i in range(1, 7)]}, index=rightindex) + +In [105]: right +Out[105]: + v2 +abc xy +a x 100 + y 200 +b x 300 + y 400 +c x 500 + y 600 + +In [106]: left.join(right, on=['abc', 'xy'], how='inner') +Out[106]: + v1 v2 +abc xy num +a x 1 0 100 + 2 1 100 + y 1 2 200 + 2 3 200 +b x 1 4 300 + 2 5 300 + y 1 6 400 + 2 7 400 +c x 1 8 500 + 2 9 500 + y 1 10 600 + 2 11 600 +``` + +If that condition is not satisfied, a join with two multi-indexes can be +done using the following code. 
+ +``` python +In [107]: leftindex = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'), + .....: ('K1', 'X2')], + .....: names=['key', 'X']) + .....: + +In [108]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2'], + .....: 'B': ['B0', 'B1', 'B2']}, + .....: index=leftindex) + .....: + +In [109]: rightindex = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'), + .....: ('K2', 'Y2'), ('K2', 'Y3')], + .....: names=['key', 'Y']) + .....: + +In [110]: right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'], + .....: 'D': ['D0', 'D1', 'D2', 'D3']}, + .....: index=rightindex) + .....: + +In [111]: result = pd.merge(left.reset_index(), right.reset_index(), + .....: on=['key'], how='inner').set_index(['key', 'X', 'Y']) + .....: +``` + +![merging_merge_two_multiindex](https://static.pypandas.cn/public/static/images/merging_merge_two_multiindex.png) + +### Merging on a combination of columns and index levels + +*New in version 0.23.* + +Strings passed as the ``on``, ``left_on``, and ``right_on`` parameters +may refer to either column names or index level names. This enables merging +``DataFrame`` instances on a combination of index levels and columns without +resetting indexes. 
+ +``` python +In [112]: left_index = pd.Index(['K0', 'K0', 'K1', 'K2'], name='key1') + +In [113]: left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], + .....: 'B': ['B0', 'B1', 'B2', 'B3'], + .....: 'key2': ['K0', 'K1', 'K0', 'K1']}, + .....: index=left_index) + .....: + +In [114]: right_index = pd.Index(['K0', 'K1', 'K2', 'K2'], name='key1') + +In [115]: right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'], + .....: 'D': ['D0', 'D1', 'D2', 'D3'], + .....: 'key2': ['K0', 'K0', 'K0', 'K1']}, + .....: index=right_index) + .....: + +In [116]: result = left.merge(right, on=['key1', 'key2']) +``` + +![merge_on_index_and_column](https://static.pypandas.cn/public/static/images/merge_on_index_and_column.png) + +::: tip Note + +When DataFrames are merged on a string that matches an index level in both +frames, the index level is preserved as an index level in the resulting +DataFrame. + +::: + +::: tip Note + +When DataFrames are merged using only some of the levels of a *MultiIndex*, +the extra levels will be dropped from the resulting merge. In order to +preserve those levels, use ``reset_index`` on those level names to move +those levels to columns prior to doing the merge. + +::: + +::: tip Note + +If a string matches both a column name and an index level name, then a +warning is issued and the column takes precedence. This will result in an +ambiguity error in a future version. 
+
+:::
+
+### Overlapping value columns
+
+The merge ``suffixes`` argument takes a tuple or list of strings to append to
+overlapping column names in the input ``DataFrame``s to disambiguate the result
+columns:
+
+``` python
+In [117]: left = pd.DataFrame({'k': ['K0', 'K1', 'K2'], 'v': [1, 2, 3]})
+
+In [118]: right = pd.DataFrame({'k': ['K0', 'K0', 'K3'], 'v': [4, 5, 6]})
+
+In [119]: result = pd.merge(left, right, on='k')
+```
+
+![merging_merge_overlapped](https://static.pypandas.cn/public/static/images/merging_merge_overlapped.png)
+
+``` python
+In [120]: result = pd.merge(left, right, on='k', suffixes=['_l', '_r'])
+```
+
+![merging_merge_overlapped_suffix](https://static.pypandas.cn/public/static/images/merging_merge_overlapped_suffix.png)
+
+[``DataFrame.join()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html#pandas.DataFrame.join) has ``lsuffix`` and ``rsuffix`` arguments which behave
+similarly.
+
+``` python
+In [121]: left = left.set_index('k')
+
+In [122]: right = right.set_index('k')
+
+In [123]: result = left.join(right, lsuffix='_l', rsuffix='_r')
+```
+
+![merging_merge_overlapped_multi_suffix](https://static.pypandas.cn/public/static/images/merging_merge_overlapped_multi_suffix.png)
+
+### Joining multiple DataFrames
+
+A list or tuple of ``DataFrames`` can also be passed to [``join()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html#pandas.DataFrame.join)
+to join them together on their indexes. 
+ +``` python +In [124]: right2 = pd.DataFrame({'v': [7, 8, 9]}, index=['K1', 'K1', 'K2']) + +In [125]: result = left.join([right, right2]) +``` + +![merging_join_multi_df](https://static.pypandas.cn/public/static/images/merging_join_multi_df.png) + +### Merging together values within Series or DataFrame columns + +Another fairly common situation is to have two like-indexed (or similarly +indexed) ``Series`` or ``DataFrame`` objects and wanting to “patch” values in +one object from values for matching indices in the other. Here is an example: + +``` python +In [126]: df1 = pd.DataFrame([[np.nan, 3., 5.], [-4.6, np.nan, np.nan], + .....: [np.nan, 7., np.nan]]) + .....: + +In [127]: df2 = pd.DataFrame([[-42.6, np.nan, -8.2], [-5., 1.6, 4]], + .....: index=[1, 2]) + .....: +``` + +For this, use the [``combine_first()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.combine_first.html#pandas.DataFrame.combine_first) method: + +``` python +In [128]: result = df1.combine_first(df2) +``` + +![merging_combine_first](https://static.pypandas.cn/public/static/images/merging_combine_first.png) + +Note that this method only takes values from the right ``DataFrame`` if they are +missing in the left ``DataFrame``. A related method, [``update()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.update.html#pandas.DataFrame.update), +alters non-NA values in place: + +``` python +In [129]: df1.update(df2) +``` + +![merging_update](https://static.pypandas.cn/public/static/images/merging_update.png) + +## Timeseries friendly merging + +### Merging ordered data + +A [``merge_ordered()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge_ordered.html#pandas.merge_ordered) function allows combining time series and other +ordered data. 
In particular it has an optional ``fill_method`` keyword to +fill/interpolate missing data: + +``` python +In [130]: left = pd.DataFrame({'k': ['K0', 'K1', 'K1', 'K2'], + .....: 'lv': [1, 2, 3, 4], + .....: 's': ['a', 'b', 'c', 'd']}) + .....: + +In [131]: right = pd.DataFrame({'k': ['K1', 'K2', 'K4'], + .....: 'rv': [1, 2, 3]}) + .....: + +In [132]: pd.merge_ordered(left, right, fill_method='ffill', left_by='s') +Out[132]: + k lv s rv +0 K0 1.0 a NaN +1 K1 1.0 a 1.0 +2 K2 1.0 a 2.0 +3 K4 1.0 a 3.0 +4 K1 2.0 b 1.0 +5 K2 2.0 b 2.0 +6 K4 2.0 b 3.0 +7 K1 3.0 c 1.0 +8 K2 3.0 c 2.0 +9 K4 3.0 c 3.0 +10 K1 NaN d 1.0 +11 K2 4.0 d 2.0 +12 K4 4.0 d 3.0 +``` + +### Merging asof + +*New in version 0.19.0.* + +A [``merge_asof()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge_asof.html#pandas.merge_asof) is similar to an ordered left-join except that we match on +nearest key rather than equal keys. For each row in the ``left`` ``DataFrame``, +we select the last row in the ``right`` ``DataFrame`` whose ``on`` key is less +than the left’s key. Both DataFrames must be sorted by the key. + +Optionally an asof merge can perform a group-wise merge. This matches the +``by`` key equally, in addition to the nearest match on the ``on`` key. + +For example; we might have ``trades`` and ``quotes`` and we want to ``asof`` +merge them. 
+ +``` python +In [133]: trades = pd.DataFrame({ + .....: 'time': pd.to_datetime(['20160525 13:30:00.023', + .....: '20160525 13:30:00.038', + .....: '20160525 13:30:00.048', + .....: '20160525 13:30:00.048', + .....: '20160525 13:30:00.048']), + .....: 'ticker': ['MSFT', 'MSFT', + .....: 'GOOG', 'GOOG', 'AAPL'], + .....: 'price': [51.95, 51.95, + .....: 720.77, 720.92, 98.00], + .....: 'quantity': [75, 155, + .....: 100, 100, 100]}, + .....: columns=['time', 'ticker', 'price', 'quantity']) + .....: + +In [134]: quotes = pd.DataFrame({ + .....: 'time': pd.to_datetime(['20160525 13:30:00.023', + .....: '20160525 13:30:00.023', + .....: '20160525 13:30:00.030', + .....: '20160525 13:30:00.041', + .....: '20160525 13:30:00.048', + .....: '20160525 13:30:00.049', + .....: '20160525 13:30:00.072', + .....: '20160525 13:30:00.075']), + .....: 'ticker': ['GOOG', 'MSFT', 'MSFT', + .....: 'MSFT', 'GOOG', 'AAPL', 'GOOG', + .....: 'MSFT'], + .....: 'bid': [720.50, 51.95, 51.97, 51.99, + .....: 720.50, 97.99, 720.50, 52.01], + .....: 'ask': [720.93, 51.96, 51.98, 52.00, + .....: 720.93, 98.01, 720.88, 52.03]}, + .....: columns=['time', 'ticker', 'bid', 'ask']) + .....: +``` + +``` python +In [135]: trades +Out[135]: + time ticker price quantity +0 2016-05-25 13:30:00.023 MSFT 51.95 75 +1 2016-05-25 13:30:00.038 MSFT 51.95 155 +2 2016-05-25 13:30:00.048 GOOG 720.77 100 +3 2016-05-25 13:30:00.048 GOOG 720.92 100 +4 2016-05-25 13:30:00.048 AAPL 98.00 100 + +In [136]: quotes +Out[136]: + time ticker bid ask +0 2016-05-25 13:30:00.023 GOOG 720.50 720.93 +1 2016-05-25 13:30:00.023 MSFT 51.95 51.96 +2 2016-05-25 13:30:00.030 MSFT 51.97 51.98 +3 2016-05-25 13:30:00.041 MSFT 51.99 52.00 +4 2016-05-25 13:30:00.048 GOOG 720.50 720.93 +5 2016-05-25 13:30:00.049 AAPL 97.99 98.01 +6 2016-05-25 13:30:00.072 GOOG 720.50 720.88 +7 2016-05-25 13:30:00.075 MSFT 52.01 52.03 +``` + +By default we are taking the asof of the quotes. 
+ +``` python +In [137]: pd.merge_asof(trades, quotes, + .....: on='time', + .....: by='ticker') + .....: +Out[137]: + time ticker price quantity bid ask +0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96 +1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98 +2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93 +3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93 +4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN +``` + +We only asof within ``2ms`` between the quote time and the trade time. + +``` python +In [138]: pd.merge_asof(trades, quotes, + .....: on='time', + .....: by='ticker', + .....: tolerance=pd.Timedelta('2ms')) + .....: +Out[138]: + time ticker price quantity bid ask +0 2016-05-25 13:30:00.023 MSFT 51.95 75 51.95 51.96 +1 2016-05-25 13:30:00.038 MSFT 51.95 155 NaN NaN +2 2016-05-25 13:30:00.048 GOOG 720.77 100 720.50 720.93 +3 2016-05-25 13:30:00.048 GOOG 720.92 100 720.50 720.93 +4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN +``` + +We only asof within ``10ms`` between the quote time and the trade time and we +exclude exact matches on time. Note that though we exclude the exact matches +(of the quotes), prior quotes **do** propagate to that point in time. 
+ +``` python +In [139]: pd.merge_asof(trades, quotes, + .....: on='time', + .....: by='ticker', + .....: tolerance=pd.Timedelta('10ms'), + .....: allow_exact_matches=False) + .....: +Out[139]: + time ticker price quantity bid ask +0 2016-05-25 13:30:00.023 MSFT 51.95 75 NaN NaN +1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98 +2 2016-05-25 13:30:00.048 GOOG 720.77 100 NaN NaN +3 2016-05-25 13:30:00.048 GOOG 720.92 100 NaN NaN +4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN +``` diff --git a/Python/pandas/user_guide/missing_data.md b/Python/pandas/user_guide/missing_data.md new file mode 100644 index 00000000..e1db65e3 --- /dev/null +++ b/Python/pandas/user_guide/missing_data.md @@ -0,0 +1,1477 @@ +# Working with missing data + +In this section, we will discuss missing (also referred to as NA) values in +pandas. + +::: tip Note + +The choice of using ``NaN`` internally to denote missing data was largely +for simplicity and performance reasons. It differs from the MaskedArray +approach of, for example, ``scikits.timeseries``. We are hopeful that +NumPy will soon be able to provide a native NA type solution (similar to R) +performant enough to be used in pandas. + +::: + +See the [cookbook](cookbook.html#cookbook-missing-data) for some advanced strategies. + +## Values considered “missing” + +As data comes in many shapes and forms, pandas aims to be flexible with regard +to handling missing data. While ``NaN`` is the default missing value marker for +reasons of computational speed and convenience, we need to be able to easily +detect this value with data of different types: floating point, integer, +boolean, and general object. In many cases, however, the Python ``None`` will +arise and we wish to also consider that “missing” or “not available” or “NA”. + +::: tip Note + +If you want to consider ``inf`` and ``-inf`` to be “NA” in computations, +you can set ``pandas.options.mode.use_inf_as_na = True``. 
+ +::: + +``` python +In [1]: df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'], + ...: columns=['one', 'two', 'three']) + ...: + +In [2]: df['four'] = 'bar' + +In [3]: df['five'] = df['one'] > 0 + +In [4]: df +Out[4]: + one two three four five +a 0.469112 -0.282863 -1.509059 bar True +c -1.135632 1.212112 -0.173215 bar False +e 0.119209 -1.044236 -0.861849 bar True +f -2.104569 -0.494929 1.071804 bar False +h 0.721555 -0.706771 -1.039575 bar True + +In [5]: df2 = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']) + +In [6]: df2 +Out[6]: + one two three four five +a 0.469112 -0.282863 -1.509059 bar True +b NaN NaN NaN NaN NaN +c -1.135632 1.212112 -0.173215 bar False +d NaN NaN NaN NaN NaN +e 0.119209 -1.044236 -0.861849 bar True +f -2.104569 -0.494929 1.071804 bar False +g NaN NaN NaN NaN NaN +h 0.721555 -0.706771 -1.039575 bar True +``` + +To make detecting missing values easier (and across different array dtypes), +pandas provides the [``isna()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.isna.html#pandas.isna) and +[``notna()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.notna.html#pandas.notna) functions, which are also methods on +Series and DataFrame objects: + +``` python +In [7]: df2['one'] +Out[7]: +a 0.469112 +b NaN +c -1.135632 +d NaN +e 0.119209 +f -2.104569 +g NaN +h 0.721555 +Name: one, dtype: float64 + +In [8]: pd.isna(df2['one']) +Out[8]: +a False +b True +c False +d True +e False +f False +g True +h False +Name: one, dtype: bool + +In [9]: df2['four'].notna() +Out[9]: +a True +b False +c True +d False +e True +f True +g False +h True +Name: four, dtype: bool + +In [10]: df2.isna() +Out[10]: + one two three four five +a False False False False False +b True True True True True +c False False False False False +d True True True True True +e False False False False False +f False False False False False +g True True True True True +h False False False False False +``` + +::: 
danger Warning

One has to be mindful that in Python (and NumPy), the ``nan's`` don’t compare equal, but ``None's`` **do**.
Note that pandas/NumPy uses the fact that ``np.nan != np.nan``, and treats ``None`` like ``np.nan``.

``` python
In [11]: None == None  # noqa: E711
Out[11]: True

In [12]: np.nan == np.nan
Out[12]: False
```

So as compared to above, a scalar equality comparison versus a ``None/np.nan`` doesn’t provide useful information.

``` python
In [13]: df2['one'] == np.nan
Out[13]: 
a    False
b    False
c    False
d    False
e    False
f    False
g    False
h    False
Name: one, dtype: bool
```

:::

### Integer dtypes and missing data

Because ``NaN`` is a float, a column of integers with even one missing value
is cast to floating-point dtype (see [Support for integer NA](gotchas.html#gotchas-intna) for more). Pandas
provides a nullable integer array, which can be used by explicitly requesting
the dtype:

``` python
In [14]: pd.Series([1, 2, np.nan, 4], dtype=pd.Int64Dtype())
Out[14]: 
0      1
1      2
2    NaN
3      4
dtype: Int64
```

Alternatively, the string alias ``dtype='Int64'`` (note the capital ``"I"``) can be
used.

See [Nullable integer data type](integer_na.html#integer-na) for more.

### Datetimes

For datetime64[ns] types, ``NaT`` represents missing values. This is a pseudo-native
sentinel value that can be represented by NumPy in a singular dtype (datetime64[ns]).
pandas objects provide compatibility between ``NaT`` and ``NaN``.
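As a tiny self-contained sketch (hypothetical dates), assigning ``np.nan`` (or ``None``) into a datetime Series stores ``NaT`` without changing the dtype:

``` python
import numpy as np
import pandas as pd

ts = pd.Series(pd.to_datetime(['2012-01-01', '2012-01-02', '2012-01-03']))

# Assigning np.nan into a datetime64[ns] Series stores NaT, the
# datetime missing-value sentinel.
ts.iloc[1] = np.nan
print(ts)
```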
+ +``` python +In [15]: df2 = df.copy() + +In [16]: df2['timestamp'] = pd.Timestamp('20120101') + +In [17]: df2 +Out[17]: + one two three four five timestamp +a 0.469112 -0.282863 -1.509059 bar True 2012-01-01 +c -1.135632 1.212112 -0.173215 bar False 2012-01-01 +e 0.119209 -1.044236 -0.861849 bar True 2012-01-01 +f -2.104569 -0.494929 1.071804 bar False 2012-01-01 +h 0.721555 -0.706771 -1.039575 bar True 2012-01-01 + +In [18]: df2.loc[['a', 'c', 'h'], ['one', 'timestamp']] = np.nan + +In [19]: df2 +Out[19]: + one two three four five timestamp +a NaN -0.282863 -1.509059 bar True NaT +c NaN 1.212112 -0.173215 bar False NaT +e 0.119209 -1.044236 -0.861849 bar True 2012-01-01 +f -2.104569 -0.494929 1.071804 bar False 2012-01-01 +h NaN -0.706771 -1.039575 bar True NaT + +In [20]: df2.dtypes.value_counts() +Out[20]: +float64 3 +bool 1 +datetime64[ns] 1 +object 1 +dtype: int64 +``` + +### Inserting missing data + +You can insert missing values by simply assigning to containers. The +actual missing value used will be chosen based on the dtype. + +For example, numeric containers will always use ``NaN`` regardless of +the missing value type chosen: + +``` python +In [21]: s = pd.Series([1, 2, 3]) + +In [22]: s.loc[0] = None + +In [23]: s +Out[23]: +0 NaN +1 2.0 +2 3.0 +dtype: float64 +``` + +Likewise, datetime containers will always use ``NaT``. + +For object containers, pandas will use the value given: + +``` python +In [24]: s = pd.Series(["a", "b", "c"]) + +In [25]: s.loc[0] = None + +In [26]: s.loc[1] = np.nan + +In [27]: s +Out[27]: +0 None +1 NaN +2 c +dtype: object +``` + +### Calculations with missing data + +Missing values propagate naturally through arithmetic operations between pandas +objects. 
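A self-contained sketch with hypothetical values: arithmetic involving an NA slot yields NA, and labels present on only one side of an aligned operation also come out as NA.

``` python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, np.nan, 3.0], index=['a', 'b', 'c'])
s2 = pd.Series([10.0, 20.0, 30.0], index=['b', 'c', 'd'])

# NaN in either operand, or a label missing from one side,
# gives NaN in the result.
total = s1 + s2
print(total)
```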
+ +``` python +In [28]: a +Out[28]: + one two +a NaN -0.282863 +c NaN 1.212112 +e 0.119209 -1.044236 +f -2.104569 -0.494929 +h -2.104569 -0.706771 + +In [29]: b +Out[29]: + one two three +a NaN -0.282863 -1.509059 +c NaN 1.212112 -0.173215 +e 0.119209 -1.044236 -0.861849 +f -2.104569 -0.494929 1.071804 +h NaN -0.706771 -1.039575 + +In [30]: a + b +Out[30]: + one three two +a NaN NaN -0.565727 +c NaN NaN 2.424224 +e 0.238417 NaN -2.088472 +f -4.209138 NaN -0.989859 +h NaN NaN -1.413542 +``` + +The descriptive statistics and computational methods discussed in the +[data structure overview](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-stats) (and listed [here](https://pandas.pydata.org/pandas-docs/stable/reference/series.html#api-series-stats) and [here](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html#api-dataframe-stats)) are all written to +account for missing data. For example: + +- When summing data, NA (missing) values will be treated as zero. +- If the data are all NA, the result will be 0. +- Cumulative methods like [``cumsum()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.cumsum.html#pandas.DataFrame.cumsum) and [``cumprod()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.cumprod.html#pandas.DataFrame.cumprod) ignore NA values by default, but preserve them in the resulting arrays. To override this behaviour and include NA values, use ``skipna=False``. 
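These rules can be checked with a small self-contained Series (hypothetical values):

``` python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 2.0])

total = s.sum()                    # NA treated as zero -> 3.0
running = s.cumsum()               # NA skipped, but kept in the output
strict = s.cumsum(skipna=False)    # NA propagates once encountered
print(total, running.tolist(), strict.tolist())
```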
+ +``` python +In [31]: df +Out[31]: + one two three +a NaN -0.282863 -1.509059 +c NaN 1.212112 -0.173215 +e 0.119209 -1.044236 -0.861849 +f -2.104569 -0.494929 1.071804 +h NaN -0.706771 -1.039575 + +In [32]: df['one'].sum() +Out[32]: -1.9853605075978744 + +In [33]: df.mean(1) +Out[33]: +a -0.895961 +c 0.519449 +e -0.595625 +f -0.509232 +h -0.873173 +dtype: float64 + +In [34]: df.cumsum() +Out[34]: + one two three +a NaN -0.282863 -1.509059 +c NaN 0.929249 -1.682273 +e 0.119209 -0.114987 -2.544122 +f -1.985361 -0.609917 -1.472318 +h NaN -1.316688 -2.511893 + +In [35]: df.cumsum(skipna=False) +Out[35]: + one two three +a NaN -0.282863 -1.509059 +c NaN 0.929249 -1.682273 +e NaN -0.114987 -2.544122 +f NaN -0.609917 -1.472318 +h NaN -1.316688 -2.511893 +``` + +## Sum/prod of empties/nans + +::: danger Warning + +This behavior is now standard as of v0.22.0 and is consistent with the default in ``numpy``; previously sum/prod of all-NA or empty Series/DataFrames would return NaN. +See [v0.22.0 whatsnew](https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.22.0.html#whatsnew-0220) for more. + +::: + +The sum of an empty or all-NA Series or column of a DataFrame is 0. + +``` python +In [36]: pd.Series([np.nan]).sum() +Out[36]: 0.0 + +In [37]: pd.Series([]).sum() +Out[37]: 0.0 +``` + +The product of an empty or all-NA Series or column of a DataFrame is 1. + +``` python +In [38]: pd.Series([np.nan]).prod() +Out[38]: 1.0 + +In [39]: pd.Series([]).prod() +Out[39]: 1.0 +``` + +## NA values in GroupBy + +NA groups in GroupBy are automatically excluded. 
This behavior is consistent +with R, for example: + +``` python +In [40]: df +Out[40]: + one two three +a NaN -0.282863 -1.509059 +c NaN 1.212112 -0.173215 +e 0.119209 -1.044236 -0.861849 +f -2.104569 -0.494929 1.071804 +h NaN -0.706771 -1.039575 + +In [41]: df.groupby('one').mean() +Out[41]: + two three +one +-2.104569 -0.494929 1.071804 + 0.119209 -1.044236 -0.861849 +``` + +See the groupby section [here](groupby.html#groupby-missing) for more information. + +### Cleaning / filling missing data + +pandas objects are equipped with various data manipulation methods for dealing +with missing data. + +## Filling missing values: fillna + +[``fillna()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html#pandas.DataFrame.fillna) can “fill in” NA values with non-NA data in a couple +of ways, which we illustrate: + +**Replace NA with a scalar value** + +``` python +In [42]: df2 +Out[42]: + one two three four five timestamp +a NaN -0.282863 -1.509059 bar True NaT +c NaN 1.212112 -0.173215 bar False NaT +e 0.119209 -1.044236 -0.861849 bar True 2012-01-01 +f -2.104569 -0.494929 1.071804 bar False 2012-01-01 +h NaN -0.706771 -1.039575 bar True NaT + +In [43]: df2.fillna(0) +Out[43]: + one two three four five timestamp +a 0.000000 -0.282863 -1.509059 bar True 0 +c 0.000000 1.212112 -0.173215 bar False 0 +e 0.119209 -1.044236 -0.861849 bar True 2012-01-01 00:00:00 +f -2.104569 -0.494929 1.071804 bar False 2012-01-01 00:00:00 +h 0.000000 -0.706771 -1.039575 bar True 0 + +In [44]: df2['one'].fillna('missing') +Out[44]: +a missing +c missing +e 0.119209 +f -2.10457 +h missing +Name: one, dtype: object +``` + +**Fill gaps forward or backward** + +Using the same filling arguments as [reindexing](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-reindexing), we +can propagate non-NA values forward or backward: + +``` python +In [45]: df +Out[45]: + one two three +a NaN -0.282863 -1.509059 +c NaN 1.212112 -0.173215 +e 
0.119209 -1.044236 -0.861849 +f -2.104569 -0.494929 1.071804 +h NaN -0.706771 -1.039575 + +In [46]: df.fillna(method='pad') +Out[46]: + one two three +a NaN -0.282863 -1.509059 +c NaN 1.212112 -0.173215 +e 0.119209 -1.044236 -0.861849 +f -2.104569 -0.494929 1.071804 +h -2.104569 -0.706771 -1.039575 +``` + +**Limit the amount of filling** + +If we only want consecutive gaps filled up to a certain number of data points, +we can use the *limit* keyword: + +``` python +In [47]: df +Out[47]: + one two three +a NaN -0.282863 -1.509059 +c NaN 1.212112 -0.173215 +e NaN NaN NaN +f NaN NaN NaN +h NaN -0.706771 -1.039575 + +In [48]: df.fillna(method='pad', limit=1) +Out[48]: + one two three +a NaN -0.282863 -1.509059 +c NaN 1.212112 -0.173215 +e NaN 1.212112 -0.173215 +f NaN NaN NaN +h NaN -0.706771 -1.039575 +``` + +To remind you, these are the available filling methods: + +Method | Action +---|--- +pad / ffill | Fill values forward +bfill / backfill | Fill values backward + +With time series data, using pad/ffill is extremely common so that the “last +known value” is available at every time point. + +[``ffill()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.ffill.html#pandas.DataFrame.ffill) is equivalent to ``fillna(method='ffill')`` +and [``bfill()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.bfill.html#pandas.DataFrame.bfill) is equivalent to ``fillna(method='bfill')`` + +## Filling with a PandasObject + +You can also fillna using a dict or Series that is alignable. The labels of the dict or index of the Series +must match the columns of the frame you wish to fill. The +use case of this is to fill a DataFrame with the mean of that column. 
+ +``` python +In [49]: dff = pd.DataFrame(np.random.randn(10, 3), columns=list('ABC')) + +In [50]: dff.iloc[3:5, 0] = np.nan + +In [51]: dff.iloc[4:6, 1] = np.nan + +In [52]: dff.iloc[5:8, 2] = np.nan + +In [53]: dff +Out[53]: + A B C +0 0.271860 -0.424972 0.567020 +1 0.276232 -1.087401 -0.673690 +2 0.113648 -1.478427 0.524988 +3 NaN 0.577046 -1.715002 +4 NaN NaN -1.157892 +5 -1.344312 NaN NaN +6 -0.109050 1.643563 NaN +7 0.357021 -0.674600 NaN +8 -0.968914 -1.294524 0.413738 +9 0.276662 -0.472035 -0.013960 + +In [54]: dff.fillna(dff.mean()) +Out[54]: + A B C +0 0.271860 -0.424972 0.567020 +1 0.276232 -1.087401 -0.673690 +2 0.113648 -1.478427 0.524988 +3 -0.140857 0.577046 -1.715002 +4 -0.140857 -0.401419 -1.157892 +5 -1.344312 -0.401419 -0.293543 +6 -0.109050 1.643563 -0.293543 +7 0.357021 -0.674600 -0.293543 +8 -0.968914 -1.294524 0.413738 +9 0.276662 -0.472035 -0.013960 + +In [55]: dff.fillna(dff.mean()['B':'C']) +Out[55]: + A B C +0 0.271860 -0.424972 0.567020 +1 0.276232 -1.087401 -0.673690 +2 0.113648 -1.478427 0.524988 +3 NaN 0.577046 -1.715002 +4 NaN -0.401419 -1.157892 +5 -1.344312 -0.401419 -0.293543 +6 -0.109050 1.643563 -0.293543 +7 0.357021 -0.674600 -0.293543 +8 -0.968914 -1.294524 0.413738 +9 0.276662 -0.472035 -0.013960 +``` + +Same result as above, but is aligning the ‘fill’ value which is +a Series in this case. + +``` python +In [56]: dff.where(pd.notna(dff), dff.mean(), axis='columns') +Out[56]: + A B C +0 0.271860 -0.424972 0.567020 +1 0.276232 -1.087401 -0.673690 +2 0.113648 -1.478427 0.524988 +3 -0.140857 0.577046 -1.715002 +4 -0.140857 -0.401419 -1.157892 +5 -1.344312 -0.401419 -0.293543 +6 -0.109050 1.643563 -0.293543 +7 0.357021 -0.674600 -0.293543 +8 -0.968914 -1.294524 0.413738 +9 0.276662 -0.472035 -0.013960 +``` + +## Dropping axis labels with missing data: dropna + +You may wish to simply exclude labels from a data set which refer to missing +data. 
To do this, use [``dropna()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html#pandas.DataFrame.dropna): + +``` python +In [57]: df +Out[57]: + one two three +a NaN -0.282863 -1.509059 +c NaN 1.212112 -0.173215 +e NaN 0.000000 0.000000 +f NaN 0.000000 0.000000 +h NaN -0.706771 -1.039575 + +In [58]: df.dropna(axis=0) +Out[58]: +Empty DataFrame +Columns: [one, two, three] +Index: [] + +In [59]: df.dropna(axis=1) +Out[59]: + two three +a -0.282863 -1.509059 +c 1.212112 -0.173215 +e 0.000000 0.000000 +f 0.000000 0.000000 +h -0.706771 -1.039575 + +In [60]: df['one'].dropna() +Out[60]: Series([], Name: one, dtype: float64) +``` + +An equivalent [``dropna()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dropna.html#pandas.Series.dropna) is available for Series. +DataFrame.dropna has considerably more options than Series.dropna, which can be +examined [in the API](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html#api-dataframe-missing). + +## Interpolation + +*New in version 0.23.0:* The ``limit_area`` keyword argument was added. + +Both Series and DataFrame objects have [``interpolate()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html#pandas.DataFrame.interpolate) +that, by default, performs linear interpolation at missing data points. + +``` python +In [61]: ts +Out[61]: +2000-01-31 0.469112 +2000-02-29 NaN +2000-03-31 NaN +2000-04-28 NaN +2000-05-31 NaN + ... 
+2007-12-31 -6.950267 +2008-01-31 -7.904475 +2008-02-29 -6.441779 +2008-03-31 -8.184940 +2008-04-30 -9.011531 +Freq: BM, Length: 100, dtype: float64 + +In [62]: ts.count() +Out[62]: 66 + +In [63]: ts.plot() +Out[63]: +``` + +![series_before_interpolate](https://static.pypandas.cn/public/static/images/series_before_interpolate.png) + +``` python +In [64]: ts.interpolate() +Out[64]: +2000-01-31 0.469112 +2000-02-29 0.434469 +2000-03-31 0.399826 +2000-04-28 0.365184 +2000-05-31 0.330541 + ... +2007-12-31 -6.950267 +2008-01-31 -7.904475 +2008-02-29 -6.441779 +2008-03-31 -8.184940 +2008-04-30 -9.011531 +Freq: BM, Length: 100, dtype: float64 + +In [65]: ts.interpolate().count() +Out[65]: 100 + +In [66]: ts.interpolate().plot() +Out[66]: +``` + +![series_interpolate](https://static.pypandas.cn/public/static/images/series_interpolate.png) + +Index aware interpolation is available via the ``method`` keyword: + +``` python +In [67]: ts2 +Out[67]: +2000-01-31 0.469112 +2000-02-29 NaN +2002-07-31 -5.785037 +2005-01-31 NaN +2008-04-30 -9.011531 +dtype: float64 + +In [68]: ts2.interpolate() +Out[68]: +2000-01-31 0.469112 +2000-02-29 -2.657962 +2002-07-31 -5.785037 +2005-01-31 -7.398284 +2008-04-30 -9.011531 +dtype: float64 + +In [69]: ts2.interpolate(method='time') +Out[69]: +2000-01-31 0.469112 +2000-02-29 0.270241 +2002-07-31 -5.785037 +2005-01-31 -7.190866 +2008-04-30 -9.011531 +dtype: float64 +``` + +For a floating-point index, use ``method='values'``: + +``` python +In [70]: ser +Out[70]: +0.0 0.0 +1.0 NaN +10.0 10.0 +dtype: float64 + +In [71]: ser.interpolate() +Out[71]: +0.0 0.0 +1.0 5.0 +10.0 10.0 +dtype: float64 + +In [72]: ser.interpolate(method='values') +Out[72]: +0.0 0.0 +1.0 1.0 +10.0 10.0 +dtype: float64 +``` + +You can also interpolate with a DataFrame: + +``` python +In [73]: df = pd.DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8], + ....: 'B': [.25, np.nan, np.nan, 4, 12.2, 14.4]}) + ....: + +In [74]: df +Out[74]: + A B +0 1.0 0.25 +1 2.1 NaN +2 NaN NaN +3 4.7 
4.00
4  5.6  12.20
5  6.8  14.40

In [75]: df.interpolate()
Out[75]: 
     A      B
0  1.0   0.25
1  2.1   1.50
2  3.4   2.75
3  4.7   4.00
4  5.6  12.20
5  6.8  14.40
```

The ``method`` argument gives access to fancier interpolation methods.
If you have [scipy](http://www.scipy.org) installed, you can pass the name of a 1-d interpolation routine to ``method``.
You’ll want to consult the full scipy interpolation [documentation](http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation) and reference [guide](http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html) for details.
The appropriate interpolation method will depend on the type of data you are working with.

- If you are dealing with a time series that is growing at an increasing rate,
``method='quadratic'`` may be appropriate.
- If you have values approximating a cumulative distribution function,
then ``method='pchip'`` should work well.
- To fill missing values with the goal of smooth plotting, consider ``method='akima'``.

::: danger Warning

These methods require ``scipy``.
+ +::: + +``` python +In [76]: df.interpolate(method='barycentric') +Out[76]: + A B +0 1.00 0.250 +1 2.10 -7.660 +2 3.53 -4.515 +3 4.70 4.000 +4 5.60 12.200 +5 6.80 14.400 + +In [77]: df.interpolate(method='pchip') +Out[77]: + A B +0 1.00000 0.250000 +1 2.10000 0.672808 +2 3.43454 1.928950 +3 4.70000 4.000000 +4 5.60000 12.200000 +5 6.80000 14.400000 + +In [78]: df.interpolate(method='akima') +Out[78]: + A B +0 1.000000 0.250000 +1 2.100000 -0.873316 +2 3.406667 0.320034 +3 4.700000 4.000000 +4 5.600000 12.200000 +5 6.800000 14.400000 +``` + +When interpolating via a polynomial or spline approximation, you must also specify +the degree or order of the approximation: + +``` python +In [79]: df.interpolate(method='spline', order=2) +Out[79]: + A B +0 1.000000 0.250000 +1 2.100000 -0.428598 +2 3.404545 1.206900 +3 4.700000 4.000000 +4 5.600000 12.200000 +5 6.800000 14.400000 + +In [80]: df.interpolate(method='polynomial', order=2) +Out[80]: + A B +0 1.000000 0.250000 +1 2.100000 -2.703846 +2 3.451351 -1.453846 +3 4.700000 4.000000 +4 5.600000 12.200000 +5 6.800000 14.400000 +``` + +Compare several methods: + +``` python +In [81]: np.random.seed(2) + +In [82]: ser = pd.Series(np.arange(1, 10.1, .25) ** 2 + np.random.randn(37)) + +In [83]: missing = np.array([4, 13, 14, 15, 16, 17, 18, 20, 29]) + +In [84]: ser[missing] = np.nan + +In [85]: methods = ['linear', 'quadratic', 'cubic'] + +In [86]: df = pd.DataFrame({m: ser.interpolate(method=m) for m in methods}) + +In [87]: df.plot() +Out[87]: +``` + +![compare_interpolations](https://static.pypandas.cn/public/static/images/compare_interpolations.png) + +Another use case is interpolation at *new* values. +Suppose you have 100 observations from some distribution. And let’s suppose +that you’re particularly interested in what’s happening around the middle. +You can mix pandas’ ``reindex`` and ``interpolate`` methods to interpolate +at the new values. 
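As a compact, self-contained sketch with hypothetical points (``Index.union`` is the method spelling of the ``|`` index operator):

``` python
import pandas as pd

ser = pd.Series([0.0, 10.0, 20.0], index=[0.0, 1.0, 2.0])

# Union in the extra points, then interpolate using the index values,
# so the fills respect where each new point sits on the axis.
new_index = ser.index.union(pd.Index([0.5, 1.5]))
interp = ser.reindex(new_index).interpolate(method='values')
print(interp)
```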
+ +``` python +In [88]: ser = pd.Series(np.sort(np.random.uniform(size=100))) + +# interpolate at new_index +In [89]: new_index = ser.index | pd.Index([49.25, 49.5, 49.75, 50.25, 50.5, 50.75]) + +In [90]: interp_s = ser.reindex(new_index).interpolate(method='pchip') + +In [91]: interp_s[49:51] +Out[91]: +49.00 0.471410 +49.25 0.476841 +49.50 0.481780 +49.75 0.485998 +50.00 0.489266 +50.25 0.491814 +50.50 0.493995 +50.75 0.495763 +51.00 0.497074 +dtype: float64 +``` + +### Interpolation limits + +Like other pandas fill methods, [``interpolate()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html#pandas.DataFrame.interpolate) accepts a ``limit`` keyword +argument. Use this argument to limit the number of consecutive ``NaN`` values +filled since the last valid observation: + +``` python +In [92]: ser = pd.Series([np.nan, np.nan, 5, np.nan, np.nan, + ....: np.nan, 13, np.nan, np.nan]) + ....: + +In [93]: ser +Out[93]: +0 NaN +1 NaN +2 5.0 +3 NaN +4 NaN +5 NaN +6 13.0 +7 NaN +8 NaN +dtype: float64 + +# fill all consecutive values in a forward direction +In [94]: ser.interpolate() +Out[94]: +0 NaN +1 NaN +2 5.0 +3 7.0 +4 9.0 +5 11.0 +6 13.0 +7 13.0 +8 13.0 +dtype: float64 + +# fill one consecutive value in a forward direction +In [95]: ser.interpolate(limit=1) +Out[95]: +0 NaN +1 NaN +2 5.0 +3 7.0 +4 NaN +5 NaN +6 13.0 +7 13.0 +8 NaN +dtype: float64 +``` + +By default, ``NaN`` values are filled in a ``forward`` direction. Use +``limit_direction`` parameter to fill ``backward`` or from ``both`` directions. 
+

``` python
# fill one consecutive value backwards
In [96]: ser.interpolate(limit=1, limit_direction='backward')
Out[96]: 
0     NaN
1     5.0
2     5.0
3     NaN
4     NaN
5    11.0
6    13.0
7     NaN
8     NaN
dtype: float64

# fill one consecutive value in both directions
In [97]: ser.interpolate(limit=1, limit_direction='both')
Out[97]: 
0     NaN
1     5.0
2     5.0
3     7.0
4     NaN
5    11.0
6    13.0
7    13.0
8     NaN
dtype: float64

# fill all consecutive values in both directions
In [98]: ser.interpolate(limit_direction='both')
Out[98]: 
0     5.0
1     5.0
2     5.0
3     7.0
4     9.0
5    11.0
6    13.0
7    13.0
8    13.0
dtype: float64
```

By default, ``NaN`` values are filled whether they are inside (surrounded by)
existing valid values, or outside existing valid values. Introduced in v0.23,
the ``limit_area`` parameter restricts filling to either inside or outside values.

``` python
# fill one consecutive inside value in both directions
In [99]: ser.interpolate(limit_direction='both', limit_area='inside', limit=1)
Out[99]: 
0     NaN
1     NaN
2     5.0
3     7.0
4     NaN
5    11.0
6    13.0
7     NaN
8     NaN
dtype: float64

# fill all consecutive outside values backward
In [100]: ser.interpolate(limit_direction='backward', limit_area='outside')
Out[100]: 
0     5.0
1     5.0
2     5.0
3     NaN
4     NaN
5     NaN
6    13.0
7     NaN
8     NaN
dtype: float64

# fill all consecutive outside values in both directions
In [101]: ser.interpolate(limit_direction='both', limit_area='outside')
Out[101]: 
0     5.0
1     5.0
2     5.0
3     NaN
4     NaN
5     NaN
6    13.0
7    13.0
8    13.0
dtype: float64
```

## Replacing generic values

Oftentimes we want to replace arbitrary values with other values.

[``replace()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.replace.html#pandas.Series.replace) in Series and [``replace()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.replace.html#pandas.DataFrame.replace) in DataFrame provide an efficient yet
flexible way to perform such replacements.
+ +For a Series, you can replace a single value or a list of values by another +value: + +``` python +In [102]: ser = pd.Series([0., 1., 2., 3., 4.]) + +In [103]: ser.replace(0, 5) +Out[103]: +0 5.0 +1 1.0 +2 2.0 +3 3.0 +4 4.0 +dtype: float64 +``` + +You can replace a list of values by a list of other values: + +``` python +In [104]: ser.replace([0, 1, 2, 3, 4], [4, 3, 2, 1, 0]) +Out[104]: +0 4.0 +1 3.0 +2 2.0 +3 1.0 +4 0.0 +dtype: float64 +``` + +You can also specify a mapping dict: + +``` python +In [105]: ser.replace({0: 10, 1: 100}) +Out[105]: +0 10.0 +1 100.0 +2 2.0 +3 3.0 +4 4.0 +dtype: float64 +``` + +For a DataFrame, you can specify individual values by column: + +``` python +In [106]: df = pd.DataFrame({'a': [0, 1, 2, 3, 4], 'b': [5, 6, 7, 8, 9]}) + +In [107]: df.replace({'a': 0, 'b': 5}, 100) +Out[107]: + a b +0 100 100 +1 1 6 +2 2 7 +3 3 8 +4 4 9 +``` + +Instead of replacing with specified values, you can treat all given values as +missing and interpolate over them: + +``` python +In [108]: ser.replace([1, 2, 3], method='pad') +Out[108]: +0 0.0 +1 0.0 +2 0.0 +3 0.0 +4 4.0 +dtype: float64 +``` + +## String/regular expression replacement + +::: tip Note + +Python strings prefixed with the ``r`` character such as ``r'hello world'`` +are so-called “raw” strings. They have different semantics regarding +backslashes than strings without this prefix. Backslashes in raw strings +will be interpreted as an escaped backslash, e.g., ``r'\' == '\\'``. You +should [read about them](https://docs.python.org/3/reference/lexical_analysis.html#string-literals) +if this is unclear. 
+

:::

Replace the ‘.’ with ``NaN`` (str -> str):

``` python
In [109]: d = {'a': list(range(4)), 'b': list('ab..'), 'c': ['a', 'b', np.nan, 'd']}

In [110]: df = pd.DataFrame(d)

In [111]: df.replace('.', np.nan)
Out[111]: 
   a    b    c
0  0    a    a
1  1    b    b
2  2  NaN  NaN
3  3  NaN    d
```

Now do it with a regular expression that removes surrounding whitespace
(regex -> regex):

``` python
In [112]: df.replace(r'\s*\.\s*', np.nan, regex=True)
Out[112]: 
   a    b    c
0  0    a    a
1  1    b    b
2  2  NaN  NaN
3  3  NaN    d
```

Replace a few different values (list -> list):

``` python
In [113]: df.replace(['a', '.'], ['b', np.nan])
Out[113]: 
   a    b    c
0  0    b    b
1  1    b    b
2  2  NaN  NaN
3  3  NaN    d
```

list of regex -> list of regex:

``` python
In [114]: df.replace([r'\.', r'(a)'], ['dot', r'\1stuff'], regex=True)
Out[114]: 
   a       b       c
0  0  astuff  astuff
1  1       b       b
2  2     dot     NaN
3  3     dot       d
```

Only search in column ``'b'`` (dict -> dict):

``` python
In [115]: df.replace({'b': '.'}, {'b': np.nan})
Out[115]: 
   a    b    c
0  0    a    a
1  1    b    b
2  2  NaN  NaN
3  3  NaN    d
```

Same as the previous example, but use a regular expression for
searching instead (dict of regex -> dict):

``` python
In [116]: df.replace({'b': r'\s*\.\s*'}, {'b': np.nan}, regex=True)
Out[116]: 
   a    b    c
0  0    a    a
1  1    b    b
2  2  NaN  NaN
3  3  NaN    d
```

You can pass nested dictionaries of regular expressions that use ``regex=True``:

``` python
In [117]: df.replace({'b': {'b': r''}}, regex=True)
Out[117]: 
   a  b    c
0  0  a    a
1  1       b
2  2  .  NaN
3  3  .    d
```

Alternatively, you can pass the nested dictionary like so:

``` python
In [118]: df.replace(regex={'b': {r'\s*\.\s*': np.nan}})
Out[118]: 
   a    b    c
0  0    a    a
1  1    b    b
2  2  NaN  NaN
3  3  NaN    d
```

You can also use the group of a regular expression match when replacing (dict
of regex -> dict of regex); this works for lists as well.
+ +``` python +In [119]: df.replace({'b': r'\s*(\.)\s*'}, {'b': r'\1ty'}, regex=True) +Out[119]: + a b c +0 0 a a +1 1 b b +2 2 .ty NaN +3 3 .ty d +``` + +You can pass a list of regular expressions, of which those that match +will be replaced with a scalar (list of regex -> regex). + +``` python +In [120]: df.replace([r'\s*\.\s*', r'a|b'], np.nan, regex=True) +Out[120]: + a b c +0 0 NaN NaN +1 1 NaN NaN +2 2 NaN NaN +3 3 NaN d +``` + +All of the regular expression examples can also be passed with the +``to_replace`` argument as the ``regex`` argument. In this case the ``value`` +argument must be passed explicitly by name or ``regex`` must be a nested +dictionary. The previous example, in this case, would then be: + +``` python +In [121]: df.replace(regex=[r'\s*\.\s*', r'a|b'], value=np.nan) +Out[121]: + a b c +0 0 NaN NaN +1 1 NaN NaN +2 2 NaN NaN +3 3 NaN d +``` + +This can be convenient if you do not want to pass ``regex=True`` every time you +want to use a regular expression. + +::: tip Note + +Anywhere in the above ``replace`` examples that you see a regular expression +a compiled regular expression is valid as well. + +::: + +## Numeric replacement + +[``replace()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.replace.html#pandas.DataFrame.replace) is similar to [``fillna()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html#pandas.DataFrame.fillna). + +``` python +In [122]: df = pd.DataFrame(np.random.randn(10, 2)) + +In [123]: df[np.random.rand(df.shape[0]) > 0.5] = 1.5 + +In [124]: df.replace(1.5, np.nan) +Out[124]: + 0 1 +0 -0.844214 -1.021415 +1 0.432396 -0.323580 +2 0.423825 0.799180 +3 1.262614 0.751965 +4 NaN NaN +5 NaN NaN +6 -0.498174 -1.060799 +7 0.591667 -0.183257 +8 1.019855 -1.482465 +9 NaN NaN +``` + +Replacing more than one value is possible by passing a list. 
+ +``` python +In [125]: df00 = df.iloc[0, 0] + +In [126]: df.replace([1.5, df00], [np.nan, 'a']) +Out[126]: + 0 1 +0 a -1.02141 +1 0.432396 -0.32358 +2 0.423825 0.79918 +3 1.26261 0.751965 +4 NaN NaN +5 NaN NaN +6 -0.498174 -1.0608 +7 0.591667 -0.183257 +8 1.01985 -1.48247 +9 NaN NaN + +In [127]: df[1].dtype +Out[127]: dtype('float64') +``` + +You can also operate on the DataFrame in place: + +``` python +In [128]: df.replace(1.5, np.nan, inplace=True) +``` + +::: danger Warning + +When replacing multiple ``bool`` or ``datetime64`` objects, the first +argument to ``replace`` (``to_replace``) must match the type of the value +being replaced. For example, + +``` python +>>> s = pd.Series([True, False, True]) +>>> s.replace({'a string': 'new value', True: False}) # raises +TypeError: Cannot compare types 'ndarray(dtype=bool)' and 'str' +``` + +will raise a ``TypeError`` because one of the ``dict`` keys is not of the +correct type for replacement. + +However, when replacing a *single* object such as, + +``` python +In [129]: s = pd.Series([True, False, True]) + +In [130]: s.replace('a string', 'another string') +Out[130]: +0 True +1 False +2 True +dtype: bool +``` + +the original ``NDFrame`` object will be returned untouched. We’re working on +unifying this API, but for backwards compatibility reasons we cannot break +the latter behavior. See [GH6354](https://github.com/pandas-dev/pandas/issues/6354) for more details. + +::: + +### Missing data casting rules and indexing + +While pandas supports storing arrays of integer and boolean type, these types +are not capable of storing missing data. Until we can switch to using a native +NA type in NumPy, we’ve established some “casting rules”. When a reindexing +operation introduces missing data, the Series will be cast according to the +rules introduced in the table below. 
+ +data type | Cast to +---|--- +integer | float +boolean | object +float | no cast +object | no cast + +For example: + +``` python +In [131]: s = pd.Series(np.random.randn(5), index=[0, 2, 4, 6, 7]) + +In [132]: s > 0 +Out[132]: +0 True +2 True +4 True +6 True +7 True +dtype: bool + +In [133]: (s > 0).dtype +Out[133]: dtype('bool') + +In [134]: crit = (s > 0).reindex(list(range(8))) + +In [135]: crit +Out[135]: +0 True +1 NaN +2 True +3 NaN +4 True +5 NaN +6 True +7 True +dtype: object + +In [136]: crit.dtype +Out[136]: dtype('O') +``` + +Ordinarily NumPy will complain if you try to use an object array (even if it +contains boolean values) instead of a boolean array to get or set values from +an ndarray (e.g. selecting values based on some criteria). If a boolean vector +contains NAs, an exception will be generated: + +``` python +In [137]: reindexed = s.reindex(list(range(8))).fillna(0) + +In [138]: reindexed[crit] +--------------------------------------------------------------------------- +ValueError Traceback (most recent call last) + in +----> 1 reindexed[crit] + +/pandas/pandas/core/series.py in __getitem__(self, key) + 1101 key = list(key) + 1102 +-> 1103 if com.is_bool_indexer(key): + 1104 key = check_bool_indexer(self.index, key) + 1105 + +/pandas/pandas/core/common.py in is_bool_indexer(key) + 128 if not lib.is_bool_array(key): + 129 if isna(key).any(): +--> 130 raise ValueError(na_msg) + 131 return False + 132 return True + +ValueError: cannot index with vector containing NA / NaN values +``` + +However, these can be filled in using [``fillna()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html#pandas.DataFrame.fillna) and it will work fine: + +``` python +In [139]: reindexed[crit.fillna(False)] +Out[139]: +0 0.126504 +2 0.696198 +4 0.697416 +6 0.601516 +7 0.003659 +dtype: float64 + +In [140]: reindexed[crit.fillna(True)] +Out[140]: +0 0.126504 +1 0.000000 +2 0.696198 +3 0.000000 +4 0.697416 +5 0.000000 +6 
0.601516 +7 0.003659 +dtype: float64 +``` + +Pandas provides a nullable integer dtype, but you must explicitly request it +when creating the series or column. Notice that we use a capital “I” in +the ``dtype="Int64"``. + +``` python +In [141]: s = pd.Series([0, 1, np.nan, 3, 4], dtype="Int64") + +In [142]: s +Out[142]: +0 0 +1 1 +2 NaN +3 3 +4 4 +dtype: Int64 +``` + +See [Nullable integer data type](integer_na.html#integer-na) for more. diff --git a/Python/pandas/user_guide/options.md b/Python/pandas/user_guide/options.md new file mode 100644 index 00000000..533cf766 --- /dev/null +++ b/Python/pandas/user_guide/options.md @@ -0,0 +1,711 @@ +# Options and settings + +## Overview + +pandas has an options system that lets you customize some aspects of its behaviour, +display-related options being those the user is most likely to adjust. + +Options have a full “dotted-style”, case-insensitive name (e.g. ``display.max_rows``). +You can get/set options directly as attributes of the top-level ``options`` attribute: + +``` python +In [1]: import pandas as pd + +In [2]: pd.options.display.max_rows +Out[2]: 15 + +In [3]: pd.options.display.max_rows = 999 + +In [4]: pd.options.display.max_rows +Out[4]: 999 +``` + +The API is composed of 5 relevant functions, available directly from the ``pandas`` +namespace: + +- [``get_option()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_option.html#pandas.get_option) / [``set_option()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.set_option.html#pandas.set_option) - get/set the value of a single option. +- [``reset_option()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.reset_option.html#pandas.reset_option) - reset one or more options to their default value. +- [``describe_option()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.describe_option.html#pandas.describe_option) - print the descriptions of one or more options. 
- [``option_context()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.option_context.html#pandas.option_context) - execute a codeblock with a set of options
that revert to prior settings after execution.

**Note:** Developers can check out [pandas/core/config.py](https://github.com/pandas-dev/pandas/blob/master/pandas/core/config.py) for more information.

All of the functions above accept a regexp pattern (``re.search`` style) as an argument,
and so passing in a substring will work - as long as it is unambiguous:

``` python
In [5]: pd.get_option("display.max_rows")
Out[5]: 999

In [6]: pd.set_option("display.max_rows", 101)

In [7]: pd.get_option("display.max_rows")
Out[7]: 101

In [8]: pd.set_option("max_r", 102)

In [9]: pd.get_option("display.max_rows")
Out[9]: 102
```

The following will **not work** because it matches multiple option names, e.g.
``display.column_space``, ``display.max_columns``, ``display.max_info_columns``:

``` python
In [10]: try:
   ....:     pd.get_option("column")
   ....: except KeyError as e:
   ....:     print(e)
   ....:
'Pattern matched multiple keys'
```

**Note:** Using this form of shorthand may cause your code to break if new options with similar names are added in future versions.

You can get a list of available options and their descriptions with ``describe_option``. When called
with no argument ``describe_option`` will print out the descriptions for all available options.

## Getting and setting options

As described above, [``get_option()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_option.html#pandas.get_option) and [``set_option()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.set_option.html#pandas.set_option)
are available from the pandas namespace. To change an option, call
``set_option('option regex', new_value)``.
+ +``` python +In [11]: pd.get_option('mode.sim_interactive') +Out[11]: False + +In [12]: pd.set_option('mode.sim_interactive', True) + +In [13]: pd.get_option('mode.sim_interactive') +Out[13]: True +``` + +**Note:** The option ‘mode.sim_interactive’ is mostly used for debugging purposes. + +All options also have a default value, and you can use ``reset_option`` to do just that: + +``` python +In [14]: pd.get_option("display.max_rows") +Out[14]: 60 + +In [15]: pd.set_option("display.max_rows", 999) + +In [16]: pd.get_option("display.max_rows") +Out[16]: 999 + +In [17]: pd.reset_option("display.max_rows") + +In [18]: pd.get_option("display.max_rows") +Out[18]: 60 +``` + +It’s also possible to reset multiple options at once (using a regex): + +``` python +In [19]: pd.reset_option("^display") +``` + +``option_context`` context manager has been exposed through +the top-level API, allowing you to execute code with given option values. Option values +are restored automatically when you exit the *with* block: + +``` python +In [20]: with pd.option_context("display.max_rows", 10, "display.max_columns", 5): + ....: print(pd.get_option("display.max_rows")) + ....: print(pd.get_option("display.max_columns")) + ....: +10 +5 + +In [21]: print(pd.get_option("display.max_rows")) +60 + +In [22]: print(pd.get_option("display.max_columns")) +0 +``` + +## Setting startup options in Python/IPython environment + +Using startup scripts for the Python/IPython environment to import pandas and set options makes working with pandas more efficient. To do this, create a .py or .ipy script in the startup directory of the desired profile. An example where the startup folder is in a default ipython profile can be found at: + +``` +$IPYTHONDIR/profile_default/startup +``` + +More information can be found in the [ipython documentation](https://ipython.org/ipython-doc/stable/interactive/tutorial.html#startup-files). 
An example startup script for pandas is displayed below:

``` python
import pandas as pd
pd.set_option('display.max_rows', 999)
pd.set_option('precision', 5)
```

## Frequently Used Options

The following is a walk-through of the more frequently used display options.

``display.max_rows`` and ``display.max_columns`` set the maximum number
of rows and columns displayed when a frame is pretty-printed. Truncated
lines are replaced by an ellipsis.

``` python
In [23]: df = pd.DataFrame(np.random.randn(7, 2))

In [24]: pd.set_option('max_rows', 7)

In [25]: df
Out[25]:
          0         1
0  0.469112 -0.282863
1 -1.509059 -1.135632
2  1.212112 -0.173215
3  0.119209 -1.044236
4 -0.861849 -2.104569
5 -0.494929  1.071804
6  0.721555 -0.706771

In [26]: pd.set_option('max_rows', 5)

In [27]: df
Out[27]:
           0         1
0   0.469112 -0.282863
1  -1.509059 -1.135632
..       ...       ...
5  -0.494929  1.071804
6   0.721555 -0.706771

[7 rows x 2 columns]

In [28]: pd.reset_option('max_rows')
```

Once the ``display.max_rows`` is exceeded, the ``display.min_rows`` option
determines how many rows are shown in the truncated repr.

``` python
In [29]: pd.set_option('max_rows', 8)

In [30]: pd.set_option('min_rows', 4)

# below max_rows -> all rows shown
In [31]: df = pd.DataFrame(np.random.randn(7, 2))

In [32]: df
Out[32]:
          0         1
0 -1.039575  0.271860
1 -0.424972  0.567020
2  0.276232 -1.087401
3 -0.673690  0.113648
4 -1.478427  0.524988
5  0.404705  0.577046
6 -1.715002 -1.039268

# above max_rows -> only min_rows (4) rows shown
In [33]: df = pd.DataFrame(np.random.randn(9, 2))

In [34]: df
Out[34]:
          0         1
0 -0.370647 -1.157892
1 -1.344312  0.844885
..       ...       ...
7  0.276662 -0.472035
8 -0.013960 -0.362543

[9 rows x 2 columns]

In [35]: pd.reset_option('max_rows')

In [36]: pd.reset_option('min_rows')
```

``display.expand_frame_repr`` allows for the representation of
dataframes to stretch across pages, wrapped over the full column vs row-wise.
+ +``` python +In [37]: df = pd.DataFrame(np.random.randn(5, 10)) + +In [38]: pd.set_option('expand_frame_repr', True) + +In [39]: df +Out[39]: + 0 1 2 3 4 5 6 7 8 9 +0 -0.006154 -0.923061 0.895717 0.805244 -1.206412 2.565646 1.431256 1.340309 -1.170299 -0.226169 +1 0.410835 0.813850 0.132003 -0.827317 -0.076467 -1.187678 1.130127 -1.436737 -1.413681 1.607920 +2 1.024180 0.569605 0.875906 -2.211372 0.974466 -2.006747 -0.410001 -0.078638 0.545952 -1.219217 +3 -1.226825 0.769804 -1.281247 -0.727707 -0.121306 -0.097883 0.695775 0.341734 0.959726 -1.110336 +4 -0.619976 0.149748 -0.732339 0.687738 0.176444 0.403310 -0.154951 0.301624 -2.179861 -1.369849 + +In [40]: pd.set_option('expand_frame_repr', False) + +In [41]: df +Out[41]: + 0 1 2 3 4 5 6 7 8 9 +0 -0.006154 -0.923061 0.895717 0.805244 -1.206412 2.565646 1.431256 1.340309 -1.170299 -0.226169 +1 0.410835 0.813850 0.132003 -0.827317 -0.076467 -1.187678 1.130127 -1.436737 -1.413681 1.607920 +2 1.024180 0.569605 0.875906 -2.211372 0.974466 -2.006747 -0.410001 -0.078638 0.545952 -1.219217 +3 -1.226825 0.769804 -1.281247 -0.727707 -0.121306 -0.097883 0.695775 0.341734 0.959726 -1.110336 +4 -0.619976 0.149748 -0.732339 0.687738 0.176444 0.403310 -0.154951 0.301624 -2.179861 -1.369849 + +In [42]: pd.reset_option('expand_frame_repr') +``` + +``display.large_repr`` lets you select whether to display dataframes that exceed +``max_columns`` or ``max_rows`` as a truncated frame, or as a summary. + +``` python +In [43]: df = pd.DataFrame(np.random.randn(10, 10)) + +In [44]: pd.set_option('max_rows', 5) + +In [45]: pd.set_option('large_repr', 'truncate') + +In [46]: df +Out[46]: + 0 1 2 3 4 5 6 7 8 9 +0 -0.954208 1.462696 -1.743161 -0.826591 -0.345352 1.314232 0.690579 0.995761 2.396780 0.014871 +1 3.357427 -0.317441 -1.236269 0.896171 -0.487602 -0.082240 -2.182937 0.380396 0.084844 0.432390 +.. ... ... ... ... ... ... ... ... ... ... 
+8 -0.303421 -0.858447 0.306996 -0.028665 0.384316 1.574159 1.588931 0.476720 0.473424 -0.242861 +9 -0.014805 -0.284319 0.650776 -1.461665 -1.137707 -0.891060 -0.693921 1.613616 0.464000 0.227371 + +[10 rows x 10 columns] + +In [47]: pd.set_option('large_repr', 'info') + +In [48]: df +Out[48]: + +RangeIndex: 10 entries, 0 to 9 +Data columns (total 10 columns): +0 10 non-null float64 +1 10 non-null float64 +2 10 non-null float64 +3 10 non-null float64 +4 10 non-null float64 +5 10 non-null float64 +6 10 non-null float64 +7 10 non-null float64 +8 10 non-null float64 +9 10 non-null float64 +dtypes: float64(10) +memory usage: 928.0 bytes + +In [49]: pd.reset_option('large_repr') + +In [50]: pd.reset_option('max_rows') +``` + +``display.max_colwidth`` sets the maximum width of columns. Cells +of this length or longer will be truncated with an ellipsis. + +``` python +In [51]: df = pd.DataFrame(np.array([['foo', 'bar', 'bim', 'uncomfortably long string'], + ....: ['horse', 'cow', 'banana', 'apple']])) + ....: + +In [52]: pd.set_option('max_colwidth', 40) + +In [53]: df +Out[53]: + 0 1 2 3 +0 foo bar bim uncomfortably long string +1 horse cow banana apple + +In [54]: pd.set_option('max_colwidth', 6) + +In [55]: df +Out[55]: + 0 1 2 3 +0 foo bar bim un... +1 horse cow ba... apple + +In [56]: pd.reset_option('max_colwidth') +``` + +``display.max_info_columns`` sets a threshold for when by-column info +will be given. 
``` python
In [57]: df = pd.DataFrame(np.random.randn(10, 10))

In [58]: pd.set_option('max_info_columns', 11)

In [59]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 10 columns):
0    10 non-null float64
1    10 non-null float64
2    10 non-null float64
3    10 non-null float64
4    10 non-null float64
5    10 non-null float64
6    10 non-null float64
7    10 non-null float64
8    10 non-null float64
9    10 non-null float64
dtypes: float64(10)
memory usage: 928.0 bytes

In [60]: pd.set_option('max_info_columns', 5)

In [61]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Columns: 10 entries, 0 to 9
dtypes: float64(10)
memory usage: 928.0 bytes

In [62]: pd.reset_option('max_info_columns')
```

``display.max_info_rows``: ``df.info()`` will usually show null-counts for each column.
For large frames this can be quite slow. ``max_info_rows`` and ``max_info_columns``
limit this null check only to frames with smaller dimensions than specified. Note that you
can specify the option ``df.info(null_counts=True)`` to override on showing a particular frame.
+ +``` python +In [63]: df = pd.DataFrame(np.random.choice([0, 1, np.nan], size=(10, 10))) + +In [64]: df +Out[64]: + 0 1 2 3 4 5 6 7 8 9 +0 0.0 NaN 1.0 NaN NaN 0.0 NaN 0.0 NaN 1.0 +1 1.0 NaN 1.0 1.0 1.0 1.0 NaN 0.0 0.0 NaN +2 0.0 NaN 1.0 0.0 0.0 NaN NaN NaN NaN 0.0 +3 NaN NaN NaN 0.0 1.0 1.0 NaN 1.0 NaN 1.0 +4 0.0 NaN NaN NaN 0.0 NaN NaN NaN 1.0 0.0 +5 0.0 1.0 1.0 1.0 1.0 0.0 NaN NaN 1.0 0.0 +6 1.0 1.0 1.0 NaN 1.0 NaN 1.0 0.0 NaN NaN +7 0.0 0.0 1.0 0.0 1.0 0.0 1.0 1.0 0.0 NaN +8 NaN NaN NaN 0.0 NaN NaN NaN NaN 1.0 NaN +9 0.0 NaN 0.0 NaN NaN 0.0 NaN 1.0 1.0 0.0 + +In [65]: pd.set_option('max_info_rows', 11) + +In [66]: df.info() + +RangeIndex: 10 entries, 0 to 9 +Data columns (total 10 columns): +0 8 non-null float64 +1 3 non-null float64 +2 7 non-null float64 +3 6 non-null float64 +4 7 non-null float64 +5 6 non-null float64 +6 2 non-null float64 +7 6 non-null float64 +8 6 non-null float64 +9 6 non-null float64 +dtypes: float64(10) +memory usage: 928.0 bytes + +In [67]: pd.set_option('max_info_rows', 5) + +In [68]: df.info() + +RangeIndex: 10 entries, 0 to 9 +Data columns (total 10 columns): +0 float64 +1 float64 +2 float64 +3 float64 +4 float64 +5 float64 +6 float64 +7 float64 +8 float64 +9 float64 +dtypes: float64(10) +memory usage: 928.0 bytes + +In [69]: pd.reset_option('max_info_rows') +``` + +``display.precision`` sets the output display precision in terms of decimal places. +This is only a suggestion. 
+ +``` python +In [70]: df = pd.DataFrame(np.random.randn(5, 5)) + +In [71]: pd.set_option('precision', 7) + +In [72]: df +Out[72]: + 0 1 2 3 4 +0 -1.1506406 -0.7983341 -0.5576966 0.3813531 1.3371217 +1 -1.5310949 1.3314582 -0.5713290 -0.0266708 -1.0856630 +2 -1.1147378 -0.0582158 -0.4867681 1.6851483 0.1125723 +3 -1.4953086 0.8984347 -0.1482168 -1.5960698 0.1596530 +4 0.2621358 0.0362196 0.1847350 -0.2550694 -0.2710197 + +In [73]: pd.set_option('precision', 4) + +In [74]: df +Out[74]: + 0 1 2 3 4 +0 -1.1506 -0.7983 -0.5577 0.3814 1.3371 +1 -1.5311 1.3315 -0.5713 -0.0267 -1.0857 +2 -1.1147 -0.0582 -0.4868 1.6851 0.1126 +3 -1.4953 0.8984 -0.1482 -1.5961 0.1597 +4 0.2621 0.0362 0.1847 -0.2551 -0.2710 +``` + +``display.chop_threshold`` sets at what level pandas rounds to zero when +it displays a Series of DataFrame. This setting does not change the +precision at which the number is stored. + +``` python +In [75]: df = pd.DataFrame(np.random.randn(6, 6)) + +In [76]: pd.set_option('chop_threshold', 0) + +In [77]: df +Out[77]: + 0 1 2 3 4 5 +0 1.2884 0.2946 -1.1658 0.8470 -0.6856 0.6091 +1 -0.3040 0.6256 -0.0593 0.2497 1.1039 -1.0875 +2 1.9980 -0.2445 0.1362 0.8863 -1.3507 -0.8863 +3 -1.0133 1.9209 -0.3882 -2.3144 0.6655 0.4026 +4 0.3996 -1.7660 0.8504 0.3881 0.9923 0.7441 +5 -0.7398 -1.0549 -0.1796 0.6396 1.5850 1.9067 + +In [78]: pd.set_option('chop_threshold', .5) + +In [79]: df +Out[79]: + 0 1 2 3 4 5 +0 1.2884 0.0000 -1.1658 0.8470 -0.6856 0.6091 +1 0.0000 0.6256 0.0000 0.0000 1.1039 -1.0875 +2 1.9980 0.0000 0.0000 0.8863 -1.3507 -0.8863 +3 -1.0133 1.9209 0.0000 -2.3144 0.6655 0.0000 +4 0.0000 -1.7660 0.8504 0.0000 0.9923 0.7441 +5 -0.7398 -1.0549 0.0000 0.6396 1.5850 1.9067 + +In [80]: pd.reset_option('chop_threshold') +``` + +``display.colheader_justify`` controls the justification of the headers. +The options are ‘right’, and ‘left’. 
``` python
In [81]: df = pd.DataFrame(np.array([np.random.randn(6),
   ....:                             np.random.randint(1, 9, 6) * .1,
   ....:                             np.zeros(6)]).T,
   ....:                   columns=['A', 'B', 'C'], dtype='float')
   ....:

In [82]: pd.set_option('colheader_justify', 'right')

In [83]: df
Out[83]:
        A    B    C
0  0.1040  0.1  0.0
1  0.1741  0.5  0.0
2 -0.4395  0.4  0.0
3 -0.7413  0.8  0.0
4 -0.0797  0.4  0.0
5 -0.9229  0.3  0.0

In [84]: pd.set_option('colheader_justify', 'left')

In [85]: df
Out[85]:
   A       B    C
0  0.1040  0.1  0.0
1  0.1741  0.5  0.0
2 -0.4395  0.4  0.0
3 -0.7413  0.8  0.0
4 -0.0797  0.4  0.0
5 -0.9229  0.3  0.0

In [86]: pd.reset_option('colheader_justify')
```

## Available options

Option | Default | Function
---|---|---
display.chop_threshold | None | If set to a float value, all float values smaller than the given threshold will be displayed as exactly 0 by repr and friends.
display.colheader_justify | right | Controls the justification of column headers. Used by DataFrameFormatter.
display.column_space | 12 | No description available.
display.date_dayfirst | False | When True, prints and parses dates with the day first, eg 20/01/2005
display.date_yearfirst | False | When True, prints and parses dates with the year first, eg 2005/01/20
display.encoding | UTF-8 | Defaults to the detected encoding of the console. Specifies the encoding to be used for strings returned by to_string, these are generally strings meant to be displayed on the console.
display.expand_frame_repr | True | Whether to print out the full DataFrame repr for wide DataFrames across multiple lines, max_columns is still respected, but the output will wrap-around across multiple “pages” if its width exceeds display.width.
display.float_format | None | The callable should accept a floating point number and return a string with the desired format of the number. This is used in some places like SeriesFormatter. See core.format.EngFormatter for an example.
display.large_repr | truncate | For DataFrames exceeding max_rows/max_cols, the repr (and HTML repr) can show a truncated table (the default), or switch to the view from df.info() (the behaviour in earlier versions of pandas). Allowable settings: [‘truncate’, ‘info’].
display.latex.repr | False | Whether to produce a latex DataFrame representation for jupyter frontends that support it.
display.latex.escape | True | Escapes special characters in DataFrames, when using the to_latex method.
display.latex.longtable | False | Specifies if the to_latex method of a DataFrame uses the longtable format.
display.latex.multicolumn | True | Combines columns when using a MultiIndex
display.latex.multicolumn_format | ‘l’ | Alignment of multicolumn labels
display.latex.multirow | False | Combines rows when using a MultiIndex. Centered instead of top-aligned, separated by clines.
display.max_columns | 0 or 20 | max_rows and max_columns are used in __repr__() methods to decide if to_string() or info() is used to render an object to a string. In case Python/IPython is running in a terminal this is set to 0 by default and pandas will correctly auto-detect the width of the terminal and switch to a smaller format in case all columns would not fit vertically. The IPython notebook, IPython qtconsole, or IDLE do not run in a terminal and hence it is not possible to do correct auto-detection, in which case the default is set to 20. ‘None’ value means unlimited.
display.max_colwidth | 50 | The maximum width in characters of a column in the repr of a pandas data structure. When the column overflows, a “…” placeholder is embedded in the output.
display.max_info_columns | 100 | max_info_columns is used in DataFrame.info method to decide if per column information will be printed.
display.max_info_rows | 1690785 | df.info() will usually show null-counts for each column. For large frames this can be quite slow. max_info_rows and max_info_columns limit this null check only to frames with smaller dimensions than specified.
display.max_rows | 60 | This sets the maximum number of rows pandas should output when printing out various output. For example, this value determines whether the repr() for a dataframe prints out fully or just a truncated or summary repr. ‘None’ value means unlimited.
display.min_rows | 10 | The number of rows to show in a truncated repr (when max_rows is exceeded). Ignored when max_rows is set to None or 0. When set to None, follows the value of max_rows.
display.max_seq_items | 100 | When pretty-printing a long sequence, no more than max_seq_items will be printed. If items are omitted, they will be denoted by the addition of “…” to the resulting string. If set to None, the number of items to be printed is unlimited.
display.memory_usage | True | This specifies if the memory usage of a DataFrame should be displayed when the df.info() method is invoked.
display.multi_sparse | True | “Sparsify” MultiIndex display (don’t display repeated elements in outer levels within groups)
display.notebook_repr_html | True | When True, IPython notebook will use html representation for pandas objects (if it is available).
display.pprint_nest_depth | 3 | Controls the number of nested levels to process when pretty-printing
display.precision | 6 | Floating point output precision in terms of number of places after the decimal, for regular formatting as well as scientific notation. Similar to numpy’s precision print option
display.show_dimensions | truncate | Whether to print out dimensions at the end of DataFrame repr. If ‘truncate’ is specified, only print out the dimensions if the frame is truncated (e.g. not display all rows and/or columns)
display.width | 80 | Width of the display in characters. In case python/IPython is running in a terminal this can be set to None and pandas will correctly auto-detect the width.
Note that the IPython notebook, IPython qtconsole, or IDLE do not run in a terminal and hence it is not possible to correctly detect the width. +display.html.table_schema | False | Whether to publish a Table Schema representation for frontends that support it. +display.html.border | 1 | A border=value attribute is inserted in the ```` tag for the DataFrame HTML repr. +display.html.use_mathjax | True | When True, Jupyter notebook will process table contents using MathJax, rendering mathematical expressions enclosed by the dollar symbol. +io.excel.xls.writer | xlwt | The default Excel writer engine for ‘xls’ files. +io.excel.xlsm.writer | openpyxl | The default Excel writer engine for ‘xlsm’ files. Available options: ‘openpyxl’ (the default). +io.excel.xlsx.writer | openpyxl | The default Excel writer engine for ‘xlsx’ files. +io.hdf.default_format | None | default format writing format, if None, then put will default to ‘fixed’ and append will default to ‘table’ +io.hdf.dropna_table | True | drop ALL nan rows when appending to a table +io.parquet.engine | None | The engine to use as a default for parquet reading and writing. If None then try ‘pyarrow’ and ‘fastparquet’ +mode.chained_assignment | warn | Controls SettingWithCopyWarning: ‘raise’, ‘warn’, or None. Raise an exception, warn, or no action if trying to use [chained assignment](indexing.html#indexing-evaluation-order). +mode.sim_interactive | False | Whether to simulate interactive mode for purposes of testing. +mode.use_inf_as_na | False | True means treat None, NaN, -INF, INF as NA (old way), False means None and NaN are null, but INF, -INF are not NA (new way). +compute.use_bottleneck | True | Use the bottleneck library to accelerate computation if it is installed. +compute.use_numexpr | True | Use the numexpr library to accelerate computation if it is installed. +plotting.backend | matplotlib | Change the plotting backend to a different backend than the current matplotlib one. 
Backends can be implemented as third-party libraries implementing the pandas plotting API. They can use other plotting libraries like Bokeh, Altair, etc. +plotting.matplotlib.register_converters | True | Register custom converters with matplotlib. Set to False to de-register. + +## Number formatting + +pandas also allows you to set how numbers are displayed in the console. +This option is not set through the ``set_options`` API. + +Use the ``set_eng_float_format`` function +to alter the floating-point formatting of pandas objects to produce a particular +format. + +For instance: + +``` python +In [87]: import numpy as np + +In [88]: pd.set_eng_float_format(accuracy=3, use_eng_prefix=True) + +In [89]: s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e']) + +In [90]: s / 1.e3 +Out[90]: +a 303.638u +b -721.084u +c -622.696u +d 648.250u +e -1.945m +dtype: float64 + +In [91]: s / 1.e6 +Out[91]: +a 303.638n +b -721.084n +c -622.696n +d 648.250n +e -1.945u +dtype: float64 +``` + +To round floats on a case-by-case basis, you can also use [``round()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.round.html#pandas.Series.round) and [``round()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.round.html#pandas.DataFrame.round). + +## Unicode formatting + +::: danger Warning + +Enabling this option will affect the performance for printing of DataFrame and Series (about 2 times slower). +Use only when it is actually required. + +::: + +Some East Asian countries use Unicode characters whose width corresponds to two Latin characters. +If a DataFrame or Series contains these characters, the default output mode may not align them properly. + +::: tip Note + +Screen captures are attached for each output to show the actual results. 
+ +::: + +``` python +In [92]: df = pd.DataFrame({'国籍': ['UK', '日本'], '名前': ['Alice', 'しのぶ']}) + +In [93]: df +Out[93]: + 国籍 名前 +0 UK Alice +1 日本 しのぶ +``` + +![option_unicode01](https://static.pypandas.cn/public/static/images/option_unicode01.png) + +Enabling ``display.unicode.east_asian_width`` allows pandas to check each character’s “East Asian Width” property. +These characters can be aligned properly by setting this option to ``True``. However, this will result in longer render +times than the standard ``len`` function. + +``` python +In [94]: pd.set_option('display.unicode.east_asian_width', True) + +In [95]: df +Out[95]: + 国籍 名前 +0 UK Alice +1 日本 しのぶ +``` + +![option_unicode02](https://static.pypandas.cn/public/static/images/option_unicode02.png) + +In addition, Unicode characters whose width is “Ambiguous” can either be 1 or 2 characters wide depending on the +terminal setting or encoding. The option ``display.unicode.ambiguous_as_wide`` can be used to handle the ambiguity. + +By default, an “Ambiguous” character’s width, such as “¡” (inverted exclamation) in the example below, is taken to be 1. + +``` python +In [96]: df = pd.DataFrame({'a': ['xxx', '¡¡'], 'b': ['yyy', '¡¡']}) + +In [97]: df +Out[97]: + a b +0 xxx yyy +1 ¡¡ ¡¡ +``` + +![option_unicode03](https://static.pypandas.cn/public/static/images/option_unicode03.png) + +Enabling ``display.unicode.ambiguous_as_wide`` makes pandas interpret these characters’ widths to be 2. +(Note that this option will only be effective when ``display.unicode.east_asian_width`` is enabled.) 
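The width classes that drive this behaviour come from the Unicode ``East_Asian_Width`` property, which you can inspect directly with Python's standard ``unicodedata`` module (a quick sketch, independent of pandas):

``` python
import unicodedata

# 'W' = wide (two terminal columns), 'Na' = narrow (one column),
# 'A' = ambiguous (one or two columns, depending on the terminal)
print(unicodedata.east_asian_width('日'))  # W
print(unicodedata.east_asian_width('x'))   # Na
print(unicodedata.east_asian_width('¡'))   # A
```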
+
+However, setting this option incorrectly for your terminal will cause these characters to be aligned incorrectly:
+
+``` python
+In [98]: pd.set_option('display.unicode.ambiguous_as_wide', True)
+
+In [99]: df
+Out[99]:
+      a    b
+0   xxx  yyy
+1    ¡¡   ¡¡
+```
+
+![option_unicode04](https://static.pypandas.cn/public/static/images/option_unicode04.png)
+
+## Table schema display
+
+*New in version 0.20.0.*
+
+``DataFrame`` and ``Series`` can publish a Table Schema representation.
+Disabled by default, this can be enabled globally with the
+``display.html.table_schema`` option:
+
+``` python
+In [100]: pd.set_option('display.html.table_schema', True)
+```
+
+Only the first ``'display.max_rows'`` rows are serialized and published.
diff --git a/Python/pandas/user_guide/reshaping.md b/Python/pandas/user_guide/reshaping.md
new file mode 100644
index 00000000..1448831e
--- /dev/null
+++ b/Python/pandas/user_guide/reshaping.md
@@ -0,0 +1,1520 @@
+# Reshaping and pivot tables
+
+## Reshaping by pivoting DataFrame objects
+
+![reshaping_pivot](https://static.pypandas.cn/public/static/images/reshaping_pivot.png)
+
+Data is often stored in so-called “stacked” or “record” format:
+
+``` python
+In [1]: df
+Out[1]:
+          date variable     value
+0   2000-01-03        A  0.469112
+1   2000-01-04        A -0.282863
+2   2000-01-05        A -1.509059
+3   2000-01-03        B -1.135632
+4   2000-01-04        B  1.212112
+5   2000-01-05        B -0.173215
+6   2000-01-03        C  0.119209
+7   2000-01-04        C -1.044236
+8   2000-01-05        C -0.861849
+9   2000-01-03        D -2.104569
+10  2000-01-04        D -0.494929
+11  2000-01-05        D  1.071804
+```
+
+For the curious, here is how the above ``DataFrame`` was created:
+
+``` python
+import numpy as np
+import pandas as pd
+import pandas.util.testing as tm
+
+tm.N = 3
+
+
+def unpivot(frame):
+    N, K = frame.shape
+    data = {'value': frame.to_numpy().ravel('F'),
+            'variable': np.asarray(frame.columns).repeat(N),
+            'date': np.tile(np.asarray(frame.index), K)}
+    return pd.DataFrame(data, columns=['date', 'variable', 'value'])
+
+
+df = unpivot(tm.makeTimeDataFrame())
+```
+
+To select out
everything for variable ``A`` we could do: + +``` python +In [2]: df[df['variable'] == 'A'] +Out[2]: + date variable value +0 2000-01-03 A 0.469112 +1 2000-01-04 A -0.282863 +2 2000-01-05 A -1.509059 +``` + +But suppose we wish to do time series operations with the variables. A better +representation would be where the ``columns`` are the unique variables and an +``index`` of dates identifies individual observations. To reshape the data into +this form, we use the [``DataFrame.pivot()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot) method (also implemented as a +top level function [``pivot()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot.html#pandas.pivot)): + +``` python +In [3]: df.pivot(index='date', columns='variable', values='value') +Out[3]: +variable A B C D +date +2000-01-03 0.469112 -1.135632 0.119209 -2.104569 +2000-01-04 -0.282863 1.212112 -1.044236 -0.494929 +2000-01-05 -1.509059 -0.173215 -0.861849 1.071804 +``` + +If the ``values`` argument is omitted, and the input ``DataFrame`` has more than +one column of values which are not used as column or index inputs to ``pivot``, +then the resulting “pivoted” ``DataFrame`` will have [hierarchical columns](advanced.html#advanced-hierarchical) whose topmost level indicates the respective value +column: + +``` python +In [4]: df['value2'] = df['value'] * 2 + +In [5]: pivoted = df.pivot(index='date', columns='variable') + +In [6]: pivoted +Out[6]: + value value2 +variable A B C D A B C D +date +2000-01-03 0.469112 -1.135632 0.119209 -2.104569 0.938225 -2.271265 0.238417 -4.209138 +2000-01-04 -0.282863 1.212112 -1.044236 -0.494929 -0.565727 2.424224 -2.088472 -0.989859 +2000-01-05 -1.509059 -0.173215 -0.861849 1.071804 -3.018117 -0.346429 -1.723698 2.143608 +``` + +You can then select subsets from the pivoted ``DataFrame``: + +``` python +In [7]: pivoted['value2'] +Out[7]: +variable A B C D +date +2000-01-03 0.938225 
-2.271265  0.238417 -4.209138
+2000-01-04 -0.565727  2.424224 -2.088472 -0.989859
+2000-01-05 -3.018117 -0.346429 -1.723698  2.143608
+```
+
+Note that this returns a view on the underlying data in the case where the data
+are homogeneously-typed.
+
+::: tip Note
+
+[``pivot()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot.html#pandas.pivot) will error with a ``ValueError: Index contains duplicate
+entries, cannot reshape`` if the index/column pair is not unique. In this
+case, consider using [``pivot_table()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html#pandas.pivot_table) which is a generalization
+of pivot that can handle duplicate values for one index/column pair.
+
+:::
+
+## Reshaping by stacking and unstacking
+
+![reshaping_stack](https://static.pypandas.cn/public/static/images/reshaping_stack.png)
+
+Closely related to the [``pivot()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot) method are the
+[``stack()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.stack.html#pandas.DataFrame.stack) and [``unstack()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.unstack.html#pandas.DataFrame.unstack) methods available on
+``Series`` and ``DataFrame``. These methods are designed to work together with
+``MultiIndex`` objects (see the section on [hierarchical indexing](advanced.html#advanced-hierarchical)). Here is essentially what these methods do:
+
+- ``stack``: “pivot” a level of the (possibly hierarchical) column labels,
+returning a ``DataFrame`` with an index with a new inner-most level of row
+labels.
+- ``unstack``: (inverse operation of ``stack``) “pivot” a level of the
+(possibly hierarchical) row index to the column axis, producing a reshaped
+``DataFrame`` with a new inner-most level of column labels.
+ +![reshaping_unstack](https://static.pypandas.cn/public/static/images/reshaping_unstack.png) + +The clearest way to explain is by example. Let’s take a prior example data set +from the hierarchical indexing section: + +``` python +In [8]: tuples = list(zip(*[['bar', 'bar', 'baz', 'baz', + ...: 'foo', 'foo', 'qux', 'qux'], + ...: ['one', 'two', 'one', 'two', + ...: 'one', 'two', 'one', 'two']])) + ...: + +In [9]: index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second']) + +In [10]: df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B']) + +In [11]: df2 = df[:4] + +In [12]: df2 +Out[12]: + A B +first second +bar one 0.721555 -0.706771 + two -1.039575 0.271860 +baz one -0.424972 0.567020 + two 0.276232 -1.087401 +``` + +The ``stack`` function “compresses” a level in the ``DataFrame``’s columns to +produce either: + +- A ``Series``, in the case of a simple column Index. +- A ``DataFrame``, in the case of a ``MultiIndex`` in the columns. + +If the columns have a ``MultiIndex``, you can choose which level to stack. 
The +stacked level becomes the new lowest level in a ``MultiIndex`` on the columns: + +``` python +In [13]: stacked = df2.stack() + +In [14]: stacked +Out[14]: +first second +bar one A 0.721555 + B -0.706771 + two A -1.039575 + B 0.271860 +baz one A -0.424972 + B 0.567020 + two A 0.276232 + B -1.087401 +dtype: float64 +``` + +With a “stacked” ``DataFrame`` or ``Series`` (having a ``MultiIndex`` as the +``index``), the inverse operation of ``stack`` is ``unstack``, which by default +unstacks the **last level**: + +``` python +In [15]: stacked.unstack() +Out[15]: + A B +first second +bar one 0.721555 -0.706771 + two -1.039575 0.271860 +baz one -0.424972 0.567020 + two 0.276232 -1.087401 + +In [16]: stacked.unstack(1) +Out[16]: +second one two +first +bar A 0.721555 -1.039575 + B -0.706771 0.271860 +baz A -0.424972 0.276232 + B 0.567020 -1.087401 + +In [17]: stacked.unstack(0) +Out[17]: +first bar baz +second +one A 0.721555 -0.424972 + B -0.706771 0.567020 +two A -1.039575 0.276232 + B 0.271860 -1.087401 +``` + +![reshaping_unstack_1](https://static.pypandas.cn/public/static/images/reshaping_unstack_1.png) + +If the indexes have names, you can use the level names instead of specifying +the level numbers: + +``` python +In [18]: stacked.unstack('second') +Out[18]: +second one two +first +bar A 0.721555 -1.039575 + B -0.706771 0.271860 +baz A -0.424972 0.276232 + B 0.567020 -1.087401 +``` + +![reshaping_unstack_0](https://static.pypandas.cn/public/static/images/reshaping_unstack_0.png) + +Notice that the ``stack`` and ``unstack`` methods implicitly sort the index +levels involved. 
Hence a call to ``stack`` and then ``unstack``, or vice versa, +will result in a **sorted** copy of the original ``DataFrame`` or ``Series``: + +``` python +In [19]: index = pd.MultiIndex.from_product([[2, 1], ['a', 'b']]) + +In [20]: df = pd.DataFrame(np.random.randn(4), index=index, columns=['A']) + +In [21]: df +Out[21]: + A +2 a -0.370647 + b -1.157892 +1 a -1.344312 + b 0.844885 + +In [22]: all(df.unstack().stack() == df.sort_index()) +Out[22]: True +``` + +The above code will raise a ``TypeError`` if the call to ``sort_index`` is +removed. + +### Multiple levels + +You may also stack or unstack more than one level at a time by passing a list +of levels, in which case the end result is as if each level in the list were +processed individually. + +``` python +In [23]: columns = pd.MultiIndex.from_tuples([ + ....: ('A', 'cat', 'long'), ('B', 'cat', 'long'), + ....: ('A', 'dog', 'short'), ('B', 'dog', 'short')], + ....: names=['exp', 'animal', 'hair_length'] + ....: ) + ....: + +In [24]: df = pd.DataFrame(np.random.randn(4, 4), columns=columns) + +In [25]: df +Out[25]: +exp A B A B +animal cat cat dog dog +hair_length long long short short +0 1.075770 -0.109050 1.643563 -1.469388 +1 0.357021 -0.674600 -1.776904 -0.968914 +2 -1.294524 0.413738 0.276662 -0.472035 +3 -0.013960 -0.362543 -0.006154 -0.923061 + +In [26]: df.stack(level=['animal', 'hair_length']) +Out[26]: +exp A B + animal hair_length +0 cat long 1.075770 -0.109050 + dog short 1.643563 -1.469388 +1 cat long 0.357021 -0.674600 + dog short -1.776904 -0.968914 +2 cat long -1.294524 0.413738 + dog short 0.276662 -0.472035 +3 cat long -0.013960 -0.362543 + dog short -0.006154 -0.923061 +``` + +The list of levels can contain either level names or level numbers (but +not a mixture of the two). 
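The "no mixture" rule can be checked directly. A minimal sketch, assuming current pandas behavior (the two-column frame below is a trimmed stand-in for the example frame above, and the exact error message may vary across pandas versions):

``` python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [('A', 'cat', 'long'), ('B', 'dog', 'short')],
    names=['exp', 'animal', 'hair_length'])
df = pd.DataFrame(np.zeros((2, 2)), columns=columns)

# Level names and level numbers each work on their own ...
df.stack(level=['animal', 'hair_length'])
df.stack(level=[1, 2])

# ... but mixing the two kinds raises a ValueError
try:
    df.stack(level=['animal', 2])
except ValueError as exc:
    print(type(exc).__name__)
```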
+ +``` python +# df.stack(level=['animal', 'hair_length']) +# from above is equivalent to: +In [27]: df.stack(level=[1, 2]) +Out[27]: +exp A B + animal hair_length +0 cat long 1.075770 -0.109050 + dog short 1.643563 -1.469388 +1 cat long 0.357021 -0.674600 + dog short -1.776904 -0.968914 +2 cat long -1.294524 0.413738 + dog short 0.276662 -0.472035 +3 cat long -0.013960 -0.362543 + dog short -0.006154 -0.923061 +``` + +### Missing data + +These functions are intelligent about handling missing data and do not expect +each subgroup within the hierarchical index to have the same set of labels. +They also can handle the index being unsorted (but you can make it sorted by +calling ``sort_index``, of course). Here is a more complex example: + +``` python +In [28]: columns = pd.MultiIndex.from_tuples([('A', 'cat'), ('B', 'dog'), + ....: ('B', 'cat'), ('A', 'dog')], + ....: names=['exp', 'animal']) + ....: + +In [29]: index = pd.MultiIndex.from_product([('bar', 'baz', 'foo', 'qux'), + ....: ('one', 'two')], + ....: names=['first', 'second']) + ....: + +In [30]: df = pd.DataFrame(np.random.randn(8, 4), index=index, columns=columns) + +In [31]: df2 = df.iloc[[0, 1, 2, 4, 5, 7]] + +In [32]: df2 +Out[32]: +exp A B A +animal cat dog cat dog +first second +bar one 0.895717 0.805244 -1.206412 2.565646 + two 1.431256 1.340309 -1.170299 -0.226169 +baz one 0.410835 0.813850 0.132003 -0.827317 +foo one -1.413681 1.607920 1.024180 0.569605 + two 0.875906 -2.211372 0.974466 -2.006747 +qux two -1.226825 0.769804 -1.281247 -0.727707 +``` + +As mentioned above, ``stack`` can be called with a ``level`` argument to select +which level in the columns to stack: + +``` python +In [33]: df2.stack('exp') +Out[33]: +animal cat dog +first second exp +bar one A 0.895717 2.565646 + B -1.206412 0.805244 + two A 1.431256 -0.226169 + B -1.170299 1.340309 +baz one A 0.410835 -0.827317 + B 0.132003 0.813850 +foo one A -1.413681 0.569605 + B 1.024180 1.607920 + two A 0.875906 -2.006747 + B 0.974466 
-2.211372
+qux   two    A   -1.226825 -0.727707
+             B   -1.281247  0.769804
+
+In [34]: df2.stack('animal')
+Out[34]:
+exp                        A         B
+first second animal
+bar   one    cat    0.895717 -1.206412
+             dog    2.565646  0.805244
+      two    cat    1.431256 -1.170299
+             dog   -0.226169  1.340309
+baz   one    cat    0.410835  0.132003
+             dog   -0.827317  0.813850
+foo   one    cat   -1.413681  1.024180
+             dog    0.569605  1.607920
+      two    cat    0.875906  0.974466
+             dog   -2.006747 -2.211372
+qux   two    cat   -1.226825 -1.281247
+             dog   -0.727707  0.769804
+```
+
+Unstacking can result in missing values if subgroups do not have the same
+set of labels. By default, missing values will be replaced with the default
+fill value for that data type, ``NaN`` for float, ``NaT`` for datetimelike,
+etc. For integer types, by default data will be converted to float and missing
+values will be set to ``NaN``.
+
+``` python
+In [35]: df3 = df.iloc[[0, 1, 4, 7], [1, 2]]
+
+In [36]: df3
+Out[36]:
+exp                   B
+animal              dog       cat
+first second
+bar   one      0.805244 -1.206412
+      two      1.340309 -1.170299
+foo   one      1.607920  1.024180
+qux   two      0.769804 -1.281247
+
+In [37]: df3.unstack()
+Out[37]:
+exp            B
+animal       dog                cat
+second       one       two      one       two
+first
+bar     0.805244  1.340309 -1.206412 -1.170299
+foo     1.607920       NaN  1.024180       NaN
+qux          NaN  0.769804       NaN -1.281247
+```
+
+*New in version 0.18.0.*
+
+Alternatively, unstack takes an optional ``fill_value`` argument for specifying
+the value of missing data.
+ +``` python +In [38]: df3.unstack(fill_value=-1e9) +Out[38]: +exp B +animal dog cat +second one two one two +first +bar 8.052440e-01 1.340309e+00 -1.206412e+00 -1.170299e+00 +foo 1.607920e+00 -1.000000e+09 1.024180e+00 -1.000000e+09 +qux -1.000000e+09 7.698036e-01 -1.000000e+09 -1.281247e+00 +``` + +### With a MultiIndex + +Unstacking when the columns are a ``MultiIndex`` is also careful about doing +the right thing: + +``` python +In [39]: df[:3].unstack(0) +Out[39]: +exp A B A +animal cat dog cat dog +first bar baz bar baz bar baz bar baz +second +one 0.895717 0.410835 0.805244 0.81385 -1.206412 0.132003 2.565646 -0.827317 +two 1.431256 NaN 1.340309 NaN -1.170299 NaN -0.226169 NaN + +In [40]: df2.unstack(1) +Out[40]: +exp A B A +animal cat dog cat dog +second one two one two one two one two +first +bar 0.895717 1.431256 0.805244 1.340309 -1.206412 -1.170299 2.565646 -0.226169 +baz 0.410835 NaN 0.813850 NaN 0.132003 NaN -0.827317 NaN +foo -1.413681 0.875906 1.607920 -2.211372 1.024180 0.974466 0.569605 -2.006747 +qux NaN -1.226825 NaN 0.769804 NaN -1.281247 NaN -0.727707 +``` + +## Reshaping by Melt + +![reshaping_melt](https://static.pypandas.cn/public/static/images/reshaping_melt.png) + +The top-level [``melt()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.melt.html#pandas.melt) function and the corresponding [``DataFrame.melt()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.melt.html#pandas.DataFrame.melt) +are useful to massage a ``DataFrame`` into a format where one or more columns +are *identifier variables*, while all other columns, considered *measured +variables*, are “unpivoted” to the row axis, leaving just two non-identifier +columns, “variable” and “value”. The names of those columns can be customized +by supplying the ``var_name`` and ``value_name`` parameters. 
+ +For instance, + +``` python +In [41]: cheese = pd.DataFrame({'first': ['John', 'Mary'], + ....: 'last': ['Doe', 'Bo'], + ....: 'height': [5.5, 6.0], + ....: 'weight': [130, 150]}) + ....: + +In [42]: cheese +Out[42]: + first last height weight +0 John Doe 5.5 130 +1 Mary Bo 6.0 150 + +In [43]: cheese.melt(id_vars=['first', 'last']) +Out[43]: + first last variable value +0 John Doe height 5.5 +1 Mary Bo height 6.0 +2 John Doe weight 130.0 +3 Mary Bo weight 150.0 + +In [44]: cheese.melt(id_vars=['first', 'last'], var_name='quantity') +Out[44]: + first last quantity value +0 John Doe height 5.5 +1 Mary Bo height 6.0 +2 John Doe weight 130.0 +3 Mary Bo weight 150.0 +``` + +Another way to transform is to use the [``wide_to_long()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.wide_to_long.html#pandas.wide_to_long) panel data +convenience function. It is less flexible than [``melt()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.melt.html#pandas.melt), but more +user-friendly. + +``` python +In [45]: dft = pd.DataFrame({"A1970": {0: "a", 1: "b", 2: "c"}, + ....: "A1980": {0: "d", 1: "e", 2: "f"}, + ....: "B1970": {0: 2.5, 1: 1.2, 2: .7}, + ....: "B1980": {0: 3.2, 1: 1.3, 2: .1}, + ....: "X": dict(zip(range(3), np.random.randn(3))) + ....: }) + ....: + +In [46]: dft["id"] = dft.index + +In [47]: dft +Out[47]: + A1970 A1980 B1970 B1980 X id +0 a d 2.5 3.2 -0.121306 0 +1 b e 1.2 1.3 -0.097883 1 +2 c f 0.7 0.1 0.695775 2 + +In [48]: pd.wide_to_long(dft, ["A", "B"], i="id", j="year") +Out[48]: + X A B +id year +0 1970 -0.121306 a 2.5 +1 1970 -0.097883 b 1.2 +2 1970 0.695775 c 0.7 +0 1980 -0.121306 d 3.2 +1 1980 -0.097883 e 1.3 +2 1980 0.695775 f 0.1 +``` + +## Combining with stats and GroupBy + +It should be no shock that combining ``pivot`` / ``stack`` / ``unstack`` with +GroupBy and the basic Series and DataFrame statistical functions can produce +some very expressive and fast data manipulations. 
+ +``` python +In [49]: df +Out[49]: +exp A B A +animal cat dog cat dog +first second +bar one 0.895717 0.805244 -1.206412 2.565646 + two 1.431256 1.340309 -1.170299 -0.226169 +baz one 0.410835 0.813850 0.132003 -0.827317 + two -0.076467 -1.187678 1.130127 -1.436737 +foo one -1.413681 1.607920 1.024180 0.569605 + two 0.875906 -2.211372 0.974466 -2.006747 +qux one -0.410001 -0.078638 0.545952 -1.219217 + two -1.226825 0.769804 -1.281247 -0.727707 + +In [50]: df.stack().mean(1).unstack() +Out[50]: +animal cat dog +first second +bar one -0.155347 1.685445 + two 0.130479 0.557070 +baz one 0.271419 -0.006733 + two 0.526830 -1.312207 +foo one -0.194750 1.088763 + two 0.925186 -2.109060 +qux one 0.067976 -0.648927 + two -1.254036 0.021048 + +# same result, another way +In [51]: df.groupby(level=1, axis=1).mean() +Out[51]: +animal cat dog +first second +bar one -0.155347 1.685445 + two 0.130479 0.557070 +baz one 0.271419 -0.006733 + two 0.526830 -1.312207 +foo one -0.194750 1.088763 + two 0.925186 -2.109060 +qux one 0.067976 -0.648927 + two -1.254036 0.021048 + +In [52]: df.stack().groupby(level=1).mean() +Out[52]: +exp A B +second +one 0.071448 0.455513 +two -0.424186 -0.204486 + +In [53]: df.mean().unstack(0) +Out[53]: +exp A B +animal +cat 0.060843 0.018596 +dog -0.413580 0.232430 +``` + +## Pivot tables + +While [``pivot()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot) provides general purpose pivoting with various +data types (strings, numerics, etc.), pandas also provides [``pivot_table()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html#pandas.pivot_table) +for pivoting with aggregation of numeric data. + +The function [``pivot_table()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html#pandas.pivot_table) can be used to create spreadsheet-style +pivot tables. 
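As a rough sketch of the relationship, a ``pivot_table`` call with the default ``aggfunc`` (``numpy.mean``) behaves like a group-by mean followed by an ``unstack``. The small frame and its column names below are made up for this illustration, not taken from the examples in this guide:

``` python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'two'],
                   'B': ['x', 'y', 'x', 'y'],
                   'D': [1.0, 2.0, 3.0, 4.0]})

# Spreadsheet-style pivot: rows from A, columns from B, mean of D per cell
pt = pd.pivot_table(df, values='D', index='A', columns='B')

# Roughly the same computation spelled out with groupby/unstack
gb = df.groupby(['A', 'B'])['D'].mean().unstack('B')

assert np.allclose(pt.to_numpy(), gb.to_numpy())
```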
See the [cookbook](cookbook.html#cookbook-pivot) for some advanced
+strategies.
+
+It takes a number of arguments:
+
+- ``data``: a DataFrame object.
+- ``values``: a column or a list of columns to aggregate.
+- ``index``: a column, Grouper, array which has the same length as data, or list of them.
+Keys to group by on the pivot table index. If an array is passed, it is used in the same manner as column values.
+- ``columns``: a column, Grouper, array which has the same length as data, or list of them.
+Keys to group by on the pivot table column. If an array is passed, it is used in the same manner as column values.
+- ``aggfunc``: function to use for aggregation, defaulting to ``numpy.mean``.
+
+Consider a data set like this:
+
+``` python
+In [54]: import datetime
+
+In [55]: df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
+   ....:                    'B': ['A', 'B', 'C'] * 8,
+   ....:                    'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
+   ....:                    'D': np.random.randn(24),
+   ....:                    'E': np.random.randn(24),
+   ....:                    'F': [datetime.datetime(2013, i, 1) for i in range(1, 13)]
+   ....:                    + [datetime.datetime(2013, i, 15) for i in range(1, 13)]})
+   ....:
+
+In [56]: df
+Out[56]:
+        A  B    C         D         E          F
+0     one  A  foo  0.341734 -0.317441 2013-01-01
+1     one  B  foo  0.959726 -1.236269 2013-02-01
+2     two  C  foo -1.110336  0.896171 2013-03-01
+3   three  A  bar -0.619976 -0.487602 2013-04-01
+4     one  B  bar  0.149748 -0.082240 2013-05-01
+..    ...  .. ...       ...       ...        ...
+19 three B foo 0.690579 -2.213588 2013-08-15 +20 one C foo 0.995761 1.063327 2013-09-15 +21 one A bar 2.396780 1.266143 2013-10-15 +22 two B bar 0.014871 0.299368 2013-11-15 +23 three C bar 3.357427 -0.863838 2013-12-15 + +[24 rows x 6 columns] +``` + +We can produce pivot tables from this data very easily: + +``` python +In [57]: pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C']) +Out[57]: +C bar foo +A B +one A 1.120915 -0.514058 + B -0.338421 0.002759 + C -0.538846 0.699535 +three A -1.181568 NaN + B NaN 0.433512 + C 0.588783 NaN +two A NaN 1.000985 + B 0.158248 NaN + C NaN 0.176180 + +In [58]: pd.pivot_table(df, values='D', index=['B'], columns=['A', 'C'], aggfunc=np.sum) +Out[58]: +A one three two +C bar foo bar foo bar foo +B +A 2.241830 -1.028115 -2.363137 NaN NaN 2.001971 +B -0.676843 0.005518 NaN 0.867024 0.316495 NaN +C -1.077692 1.399070 1.177566 NaN NaN 0.352360 + +In [59]: pd.pivot_table(df, values=['D', 'E'], index=['B'], columns=['A', 'C'], + ....: aggfunc=np.sum) + ....: +Out[59]: + D E +A one three two one three two +C bar foo bar foo bar foo bar foo bar foo bar foo +B +A 2.241830 -1.028115 -2.363137 NaN NaN 2.001971 2.786113 -0.043211 1.922577 NaN NaN 0.128491 +B -0.676843 0.005518 NaN 0.867024 0.316495 NaN 1.368280 -1.103384 NaN -2.128743 -0.194294 NaN +C -1.077692 1.399070 1.177566 NaN NaN 0.352360 -1.976883 1.495717 -0.263660 NaN NaN 0.872482 +``` + +The result object is a ``DataFrame`` having potentially hierarchical indexes on the +rows and columns. 
If the ``values`` column name is not given, the pivot table
+will include all of the data that can be aggregated in an additional level of
+hierarchy in the columns:
+
+``` python
+In [60]: pd.pivot_table(df, index=['A', 'B'], columns=['C'])
+Out[60]:
+                D                   E
+C             bar       foo       bar       foo
+A     B
+one   A  1.120915 -0.514058  1.393057 -0.021605
+      B -0.338421  0.002759  0.684140 -0.551692
+      C -0.538846  0.699535 -0.988442  0.747859
+three A -1.181568       NaN  0.961289       NaN
+      B       NaN  0.433512       NaN -1.064372
+      C  0.588783       NaN -0.131830       NaN
+two   A       NaN  1.000985       NaN  0.064245
+      B  0.158248       NaN -0.097147       NaN
+      C       NaN  0.176180       NaN  0.436241
+```
+
+Also, you can use ``Grouper`` for the ``index`` and ``columns`` keywords. For details on ``Grouper``, see [Grouping with a Grouper specification](groupby.html#groupby-specify).
+
+``` python
+In [61]: pd.pivot_table(df, values='D', index=pd.Grouper(freq='M', key='F'),
+   ....:                columns='C')
+   ....:
+Out[61]:
+C                bar       foo
+F
+2013-01-31       NaN -0.514058
+2013-02-28       NaN  0.002759
+2013-03-31       NaN  0.176180
+2013-04-30 -1.181568       NaN
+2013-05-31 -0.338421       NaN
+2013-06-30 -0.538846       NaN
+2013-07-31       NaN  1.000985
+2013-08-31       NaN  0.433512
+2013-09-30       NaN  0.699535
+2013-10-31  1.120915       NaN
+2013-11-30  0.158248       NaN
+2013-12-31  0.588783       NaN
+```
+
+You can render a nice output of the table omitting the missing values by
+calling ``to_string`` if you wish:
+
+``` python
+In [62]: table = pd.pivot_table(df, index=['A', 'B'], columns=['C'])
+
+In [63]: print(table.to_string(na_rep=''))
+                D                   E
+C             bar       foo       bar       foo
+A     B
+one   A  1.120915 -0.514058  1.393057 -0.021605
+      B -0.338421  0.002759  0.684140 -0.551692
+      C -0.538846  0.699535 -0.988442  0.747859
+three A -1.181568            0.961289
+      B            0.433512           -1.064372
+      C  0.588783           -0.131830
+two   A            1.000985            0.064245
+      B  0.158248           -0.097147
+      C            0.176180            0.436241
+```
+
+Note that ``pivot_table`` is also available as an instance method on DataFrame, i.e. ``DataFrame.pivot_table()``.
+
+### Adding margins
+
+If you pass ``margins=True`` to ``pivot_table``, special ``All`` columns and
+rows will be added with
partial group aggregates across the categories on the
+rows and columns:
+
+``` python
+In [64]: df.pivot_table(index=['A', 'B'], columns='C', margins=True, aggfunc=np.std)
+Out[64]:
+                D                             E
+C             bar       foo       All       bar       foo       All
+A     B
+one   A  1.804346  1.210272  1.569879  0.179483  0.418374  0.858005
+      B  0.690376  1.353355  0.898998  1.083825  0.968138  1.101401
+      C  0.273641  0.418926  0.771139  1.689271  0.446140  1.422136
+three A  0.794212       NaN  0.794212  2.049040       NaN  2.049040
+      B       NaN  0.363548  0.363548       NaN  1.625237  1.625237
+      C  3.915454       NaN  3.915454  1.035215       NaN  1.035215
+two   A       NaN  0.442998  0.442998       NaN  0.447104  0.447104
+      B  0.202765       NaN  0.202765  0.560757       NaN  0.560757
+      C       NaN  1.819408  1.819408       NaN  0.650439  0.650439
+All      1.556686  0.952552  1.246608  1.250924  0.899904  1.059389
+```
+
+## Cross tabulations
+
+Use [``crosstab()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.crosstab.html#pandas.crosstab) to compute a cross-tabulation of two (or more)
+factors. By default ``crosstab`` computes a frequency table of the factors
+unless an array of values and an aggregation function are passed.
+
+It takes a number of arguments:
+
+- ``index``: array-like, values to group by in the rows.
+- ``columns``: array-like, values to group by in the columns.
+- ``values``: array-like, optional, array of values to aggregate according to
+the factors.
+- ``aggfunc``: function, optional. If no values array is passed, computes a
+frequency table.
+- ``rownames``: sequence, default ``None``, must match number of row arrays passed.
+- ``colnames``: sequence, default ``None``, if passed, must match number of column
+arrays passed.
+- ``margins``: boolean, default ``False``, add row/column margins (subtotals).
+- ``normalize``: boolean, {‘all’, ‘index’, ‘columns’}, or {0,1}, default ``False``.
+Normalize by dividing all values by the sum of values.
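As a quick illustration of the defaults, ``crosstab`` on two named ``Series`` builds a frequency table and reuses the ``name`` attributes as the axis names. A minimal sketch; the names ``left`` and ``right`` are invented for the example:

``` python
import pandas as pd

left = pd.Series(['x', 'x', 'y'], name='left')
right = pd.Series(['u', 'v', 'v'], name='right')

tab = pd.crosstab(left, right)

# With no values/aggfunc, each cell counts co-occurrences
assert tab.loc['x', 'u'] == 1
assert tab.loc['x', 'v'] == 1
assert tab.loc['y', 'v'] == 1

# The Series names become the row/column axis names
assert tab.index.name == 'left'
assert tab.columns.name == 'right'
```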
+
+Any ``Series`` passed will have its name attribute used unless row or column
+names for the cross-tabulation are specified.
+
+For example:
+
+``` python
+In [65]: foo, bar, dull, shiny, one, two = 'foo', 'bar', 'dull', 'shiny', 'one', 'two'
+
+In [66]: a = np.array([foo, foo, bar, bar, foo, foo], dtype=object)
+
+In [67]: b = np.array([one, one, two, one, two, one], dtype=object)
+
+In [68]: c = np.array([dull, dull, shiny, dull, dull, shiny], dtype=object)
+
+In [69]: pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
+Out[69]:
+b     one        two
+c    dull shiny dull shiny
+a
+bar     1     0    0     1
+foo     2     1    1     0
+```
+
+If ``crosstab`` receives only two Series, it will provide a frequency table.
+
+``` python
+In [70]: df = pd.DataFrame({'A': [1, 2, 2, 2, 2], 'B': [3, 3, 4, 4, 4],
+   ....:                    'C': [1, 1, np.nan, 1, 1]})
+   ....:
+
+In [71]: df
+Out[71]:
+   A  B    C
+0  1  3  1.0
+1  2  3  1.0
+2  2  4  NaN
+3  2  4  1.0
+4  2  4  1.0
+
+In [72]: pd.crosstab(df.A, df.B)
+Out[72]:
+B  3  4
+A
+1  1  0
+2  1  3
+```
+
+Any input passed containing ``Categorical`` data will have **all** of its
+categories included in the cross-tabulation, even if the actual data does
+not contain any instances of a particular category.
+ +``` python +In [73]: foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c']) + +In [74]: bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f']) + +In [75]: pd.crosstab(foo, bar) +Out[75]: +col_0 d e +row_0 +a 1 0 +b 0 1 +``` + +### Normalization + +*New in version 0.18.1.* + +Frequency tables can also be normalized to show percentages rather than counts +using the ``normalize`` argument: + +``` python +In [76]: pd.crosstab(df.A, df.B, normalize=True) +Out[76]: +B 3 4 +A +1 0.2 0.0 +2 0.2 0.6 +``` + +``normalize`` can also normalize values within each row or within each column: + +``` python +In [77]: pd.crosstab(df.A, df.B, normalize='columns') +Out[77]: +B 3 4 +A +1 0.5 0.0 +2 0.5 1.0 +``` + +``crosstab`` can also be passed a third ``Series`` and an aggregation function +(``aggfunc``) that will be applied to the values of the third ``Series`` within +each group defined by the first two ``Series``: + +``` python +In [78]: pd.crosstab(df.A, df.B, values=df.C, aggfunc=np.sum) +Out[78]: +B 3 4 +A +1 1.0 NaN +2 1.0 2.0 +``` + +### Adding margins + +Finally, one can also add margins or normalize this output. 
+
+``` python
+In [79]: pd.crosstab(df.A, df.B, values=df.C, aggfunc=np.sum, normalize=True,
+   ....:             margins=True)
+   ....:
+Out[79]:
+B       3    4   All
+A
+1    0.25  0.0  0.25
+2    0.25  0.5  0.75
+All  0.50  0.5  1.00
+```
+
+## Tiling
+
+The [``cut()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html#pandas.cut) function computes groupings for the values of the input
+array and is often used to transform continuous variables to discrete or
+categorical variables:
+
+``` python
+In [80]: ages = np.array([10, 15, 13, 12, 23, 25, 28, 59, 60])
+
+In [81]: pd.cut(ages, bins=3)
+Out[81]:
+[(9.95, 26.667], (9.95, 26.667], (9.95, 26.667], (9.95, 26.667], (9.95, 26.667], (9.95, 26.667], (26.667, 43.333], (43.333, 60.0], (43.333, 60.0]]
+Categories (3, interval[float64]): [(9.95, 26.667] < (26.667, 43.333] < (43.333, 60.0]]
+```
+
+If the ``bins`` keyword is an integer, then equal-width bins are formed.
+Alternatively we can specify custom bin-edges:
+
+``` python
+In [82]: c = pd.cut(ages, bins=[0, 18, 35, 70])
+
+In [83]: c
+Out[83]:
+[(0, 18], (0, 18], (0, 18], (0, 18], (18, 35], (18, 35], (18, 35], (35, 70], (35, 70]]
+Categories (3, interval[int64]): [(0, 18] < (18, 35] < (35, 70]]
+```
+
+*New in version 0.20.0.*
+
+If the ``bins`` keyword is an ``IntervalIndex``, then these will be
+used to bin the passed data:
+
+``` python
+pd.cut([25, 20, 50], bins=c.categories)
+```
+
+## Computing indicator / dummy variables
+
+To convert a categorical variable into a “dummy” or “indicator” ``DataFrame``,
+for example a column in a ``DataFrame`` (a ``Series``) which has ``k`` distinct
+values, use
+[``get_dummies()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html#pandas.get_dummies) to derive a ``DataFrame`` containing ``k`` columns of 1s and 0s:
+
+``` python
+In [84]: df = pd.DataFrame({'key': list('bbacab'), 'data1': range(6)})
+
+In [85]: pd.get_dummies(df['key'])
+Out[85]:
+   a  b  c
+0  0  1  0
+1  0  1  0
+2  1  0  0
+3  0  0  1
+4  1  0  0
+5 0 1 0 +``` + +Sometimes it’s useful to prefix the column names, for example when merging the result +with the original ``DataFrame``: + +``` python +In [86]: dummies = pd.get_dummies(df['key'], prefix='key') + +In [87]: dummies +Out[87]: + key_a key_b key_c +0 0 1 0 +1 0 1 0 +2 1 0 0 +3 0 0 1 +4 1 0 0 +5 0 1 0 + +In [88]: df[['data1']].join(dummies) +Out[88]: + data1 key_a key_b key_c +0 0 0 1 0 +1 1 0 1 0 +2 2 1 0 0 +3 3 0 0 1 +4 4 1 0 0 +5 5 0 1 0 +``` + +This function is often used along with discretization functions like ``cut``: + +``` python +In [89]: values = np.random.randn(10) + +In [90]: values +Out[90]: +array([ 0.4082, -1.0481, -0.0257, -0.9884, 0.0941, 1.2627, 1.29 , + 0.0824, -0.0558, 0.5366]) + +In [91]: bins = [0, 0.2, 0.4, 0.6, 0.8, 1] + +In [92]: pd.get_dummies(pd.cut(values, bins)) +Out[92]: + (0.0, 0.2] (0.2, 0.4] (0.4, 0.6] (0.6, 0.8] (0.8, 1.0] +0 0 0 1 0 0 +1 0 0 0 0 0 +2 0 0 0 0 0 +3 0 0 0 0 0 +4 1 0 0 0 0 +5 0 0 0 0 0 +6 0 0 0 0 0 +7 1 0 0 0 0 +8 0 0 0 0 0 +9 0 0 1 0 0 +``` + +See also [``Series.str.get_dummies``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.get_dummies.html#pandas.Series.str.get_dummies). + +[``get_dummies()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html#pandas.get_dummies) also accepts a ``DataFrame``. By default all categorical +variables (categorical in the statistical sense, those with *object* or +*categorical* dtype) are encoded as dummy variables. + +``` python +In [93]: df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['c', 'c', 'b'], + ....: 'C': [1, 2, 3]}) + ....: + +In [94]: pd.get_dummies(df) +Out[94]: + C A_a A_b B_b B_c +0 1 1 0 0 1 +1 2 0 1 0 1 +2 3 1 0 1 0 +``` + +All non-object columns are included untouched in the output. You can control +the columns that are encoded with the ``columns`` keyword. 
+
+``` python
+In [95]: pd.get_dummies(df, columns=['A'])
+Out[95]: 
+   B  C  A_a  A_b
+0  c  1    1    0
+1  c  2    0    1
+2  b  3    1    0
+```
+
+Notice that the ``B`` column is still included in the output, it just hasn’t
+been encoded. You can drop ``B`` before calling ``get_dummies`` if you don’t
+want to include it in the output.
+
+As with the ``Series`` version, you can pass values for the ``prefix`` and
+``prefix_sep``. By default the column name is used as the prefix, and ‘_’ as
+the prefix separator. You can specify ``prefix`` and ``prefix_sep`` in 3 ways:
+
+- string: Use the same value for ``prefix`` or ``prefix_sep`` for each column
+to be encoded.
+- list: Must be the same length as the number of columns being encoded.
+- dict: Mapping column name to prefix.
+
+``` python
+In [96]: simple = pd.get_dummies(df, prefix='new_prefix')
+
+In [97]: simple
+Out[97]: 
+   C  new_prefix_a  new_prefix_b  new_prefix_b  new_prefix_c
+0  1             1             0             0             1
+1  2             0             1             0             1
+2  3             1             0             1             0
+
+In [98]: from_list = pd.get_dummies(df, prefix=['from_A', 'from_B'])
+
+In [99]: from_list
+Out[99]: 
+   C  from_A_a  from_A_b  from_B_b  from_B_c
+0  1         1         0         0         1
+1  2         0         1         0         1
+2  3         1         0         1         0
+
+In [100]: from_dict = pd.get_dummies(df, prefix={'B': 'from_B', 'A': 'from_A'})
+
+In [101]: from_dict
+Out[101]: 
+   C  from_A_a  from_A_b  from_B_b  from_B_c
+0  1         1         0         0         1
+1  2         0         1         0         1
+2  3         1         0         1         0
+```
+
+*New in version 0.18.0.*
+
+Sometimes it will be useful to only keep k-1 levels of a categorical
+variable to avoid collinearity when feeding the result to statistical models.
+You can switch to this mode by turning on ``drop_first``.
+
+``` python
+In [102]: s = pd.Series(list('abcaa'))
+
+In [103]: pd.get_dummies(s)
+Out[103]: 
+   a  b  c
+0  1  0  0
+1  0  1  0
+2  0  0  1
+3  1  0  0
+4  1  0  0
+
+In [104]: pd.get_dummies(s, drop_first=True)
+Out[104]: 
+   b  c
+0  0  0
+1  1  0
+2  0  1
+3  0  0
+4  0  0
+```
+
+When a column contains only one level, it will be omitted in the result.
+ +``` python +In [105]: df = pd.DataFrame({'A': list('aaaaa'), 'B': list('ababc')}) + +In [106]: pd.get_dummies(df) +Out[106]: + A_a B_a B_b B_c +0 1 1 0 0 +1 1 0 1 0 +2 1 1 0 0 +3 1 0 1 0 +4 1 0 0 1 + +In [107]: pd.get_dummies(df, drop_first=True) +Out[107]: + B_b B_c +0 0 0 +1 1 0 +2 0 0 +3 1 0 +4 0 1 +``` + +By default new columns will have ``np.uint8`` dtype. +To choose another dtype, use the ``dtype`` argument: + +``` python +In [108]: df = pd.DataFrame({'A': list('abc'), 'B': [1.1, 2.2, 3.3]}) + +In [109]: pd.get_dummies(df, dtype=bool).dtypes +Out[109]: +B float64 +A_a bool +A_b bool +A_c bool +dtype: object +``` + +*New in version 0.23.0.* + +## Factorizing values + +To encode 1-d values as an enumerated type use [``factorize()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.factorize.html#pandas.factorize): + +``` python +In [110]: x = pd.Series(['A', 'A', np.nan, 'B', 3.14, np.inf]) + +In [111]: x +Out[111]: +0 A +1 A +2 NaN +3 B +4 3.14 +5 inf +dtype: object + +In [112]: labels, uniques = pd.factorize(x) + +In [113]: labels +Out[113]: array([ 0, 0, -1, 1, 2, 3]) + +In [114]: uniques +Out[114]: Index(['A', 'B', 3.14, inf], dtype='object') +``` + +Note that ``factorize`` is similar to ``numpy.unique``, but differs in its +handling of NaN: + +::: tip Note + +The following ``numpy.unique`` will fail under Python 3 with a ``TypeError`` +because of an ordering bug. See also +[here](https://github.com/numpy/numpy/issues/641). 
+ +::: + +``` python +In [1]: x = pd.Series(['A', 'A', np.nan, 'B', 3.14, np.inf]) +In [2]: pd.factorize(x, sort=True) +Out[2]: +(array([ 2, 2, -1, 3, 0, 1]), + Index([3.14, inf, 'A', 'B'], dtype='object')) + +In [3]: np.unique(x, return_inverse=True)[::-1] +Out[3]: (array([3, 3, 0, 4, 1, 2]), array([nan, 3.14, inf, 'A', 'B'], dtype=object)) +``` + +::: tip Note + +If you just want to handle one column as a categorical variable (like R’s factor), +you can use ``df["cat_col"] = pd.Categorical(df["col"])`` or +``df["cat_col"] = df["col"].astype("category")``. For full docs on [``Categorical``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Categorical.html#pandas.Categorical), +see the [Categorical introduction](categorical.html#categorical) and the +[API documentation](https://pandas.pydata.org/pandas-docs/stable/reference/arrays.html#api-arrays-categorical). + +::: + +## Examples + +In this section, we will review frequently asked questions and examples. The +column names and relevant column values are named to correspond with how this +DataFrame will be pivoted in the answers below. + +``` python +In [115]: np.random.seed([3, 1415]) + +In [116]: n = 20 + +In [117]: cols = np.array(['key', 'row', 'item', 'col']) + +In [118]: df = cols + pd.DataFrame((np.random.randint(5, size=(n, 4)) + .....: // [2, 1, 2, 1]).astype(str)) + .....: + +In [119]: df.columns = cols + +In [120]: df = df.join(pd.DataFrame(np.random.rand(n, 2).round(2)).add_prefix('val')) + +In [121]: df +Out[121]: + key row item col val0 val1 +0 key0 row3 item1 col3 0.81 0.04 +1 key1 row2 item1 col2 0.44 0.07 +2 key1 row0 item1 col0 0.77 0.01 +3 key0 row4 item0 col2 0.15 0.59 +4 key1 row0 item2 col1 0.81 0.64 +.. ... ... ... ... ... ... 
+15  key0  row3  item1  col1  0.31  0.23
+16  key0  row0  item2  col3  0.86  0.01
+17  key0  row4  item0  col3  0.64  0.21
+18  key2  row2  item2  col0  0.13  0.45
+19  key0  row2  item0  col4  0.37  0.70
+
+[20 rows x 6 columns]
+```
+
+### Pivoting with single aggregations
+
+Suppose we wanted to pivot ``df`` such that the ``col`` values are columns,
+``row`` values are the index, and the mean of ``val0`` are the values. In
+particular, the resulting DataFrame should look like:
+
+``` python
+col   col0   col1   col2   col3  col4
+row
+row0  0.77  0.605    NaN  0.860  0.65
+row2  0.13    NaN  0.395  0.500  0.25
+row3   NaN  0.310    NaN  0.545   NaN
+row4   NaN  0.100  0.395  0.760  0.24
+```
+
+This solution uses [``pivot_table()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html#pandas.pivot_table). Also note that
+``aggfunc='mean'`` is the default. It is included here to be explicit.
+
+``` python
+In [122]: df.pivot_table(
+   .....:     values='val0', index='row', columns='col', aggfunc='mean')
+   .....: 
+Out[122]: 
+col   col0   col1   col2   col3  col4
+row                                  
+row0  0.77  0.605    NaN  0.860  0.65
+row2  0.13    NaN  0.395  0.500  0.25
+row3   NaN  0.310    NaN  0.545   NaN
+row4   NaN  0.100  0.395  0.760  0.24
+```
+
+Note that we can also replace the missing values by using the ``fill_value``
+parameter.
+
+``` python
+In [123]: df.pivot_table(
+   .....:     values='val0', index='row', columns='col', aggfunc='mean', fill_value=0)
+   .....: 
+Out[123]: 
+col   col0   col1   col2   col3  col4
+row                                  
+row0  0.77  0.605  0.000  0.860  0.65
+row2  0.13  0.000  0.395  0.500  0.25
+row3  0.00  0.310  0.000  0.545  0.00
+row4  0.00  0.100  0.395  0.760  0.24
+```
+
+Also note that we can pass in other aggregation functions as well. For example,
+we can also pass in ``sum``.
+ +``` python +In [124]: df.pivot_table( + .....: values='val0', index='row', columns='col', aggfunc='sum', fill_value=0) + .....: +Out[124]: +col col0 col1 col2 col3 col4 +row +row0 0.77 1.21 0.00 0.86 0.65 +row2 0.13 0.00 0.79 0.50 0.50 +row3 0.00 0.31 0.00 1.09 0.00 +row4 0.00 0.10 0.79 1.52 0.24 +``` + +Another aggregation we can do is calculate the frequency in which the columns +and rows occur together a.k.a. “cross tabulation”. To do this, we can pass +``size`` to the ``aggfunc`` parameter. + +``` python +In [125]: df.pivot_table(index='row', columns='col', fill_value=0, aggfunc='size') +Out[125]: +col col0 col1 col2 col3 col4 +row +row0 1 2 0 1 1 +row2 1 0 2 1 2 +row3 0 1 0 2 0 +row4 0 1 2 2 1 +``` + +### Pivoting with multiple aggregations + +We can also perform multiple aggregations. For example, to perform both a +``sum`` and ``mean``, we can pass in a list to the ``aggfunc`` argument. + +``` python +In [126]: df.pivot_table( + .....: values='val0', index='row', columns='col', aggfunc=['mean', 'sum']) + .....: +Out[126]: + mean sum +col col0 col1 col2 col3 col4 col0 col1 col2 col3 col4 +row +row0 0.77 0.605 NaN 0.860 0.65 0.77 1.21 NaN 0.86 0.65 +row2 0.13 NaN 0.395 0.500 0.25 0.13 NaN 0.79 0.50 0.50 +row3 NaN 0.310 NaN 0.545 NaN NaN 0.31 NaN 1.09 NaN +row4 NaN 0.100 0.395 0.760 0.24 NaN 0.10 0.79 1.52 0.24 +``` + +Note to aggregate over multiple value columns, we can pass in a list to the +``values`` parameter. + +``` python +In [127]: df.pivot_table( + .....: values=['val0', 'val1'], index='row', columns='col', aggfunc=['mean']) + .....: +Out[127]: + mean + val0 val1 +col col0 col1 col2 col3 col4 col0 col1 col2 col3 col4 +row +row0 0.77 0.605 NaN 0.860 0.65 0.01 0.745 NaN 0.010 0.02 +row2 0.13 NaN 0.395 0.500 0.25 0.45 NaN 0.34 0.440 0.79 +row3 NaN 0.310 NaN 0.545 NaN NaN 0.230 NaN 0.075 NaN +row4 NaN 0.100 0.395 0.760 0.24 NaN 0.070 0.42 0.300 0.46 +``` + +Note to subdivide over multiple columns we can pass in a list to the +``columns`` parameter. 
+ +``` python +In [128]: df.pivot_table( + .....: values=['val0'], index='row', columns=['item', 'col'], aggfunc=['mean']) + .....: +Out[128]: + mean + val0 +item item0 item1 item2 +col col2 col3 col4 col0 col1 col2 col3 col4 col0 col1 col3 col4 +row +row0 NaN NaN NaN 0.77 NaN NaN NaN NaN NaN 0.605 0.86 0.65 +row2 0.35 NaN 0.37 NaN NaN 0.44 NaN NaN 0.13 NaN 0.50 0.13 +row3 NaN NaN NaN NaN 0.31 NaN 0.81 NaN NaN NaN 0.28 NaN +row4 0.15 0.64 NaN NaN 0.10 0.64 0.88 0.24 NaN NaN NaN NaN +``` + +## Exploding a list-like column + +*New in version 0.25.0.* + +Sometimes the values in a column are list-like. + +``` python +In [129]: keys = ['panda1', 'panda2', 'panda3'] + +In [130]: values = [['eats', 'shoots'], ['shoots', 'leaves'], ['eats', 'leaves']] + +In [131]: df = pd.DataFrame({'keys': keys, 'values': values}) + +In [132]: df +Out[132]: + keys values +0 panda1 [eats, shoots] +1 panda2 [shoots, leaves] +2 panda3 [eats, leaves] +``` + +We can ‘explode’ the ``values`` column, transforming each list-like to a separate row, by using [``explode()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.explode.html#pandas.Series.explode). This will replicate the index values from the original row: + +``` python +In [133]: df['values'].explode() +Out[133]: +0 eats +0 shoots +1 shoots +1 leaves +2 eats +2 leaves +Name: values, dtype: object +``` + +You can also explode the column in the ``DataFrame``. + +``` python +In [134]: df.explode('values') +Out[134]: + keys values +0 panda1 eats +0 panda1 shoots +1 panda2 shoots +1 panda2 leaves +2 panda3 eats +2 panda3 leaves +``` + +[``Series.explode()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.explode.html#pandas.Series.explode) will replace empty lists with ``np.nan`` and preserve scalar entries. The dtype of the resulting ``Series`` is always ``object``. 
+ +``` python +In [135]: s = pd.Series([[1, 2, 3], 'foo', [], ['a', 'b']]) + +In [136]: s +Out[136]: +0 [1, 2, 3] +1 foo +2 [] +3 [a, b] +dtype: object + +In [137]: s.explode() +Out[137]: +0 1 +0 2 +0 3 +1 foo +2 NaN +3 a +3 b +dtype: object +``` + +Here is a typical usecase. You have comma separated strings in a column and want to expand this. + +``` python +In [138]: df = pd.DataFrame([{'var1': 'a,b,c', 'var2': 1}, + .....: {'var1': 'd,e,f', 'var2': 2}]) + .....: + +In [139]: df +Out[139]: + var1 var2 +0 a,b,c 1 +1 d,e,f 2 +``` + +Creating a long form DataFrame is now straightforward using explode and chained operations + +``` python +In [140]: df.assign(var1=df.var1.str.split(',')).explode('var1') +Out[140]: + var1 var2 +0 a 1 +0 b 1 +0 c 1 +1 d 2 +1 e 2 +1 f 2 +``` diff --git a/Python/pandas/user_guide/sparse.md b/Python/pandas/user_guide/sparse.md new file mode 100644 index 00000000..86615a91 --- /dev/null +++ b/Python/pandas/user_guide/sparse.md @@ -0,0 +1,565 @@ +# Sparse data structures + +::: tip Note + +``SparseSeries`` and ``SparseDataFrame`` have been deprecated. Their purpose +is served equally well by a [``Series``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series) or [``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) with +sparse values. See [Migrating](#sparse-migration) for tips on migrating. + +::: + +Pandas provides data structures for efficiently storing sparse data. +These are not necessarily sparse in the typical “mostly 0”. Rather, you can view these +objects as being “compressed” where any data matching a specific value (``NaN`` / missing value, though any value +can be chosen, including 0) is omitted. The compressed values are not actually stored in the array. 
+ +``` python +In [1]: arr = np.random.randn(10) + +In [2]: arr[2:-2] = np.nan + +In [3]: ts = pd.Series(pd.SparseArray(arr)) + +In [4]: ts +Out[4]: +0 0.469112 +1 -0.282863 +2 NaN +3 NaN +4 NaN +5 NaN +6 NaN +7 NaN +8 -0.861849 +9 -2.104569 +dtype: Sparse[float64, nan] +``` + +Notice the dtype, ``Sparse[float64, nan]``. The ``nan`` means that elements in the +array that are ``nan`` aren’t actually stored, only the non-``nan`` elements are. +Those non-``nan`` elements have a ``float64`` dtype. + +The sparse objects exist for memory efficiency reasons. Suppose you had a +large, mostly NA ``DataFrame``: + +``` python +In [5]: df = pd.DataFrame(np.random.randn(10000, 4)) + +In [6]: df.iloc[:9998] = np.nan + +In [7]: sdf = df.astype(pd.SparseDtype("float", np.nan)) + +In [8]: sdf.head() +Out[8]: + 0 1 2 3 +0 NaN NaN NaN NaN +1 NaN NaN NaN NaN +2 NaN NaN NaN NaN +3 NaN NaN NaN NaN +4 NaN NaN NaN NaN + +In [9]: sdf.dtypes +Out[9]: +0 Sparse[float64, nan] +1 Sparse[float64, nan] +2 Sparse[float64, nan] +3 Sparse[float64, nan] +dtype: object + +In [10]: sdf.sparse.density +Out[10]: 0.0002 +``` + +As you can see, the density (% of values that have not been “compressed”) is +extremely low. This sparse object takes up much less memory on disk (pickled) +and in the Python interpreter. + +``` python +In [11]: 'dense : {:0.2f} bytes'.format(df.memory_usage().sum() / 1e3) +Out[11]: 'dense : 320.13 bytes' + +In [12]: 'sparse: {:0.2f} bytes'.format(sdf.memory_usage().sum() / 1e3) +Out[12]: 'sparse: 0.22 bytes' +``` + +Functionally, their behavior should be nearly +identical to their dense counterparts. 
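As a quick illustration of that equivalence (a sketch, not from the original docs; the data here is made up), arithmetic on a Series backed by sparse values gives the same answers as on its dense counterpart:

``` python
import numpy as np
import pandas as pd

# Illustrative only: the same operation on a dense Series and on a
# sparse-backed copy of it produces identical values.
dense = pd.Series([0.0, 0.0, 1.0, 2.0, 0.0])
sparse = dense.astype(pd.SparseDtype("float", 0.0))

# Densify the sparse result for a direct comparison.
assert np.allclose((sparse + 1).sparse.to_dense(), dense + 1)
```

The sparse object simply stores the non-fill values (plus their positions), so ordinary operations keep working while the memory savings are preserved.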
+ +## SparseArray + +[``SparseArray``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.SparseArray.html#pandas.SparseArray) is a [``ExtensionArray``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.extensions.ExtensionArray.html#pandas.api.extensions.ExtensionArray) +for storing an array of sparse values (see [dtypes](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-dtypes) for more +on extension arrays). It is a 1-dimensional ndarray-like object storing +only values distinct from the ``fill_value``: + +``` python +In [13]: arr = np.random.randn(10) + +In [14]: arr[2:5] = np.nan + +In [15]: arr[7:8] = np.nan + +In [16]: sparr = pd.SparseArray(arr) + +In [17]: sparr +Out[17]: +[-1.9556635297215477, -1.6588664275960427, nan, nan, nan, 1.1589328886422277, 0.14529711373305043, nan, 0.6060271905134522, 1.3342113401317768] +Fill: nan +IntIndex +Indices: array([0, 1, 5, 6, 8, 9], dtype=int32) +``` + +A sparse array can be converted to a regular (dense) ndarray with ``numpy.asarray()`` + +``` python +In [18]: np.asarray(sparr) +Out[18]: +array([-1.9557, -1.6589, nan, nan, nan, 1.1589, 0.1453, + nan, 0.606 , 1.3342]) +``` + +## SparseDtype + +The ``SparseArray.dtype`` property stores two pieces of information + +1. The dtype of the non-sparse values +1. The scalar fill value + +``` python +In [19]: sparr.dtype +Out[19]: Sparse[float64, nan] +``` + +A [``SparseDtype``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.SparseDtype.html#pandas.SparseDtype) may be constructed by passing each of these + +``` python +In [20]: pd.SparseDtype(np.dtype('datetime64[ns]')) +Out[20]: Sparse[datetime64[ns], NaT] +``` + +The default fill value for a given NumPy dtype is the “missing” value for that dtype, +though it may be overridden. 
+
+``` python
+In [21]: pd.SparseDtype(np.dtype('datetime64[ns]'),
+   ....:               fill_value=pd.Timestamp('2017-01-01'))
+   ....: 
+Out[21]: Sparse[datetime64[ns], 2017-01-01 00:00:00]
+```
+
+Finally, the string alias ``'Sparse[dtype]'`` may be used to specify a sparse dtype
+in many places
+
+``` python
+In [22]: pd.array([1, 0, 0, 2], dtype='Sparse[int]')
+Out[22]: 
+[1, 0, 0, 2]
+Fill: 0
+IntIndex
+Indices: array([0, 3], dtype=int32)
+```
+
+## Sparse accessor
+
+*New in version 0.24.0.*
+
+Pandas provides a ``.sparse`` accessor, similar to ``.str`` for string data, ``.cat``
+for categorical data, and ``.dt`` for datetime-like data. This namespace provides
+attributes and methods that are specific to sparse data.
+
+``` python
+In [23]: s = pd.Series([0, 0, 1, 2], dtype="Sparse[int]")
+
+In [24]: s.sparse.density
+Out[24]: 0.5
+
+In [25]: s.sparse.fill_value
+Out[25]: 0
+```
+
+This accessor is available only on data with ``SparseDtype``, and on the [``Series``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series)
+class itself for creating a Series with sparse data from a scipy COO matrix with
+``Series.sparse.from_coo()``.
+
+*New in version 0.25.0.*
+
+A ``.sparse`` accessor has been added for [``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) as well.
+See [Sparse accessor](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html#api-frame-sparse) for more.
+
+## Sparse calculation
+
+You can apply NumPy [ufuncs](https://docs.scipy.org/doc/numpy/reference/ufuncs.html)
+to ``SparseArray`` and get a ``SparseArray`` as a result.
+
+``` python
+In [26]: arr = pd.SparseArray([1., np.nan, np.nan, -2., np.nan])
+
+In [27]: np.abs(arr)
+Out[27]: 
+[1.0, nan, nan, 2.0, nan]
+Fill: nan
+IntIndex
+Indices: array([0, 3], dtype=int32)
+```
+
+The *ufunc* is also applied to ``fill_value``. This is needed to get
+the correct dense result.
+ +``` python +In [28]: arr = pd.SparseArray([1., -1, -1, -2., -1], fill_value=-1) + +In [29]: np.abs(arr) +Out[29]: +[1.0, 1, 1, 2.0, 1] +Fill: 1 +IntIndex +Indices: array([0, 3], dtype=int32) + +In [30]: np.abs(arr).to_dense() +Out[30]: array([1., 1., 1., 2., 1.]) +``` + +## Migrating + +In older versions of pandas, the ``SparseSeries`` and ``SparseDataFrame`` classes (documented below) +were the preferred way to work with sparse data. With the advent of extension arrays, these subclasses +are no longer needed. Their purpose is better served by using a regular Series or DataFrame with +sparse values instead. + +::: tip Note + +There’s no performance or memory penalty to using a Series or DataFrame with sparse values, +rather than a SparseSeries or SparseDataFrame. + +::: + +This section provides some guidance on migrating your code to the new style. As a reminder, +you can use the python warnings module to control warnings. But we recommend modifying +your code, rather than ignoring the warning. + +**Construction** + +From an array-like, use the regular [``Series``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series) or +[``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) constructors with [``SparseArray``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.SparseArray.html#pandas.SparseArray) values. 
+
+``` python
+# Previous way
+>>> pd.SparseDataFrame({"A": [0, 1]})
+```
+
+``` python
+# New way
+In [31]: pd.DataFrame({"A": pd.SparseArray([0, 1])})
+Out[31]: 
+   A
+0  0
+1  1
+```
+
+From a SciPy sparse matrix, use [``DataFrame.sparse.from_spmatrix()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sparse.from_spmatrix.html#pandas.DataFrame.sparse.from_spmatrix),
+
+``` python
+# Previous way
+>>> from scipy import sparse
+>>> mat = sparse.eye(3)
+>>> df = pd.SparseDataFrame(mat, columns=['A', 'B', 'C'])
+```
+
+``` python
+# New way
+In [32]: from scipy import sparse
+
+In [33]: mat = sparse.eye(3)
+
+In [34]: df = pd.DataFrame.sparse.from_spmatrix(mat, columns=['A', 'B', 'C'])
+
+In [35]: df.dtypes
+Out[35]: 
+A    Sparse[float64, 0.0]
+B    Sparse[float64, 0.0]
+C    Sparse[float64, 0.0]
+dtype: object
+```
+
+**Conversion**
+
+From sparse to dense, use the ``.sparse`` accessors
+
+``` python
+In [36]: df.sparse.to_dense()
+Out[36]: 
+     A    B    C
+0  1.0  0.0  0.0
+1  0.0  1.0  0.0
+2  0.0  0.0  1.0
+
+In [37]: df.sparse.to_coo()
+Out[37]: 
+<3x3 sparse matrix of type '<class 'numpy.float64'>'
+        with 3 stored elements in COOrdinate format>
+```
+
+From dense to sparse, use [``DataFrame.astype()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html#pandas.DataFrame.astype) with a [``SparseDtype``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.SparseDtype.html#pandas.SparseDtype).
+
+``` python
+In [38]: dense = pd.DataFrame({"A": [1, 0, 0, 1]})
+
+In [39]: dtype = pd.SparseDtype(int, fill_value=0)
+
+In [40]: dense.astype(dtype)
+Out[40]: 
+   A
+0  1
+1  0
+2  0
+3  1
+```
+
+**Sparse Properties**
+
+Sparse-specific properties, like ``density``, are available on the ``.sparse`` accessor.
+
+``` python
+In [41]: df.sparse.density
+Out[41]: 0.3333333333333333
+```
+
+**General differences**
+
+In a ``SparseDataFrame``, *all* columns were sparse. 
A [``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) can have a mixture of +sparse and dense columns. As a consequence, assigning new columns to a ``DataFrame`` with sparse +values will not automatically convert the input to be sparse. + +``` python +# Previous Way +>>> df = pd.SparseDataFrame({"A": [0, 1]}) +>>> df['B'] = [0, 0] # implicitly becomes Sparse +>>> df['B'].dtype +Sparse[int64, nan] +``` + +Instead, you’ll need to ensure that the values being assigned are sparse + +``` python +In [42]: df = pd.DataFrame({"A": pd.SparseArray([0, 1])}) + +In [43]: df['B'] = [0, 0] # remains dense + +In [44]: df['B'].dtype +Out[44]: dtype('int64') + +In [45]: df['B'] = pd.SparseArray([0, 0]) + +In [46]: df['B'].dtype +Out[46]: Sparse[int64, 0] +``` + +The ``SparseDataFrame.default_kind`` and ``SparseDataFrame.default_fill_value`` attributes +have no replacement. + +## Interaction with scipy.sparse + +Use [``DataFrame.sparse.from_spmatrix()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sparse.from_spmatrix.html#pandas.DataFrame.sparse.from_spmatrix) to create a ``DataFrame`` with sparse values from a sparse matrix. 
+
+*New in version 0.25.0.*
+
+``` python
+In [47]: from scipy.sparse import csr_matrix
+
+In [48]: arr = np.random.random(size=(1000, 5))
+
+In [49]: arr[arr < .9] = 0
+
+In [50]: sp_arr = csr_matrix(arr)
+
+In [51]: sp_arr
+Out[51]: 
+<1000x5 sparse matrix of type '<class 'numpy.float64'>'
+        with 517 stored elements in Compressed Sparse Row format>
+
+In [52]: sdf = pd.DataFrame.sparse.from_spmatrix(sp_arr)
+
+In [53]: sdf.head()
+Out[53]: 
+          0    1    2         3    4
+0  0.956380  0.0  0.0  0.000000  0.0
+1  0.000000  0.0  0.0  0.000000  0.0
+2  0.000000  0.0  0.0  0.000000  0.0
+3  0.000000  0.0  0.0  0.000000  0.0
+4  0.999552  0.0  0.0  0.956153  0.0
+
+In [54]: sdf.dtypes
+Out[54]: 
+0    Sparse[float64, 0.0]
+1    Sparse[float64, 0.0]
+2    Sparse[float64, 0.0]
+3    Sparse[float64, 0.0]
+4    Sparse[float64, 0.0]
+dtype: object
+```
+
+All sparse formats are supported, but matrices that are not in [``COOrdinate``](https://docs.scipy.org/doc/scipy/reference/sparse.html#module-scipy.sparse) format will be converted, copying data as needed.
+To convert back to a sparse SciPy matrix in COO format, you can use the [``DataFrame.sparse.to_coo()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sparse.to_coo.html#pandas.DataFrame.sparse.to_coo) method:
+
+``` python
+In [55]: sdf.sparse.to_coo()
+Out[55]: 
+<1000x5 sparse matrix of type '<class 'numpy.float64'>'
+        with 517 stored elements in COOrdinate format>
+```
+
+``Series.sparse.to_coo()`` is implemented for transforming a ``Series`` with sparse values indexed by a [``MultiIndex``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.html#pandas.MultiIndex) to a [``scipy.sparse.coo_matrix``](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.coo_matrix.html#scipy.sparse.coo_matrix).
+
+The method requires a ``MultiIndex`` with two or more levels.
+
+``` python
+In [56]: s = pd.Series([3.0, np.nan, 1.0, 3.0, np.nan, np.nan])
+
+In [57]: s.index = pd.MultiIndex.from_tuples([(1, 2, 'a', 0),
+   ....:                                      (1, 2, 'a', 1),
+   ....:                                      (1, 1, 'b', 0),
+   ....:                                      (1, 1, 'b', 1),
+   ....:                                      (2, 1, 'b', 0),
+   ....:                                      (2, 1, 'b', 1)],
+   ....:                                     names=['A', 'B', 'C', 'D'])
+   ....: 
+
+In [58]: s
+Out[58]: 
+A  B  C  D
+1  2  a  0    3.0
+            1    NaN
+   1  b  0    1.0
+            1    3.0
+2  1  b  0    NaN
+            1    NaN
+dtype: float64
+
+In [59]: ss = s.astype('Sparse')
+
+In [60]: ss
+Out[60]: 
+A  B  C  D
+1  2  a  0    3.0
+            1    NaN
+   1  b  0    1.0
+            1    3.0
+2  1  b  0    NaN
+            1    NaN
+dtype: Sparse[float64, nan]
+```
+
+In the example below, we transform the ``Series`` to a sparse representation of a 2-d array by specifying that the first and second ``MultiIndex`` levels define labels for the rows and the third and fourth levels define labels for the columns. We also specify that the column and row labels should be sorted in the final sparse representation.
+
+``` python
+In [61]: A, rows, columns = ss.sparse.to_coo(row_levels=['A', 'B'],
+   ....:                                     column_levels=['C', 'D'],
+   ....:                                     sort_labels=True)
+   ....: 
+
+In [62]: A
+Out[62]: 
+<3x4 sparse matrix of type '<class 'numpy.float64'>'
+        with 3 stored elements in COOrdinate format>
+
+In [63]: A.todense()
+Out[63]: 
+matrix([[0., 0., 1., 3.],
+        [3., 0., 0., 0.],
+        [0., 0., 0., 0.]])
+
+In [64]: rows
+Out[64]: [(1, 1), (1, 2), (2, 1)]
+
+In [65]: columns
+Out[65]: [('a', 0), ('a', 1), ('b', 0), ('b', 1)]
+```
+
+Specifying different row and column labels (and not sorting them) yields a different sparse matrix:
+
+``` python
+In [66]: A, rows, columns = ss.sparse.to_coo(row_levels=['A', 'B', 'C'],
+   ....:                                     column_levels=['D'],
+   ....:                                     sort_labels=False)
+   ....: 
+
+In [67]: A
+Out[67]: 
+<3x2 sparse matrix of type '<class 'numpy.float64'>'
+        with 3 stored elements in COOrdinate format>
+
+In [68]: A.todense()
+Out[68]: 
+matrix([[3., 0.],
+        [1., 3.],
+        [0., 0.]])
+
+In [69]: rows
+Out[69]: [(1, 2, 'a'), (1, 1, 'b'), (2, 1, 'b')]
+
+In [70]: columns
+Out[70]: [0, 1]
+```
+
+A convenience method 
[``Series.sparse.from_coo()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.sparse.from_coo.html#pandas.Series.sparse.from_coo) is implemented for creating a ``Series`` with sparse values from a ``scipy.sparse.coo_matrix``.
+
+``` python
+In [71]: from scipy import sparse
+
+In [72]: A = sparse.coo_matrix(([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])),
+   ....:                       shape=(3, 4))
+   ....: 
+
+In [73]: A
+Out[73]: 
+<3x4 sparse matrix of type '<class 'numpy.float64'>'
+        with 3 stored elements in COOrdinate format>
+
+In [74]: A.todense()
+Out[74]: 
+matrix([[0., 0., 1., 2.],
+        [3., 0., 0., 0.],
+        [0., 0., 0., 0.]])
+```
+
+The default behaviour (with ``dense_index=False``) simply returns a ``Series`` containing
+only the non-null entries.
+
+``` python
+In [75]: ss = pd.Series.sparse.from_coo(A)
+
+In [76]: ss
+Out[76]: 
+0  2    1.0
+   3    2.0
+1  0    3.0
+dtype: Sparse[float64, nan]
+```
+
+Specifying ``dense_index=True`` will result in an index that is the Cartesian product of the
+row and columns coordinates of the matrix. Note that this will consume a significant amount of memory
+(relative to ``dense_index=False``) if the sparse matrix is large (and sparse) enough.
+
+``` python
+In [77]: ss_dense = pd.Series.sparse.from_coo(A, dense_index=True)
+
+In [78]: ss_dense
+Out[78]: 
+0  0    NaN
+   1    NaN
+   2    1.0
+   3    2.0
+1  0    3.0
+   1    NaN
+   2    NaN
+   3    NaN
+2  0    NaN
+   1    NaN
+   2    NaN
+   3    NaN
+dtype: Sparse[float64, nan]
+```
+
+## Sparse subclasses
+
+The ``SparseSeries`` and ``SparseDataFrame`` classes are deprecated. Visit their
+API pages for usage.
diff --git a/Python/pandas/user_guide/style.md b/Python/pandas/user_guide/style.md
new file mode 100644
index 00000000..f257d619
--- /dev/null
+++ b/Python/pandas/user_guide/style.md
@@ -0,0 +1,439 @@
+# Styling
+
+*New in version 0.17.1*
+
+Provisional: This is a new feature and still under development. We’ll be adding features and possibly making breaking changes in future releases. We’d love to hear your feedback.
+
+This document is written as a Jupyter Notebook, and can be viewed or downloaded [here](http://nbviewer.ipython.org/github/pandas-dev/pandas/blob/master/doc/source/style.ipynb).
+
+You can apply **conditional formatting**, the visual styling of a DataFrame depending on the data within, by using the ``DataFrame.style`` property. This is a property that returns a ``Styler`` object, which has useful methods for formatting and displaying DataFrames.
+
+The styling is accomplished using CSS. You write “style functions” that take scalars, ``DataFrame``s or ``Series``, and return *like-indexed* DataFrames or Series with CSS ``"attribute: value"`` pairs for the values. These functions can be incrementally passed to the ``Styler`` which collects the styles before rendering.
+
+## Building styles
+
+Pass your style functions into one of the following methods:
+
+- ``Styler.applymap``: elementwise
+- ``Styler.apply``: column-/row-/table-wise
+
+Both of those methods take a function (and some other keyword arguments) and apply your function to the DataFrame in a certain way. ``Styler.applymap`` works through the DataFrame elementwise. ``Styler.apply`` passes each column or row of your DataFrame into your function one at a time, or the entire table at once, depending on the ``axis`` keyword argument. For columnwise use ``axis=0``, rowwise use ``axis=1``, and for the entire table at once use ``axis=None``.
+
+For ``Styler.applymap`` your function should take a scalar and return a single string with the CSS attribute-value pair.
+
+For ``Styler.apply`` your function should take a Series or DataFrame (depending on the axis parameter), and return a Series or DataFrame with an identical shape where each value is a string with a CSS attribute-value pair.
+
+Let’s see some examples.
+
+![style02](https://static.pypandas.cn/public/static/images/style/user_guide_style_02.png)
+
+Here’s a boring example of rendering a DataFrame, without any (visible) styles:
+
+![style03](https://static.pypandas.cn/public/static/images/style/user_guide_style_03.png)
+
+*Note*: The ``DataFrame.style`` attribute is a property that returns a ``Styler`` object. ``Styler`` has a ``_repr_html_`` method defined on it, so it is rendered automatically in a notebook. If you want the actual HTML back for further processing, or for writing to file, call the ``.render()`` method, which returns a string.
+
+The above output looks very similar to the standard DataFrame HTML representation. But we’ve done some work behind the scenes to attach CSS classes to each cell. We can view these by calling the ``.render`` method.
+
+``` python
+df.style.highlight_null().render().split('\n')[:10]
+```
+
+*(The ten lines of HTML output were lost when this page was converted: ``render()`` returns the generated ``<style>`` rules followed by the opening ``<table>`` markup, with per-cell ids like ``row0_col2``.)*
+
+The ``row0_col2`` is the identifier for that particular cell. We’ve also prepended each row/column identifier with a UUID unique to each DataFrame so that the style from one doesn’t collide with the styling from another within the same notebook or page (you can set the ``uuid`` if you’d like to tie together the styling of two DataFrames).
+
+When writing style functions, you take care of producing the CSS attribute / value pairs you want. Pandas matches those up with the CSS classes that identify each cell.
+
+Let’s write a simple style function that will color negative numbers red and positive numbers black.
+
+![style04](https://static.pypandas.cn/public/static/images/style/user_guide_style_04.png)
+
+In this case, the cell’s style depends only on its own value. That means we should use the ``Styler.applymap`` method, which works elementwise.
+
+![style05](https://static.pypandas.cn/public/static/images/style/user_guide_style_05.png)
+
+Notice the similarity with the standard ``df.applymap``, which operates on DataFrames elementwise. We want you to be able to reuse your existing knowledge of how to interact with DataFrames.
+
+Notice also that our function returned a string containing the CSS attribute and value, separated by a colon just like in a ``<style>`` tag. This will be a common theme.
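The style function discussed above survives on this page only as a screenshot; a sketch of it, following the pandas docs' example, is:

``` python
def color_negative_red(val):
    """Map a scalar to a CSS 'attribute: value' string:
    red text for negative numbers, black otherwise."""
    color = 'red' if val < 0 else 'black'
    return 'color: %s' % color

# Applied elementwise, e.g.: df.style.applymap(color_negative_red)
```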


Finally, the input shapes matched. ``Styler.applymap`` calls the function on each scalar input, and the function returns a scalar output.

Now suppose you wanted to highlight the maximum value in each column. We can’t use ``.applymap`` anymore since that operated elementwise. Instead, we’ll turn to ``.apply``, which operates columnwise (or rowwise using the ``axis`` keyword). Later on we’ll see that something like ``highlight_max`` is already defined on ``Styler``, so you wouldn’t need to write this yourself.

![style06](https://static.pypandas.cn/public/static/images/style/user_guide_style_06.png)

In this case the input is a ``Series``, one column at a time. Notice that the output shape of ``highlight_max`` matches the input shape, an array with ``len(s)`` items.

We encourage you to use method chains to build up a style piecewise, before finally rendering at the end of the chain.

![style07](https://static.pypandas.cn/public/static/images/style/user_guide_style_07.png)

Above we used ``Styler.apply`` to pass in each column one at a time.

Debugging tip: if you’re having trouble writing your style function, try just passing it into ``DataFrame.apply``. Internally, ``Styler.apply`` uses ``DataFrame.apply``, so the result should be the same.

What if you wanted to highlight just the maximum value in the entire table? Use ``.apply(function, axis=None)`` to indicate that your function wants the entire table, not one column or row at a time. Let’s try that next.

We’ll rewrite our ``highlight_max`` to handle either Series (from ``.apply(axis=0 or 1)``) or DataFrames (from ``.apply(axis=None)``). We’ll also allow the color to be adjustable, to demonstrate that ``.apply`` and ``.applymap`` pass along keyword arguments.

![style08](https://static.pypandas.cn/public/static/images/style/user_guide_style_08.png)

When using ``Styler.apply(func, axis=None)``, the function must return a DataFrame with the same index and column labels. 
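A sketch of such a dual-mode ``highlight_max`` (along the lines of the screenshots above; the exact notebook code is only shown in the images, so treat this as an approximation):

``` python
import numpy as np
import pandas as pd

def highlight_max(data, color='yellow'):
    """Highlight the maximum in a Series (axis=0/1) or DataFrame (axis=None)."""
    attr = 'background-color: {}'.format(color)
    if data.ndim == 1:
        # Series passed in from .apply(axis=0) or .apply(axis=1)
        is_max = data == data.max()
        return [attr if v else '' for v in is_max]
    else:
        # DataFrame passed in from .apply(axis=None): must return a DataFrame
        # with the same index and column labels
        is_max = data == data.max().max()
        return pd.DataFrame(np.where(is_max, attr, ''),
                            index=data.index, columns=data.columns)
```

``df.style.apply(highlight_max, color='darkorange', axis=None)`` would then highlight the single largest value in the table.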

![style09](https://static.pypandas.cn/public/static/images/style/user_guide_style_09.png)

### Building Styles Summary

Style functions should return strings with one or more CSS ``attribute: value`` pairs, delimited by semicolons. Use

- ``Styler.applymap(func)`` for elementwise styles
- ``Styler.apply(func, axis=0)`` for columnwise styles
- ``Styler.apply(func, axis=1)`` for rowwise styles
- ``Styler.apply(func, axis=None)`` for tablewise styles

And crucially the input and output shapes of ``func`` must match. If ``x`` is the input then ``func(x).shape == x.shape``.

## Finer control: slicing

Both ``Styler.apply`` and ``Styler.applymap`` accept a ``subset`` keyword. This allows you to apply styles to specific rows or columns, without having to code that logic into your style function.

The value passed to ``subset`` behaves similarly to slicing a DataFrame.

- A scalar is treated as a column label
- A list (or Series or NumPy array) is treated as multiple column labels
- A tuple is treated as ``(row_indexer, column_indexer)``

Consider using ``pd.IndexSlice`` to construct the tuple for the last one.

![style10](https://static.pypandas.cn/public/static/images/style/user_guide_style_10.png)

For row and column slicing, any valid indexer to ``.loc`` will work.

![style11](https://static.pypandas.cn/public/static/images/style/user_guide_style_11.png)

Only label-based slicing is supported right now, not positional.

If your style function uses a ``subset`` or ``axis`` keyword argument, consider wrapping your function in a ``functools.partial``, partialing out that keyword.

``` python
my_func2 = functools.partial(my_func, subset=42)
```

## Finer Control: Display Values

We distinguish the *display* value from the *actual* value in ``Styler``. To control the display value, the text that is printed in each cell, use ``Styler.format``. 
Cells can be formatted according to a [format spec string](https://docs.python.org/3/library/string.html#format-specification-mini-language) or a callable that takes a single value and returns a string.

![style12](https://static.pypandas.cn/public/static/images/style/user_guide_style_12.png)

Use a dictionary to format specific columns.

![style13](https://static.pypandas.cn/public/static/images/style/user_guide_style_13.png)

Or pass in a callable (or dictionary of callables) for more flexible handling.

![style14](https://static.pypandas.cn/public/static/images/style/user_guide_style_14.png)

## Builtin styles

Finally, we expect certain styling functions to be common enough that we’ve included a few “built-in” to the ``Styler``, so you don’t have to write them yourself.

![style15](https://static.pypandas.cn/public/static/images/style/user_guide_style_15.png)

You can create “heatmaps” with the ``background_gradient`` method. These require matplotlib, and we’ll use [Seaborn](http://stanford.edu/~mwaskom/software/seaborn/) to get a nice colormap.

``` python
import seaborn as sns

cm = sns.light_palette("green", as_cmap=True)

s = df.style.background_gradient(cmap=cm)
s
```

```
/opt/conda/envs/pandas/lib/python3.7/site-packages/matplotlib/colors.py:479: RuntimeWarning: invalid value encountered in less
  xa[xa < 0] = -1
```

![style16](https://static.pypandas.cn/public/static/images/style/user_guide_style_16.png)

``Styler.background_gradient`` takes the keyword arguments ``low`` and ``high``. Roughly speaking these extend the range of your data by ``low`` and ``high`` percent so that when we convert the colors, the colormap’s entire range isn’t used. This is useful so that you can still actually read the text.

![style17](https://static.pypandas.cn/public/static/images/style/user_guide_style_17.png)

There’s also ``.highlight_min`` and ``.highlight_max``. 
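As a rough illustration of the ``low``/``high`` semantics (a simplified sketch of the range extension, not pandas' exact internals):

``` python
def extended_range(vmin, vmax, low=0.0, high=0.0):
    """Widen [vmin, vmax] by `low`/`high` fractions of the span on each side,
    mimicking how background_gradient avoids using the colormap's extremes."""
    span = vmax - vmin
    return vmin - low * span, vmax + high * span

# With low=high=0.2, a 0..10 column is colored as if the data ran -2..12,
# so no cell is mapped to the darkest (least readable) end of the colormap.
```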

![style18](https://static.pypandas.cn/public/static/images/style/user_guide_style_18.png)

Use ``Styler.set_properties`` when the style doesn’t actually depend on the values.

![style19](https://static.pypandas.cn/public/static/images/style/user_guide_style_19.png)

### Bar charts

You can include “bar charts” in your DataFrame.

![style20](https://static.pypandas.cn/public/static/images/style/user_guide_style_20.png)

New in version 0.20.0 is the ability to further customize the bar chart: you can now have ``df.style.bar`` be centered on zero or on a midpoint value (in addition to the already existing way of having the min value at the left side of the cell), and you can pass a list of ``[color_negative, color_positive]``.

Here’s how you can change the above with the new ``align='mid'`` option:

![style21](https://static.pypandas.cn/public/static/images/style/user_guide_style_21.png)

The following example aims to highlight the behavior of the new align options:

``` python
import pandas as pd
from IPython.display import HTML

# Test series
test1 = pd.Series([-100, -60, -30, -20], name='All Negative')
test2 = pd.Series([10, 20, 50, 100], name='All Positive')
test3 = pd.Series([-10, -5, 0, 90], name='Both Pos and Neg')

head = """
<table>
    <thead>
        <th>Align</th>
        <th>All Negative</th>
        <th>All Positive</th>
        <th>Both Neg and Pos</th>
    </thead>
    <tbody>
"""

aligns = ['left', 'zero', 'mid']
for align in aligns:
    row = "<tr><th>{}</th>".format(align)
    for serie in [test1, test2, test3]:
        s = serie.copy()
        s.name = ''
        row += "<td>{}</td>".format(s.to_frame().style.bar(align=align,
                                                           color=['#d65f5f', '#5fba7d'],
                                                           width=100).render())
    row += '</tr>'
    head += row

head += """
    </tbody>
</table>"""


HTML(head)
```

![style22](https://static.pypandas.cn/public/static/images/style/user_guide_style_22.png)

## Sharing styles

Say you have a lovely style built up for a DataFrame, and now you want to apply the same style to a second DataFrame. Export the style with ``df1.style.export``, and use it on the second DataFrame with ``df2.style.use``.

![style23](https://static.pypandas.cn/public/static/images/style/user_guide_style_23.png)

Notice that you’re able to share the styles even though they’re data aware. The styles are re-evaluated on the new DataFrame they’ve been ``use``d upon.

## Other Options

You’ve seen a few methods for data-driven styling. ``Styler`` also provides a few other options for styles that don’t depend on the data.

- precision
- captions
- table-wide styles
- hiding the index or columns

Each of these can be specified in two ways:

- A keyword argument to ``Styler.__init__``
- A call to one of the ``.set_`` or ``.hide_`` methods, e.g. ``.set_caption`` or ``.hide_columns``

The best method to use depends on the context. Use the ``Styler`` constructor when building many styled DataFrames that should all share the same properties. For interactive use, the ``.set_`` and ``.hide_`` methods are more convenient.

### Precision

You can control the precision of floats using pandas’ regular ``display.precision`` option.

![style24](https://static.pypandas.cn/public/static/images/style/user_guide_style_24.png)

Or through a ``set_precision`` method.

![style25](https://static.pypandas.cn/public/static/images/style/user_guide_style_25.png)

Setting the precision only affects the printed number; the full-precision values are always passed to your style functions. You can always use ``df.round(2).style`` if you’d prefer to round from the start.

### Captions

Regular table captions can be added in a few ways. 

![style26](https://static.pypandas.cn/public/static/images/style/user_guide_style_26.png)

### Table styles

The next option you have are “table styles”. These are styles that apply to the table as a whole, but don’t look at the data. Certain stylings, including pseudo-selectors like ``:hover``, can only be used this way.

![style27](https://static.pypandas.cn/public/static/images/style/user_guide_style_27.png)

``table_styles`` should be a list of dictionaries. Each dictionary should have the ``selector`` and ``props`` keys. The value for ``selector`` should be a valid CSS selector. Recall that all the styles are already attached to an ``id``, unique to each ``Styler``. This selector is in addition to that ``id``. The value for ``props`` should be a list of tuples of ``('attribute', 'value')``.

``table_styles`` are extremely flexible, but not as fun to type out by hand. We hope to collect some useful ones either in pandas, or preferably in a new package that [builds on top](#Extensibility) of the tools here.

### Hiding the Index or Columns

The index can be hidden from rendering by calling ``Styler.hide_index``. Columns can be hidden from rendering by calling ``Styler.hide_columns`` and passing in the name of a column, or a slice of columns.

![style28](https://static.pypandas.cn/public/static/images/style/user_guide_style_28.png)

### CSS classes

Certain CSS classes are attached to cells. 

- Index and column names include ``index_name`` and ``level<k>`` where ``k`` is its level in a MultiIndex
- Index label cells include
  - ``row_heading``
  - ``row<n>`` where ``n`` is the numeric position of the row
  - ``level<k>`` where ``k`` is the level in a MultiIndex
- Column label cells include
  - ``col_heading``
  - ``col<n>`` where ``n`` is the numeric position of the column
  - ``level<k>`` where ``k`` is the level in a MultiIndex
- Blank cells include ``blank``
- Data cells include ``data``

### Limitations

- DataFrame only (use ``Series.to_frame().style``)
- The index and columns must be unique
- No large repr, and performance isn’t great; this is intended for summary DataFrames
- You can only style the *values*, not the index or columns
- You can only apply styles, you can’t insert new HTML entities

Some of these will be addressed in the future.

### Terms

- Style function: a function that’s passed into ``Styler.apply`` or ``Styler.applymap`` and returns values like ``'css attribute: value'``
- Builtin style functions: style functions that are methods on ``Styler``
- table style: a dictionary with the two keys ``selector`` and ``props``. ``selector`` is the CSS selector that ``props`` will apply to. ``props`` is a list of ``(attribute, value)`` tuples. A list of table styles is passed into ``Styler``.

## Fun stuff

Here are a few interesting examples.

``Styler`` interacts pretty well with widgets. If you’re viewing this online instead of running the notebook yourself, you’re missing out on interactively adjusting the color palette. 

![style29](https://static.pypandas.cn/public/static/images/style/user_guide_style_29.png)

![style30](https://static.pypandas.cn/public/static/images/style/user_guide_style_30.png)

## Export to Excel

*New in version 0.20.0*

Experimental: This is a new feature and still under development. We’ll be adding features and possibly making breaking changes in future releases. We’d love to hear your feedback.

Some support is available for exporting styled ``DataFrames`` to Excel worksheets using the ``OpenPyXL`` or ``XlsxWriter`` engines. CSS2.2 properties handled include:

- ``background-color``
- ``border-style``, ``border-width``, ``border-color`` and their {``top``, ``right``, ``bottom``, ``left``} variants
- ``color``
- ``font-family``
- ``font-style``
- ``font-weight``
- ``text-align``
- ``text-decoration``
- ``vertical-align``
- ``white-space: nowrap``

Only CSS2 named colors and hex colors of the form ``#rgb`` or ``#rrggbb`` are currently supported.

The following pseudo CSS properties are also available to set Excel-specific style properties:

- ``number-format``

``` python
df.style.\
    applymap(color_negative_red).\
    apply(highlight_max).\
    to_excel('styled.xlsx', engine='openpyxl')
```

A screenshot of the output:

![excel](https://static.pypandas.cn/public/static/images/style-excel.png)

## Extensibility

The core of pandas is, and will remain, its “high-performance, easy-to-use data structures”. With that in mind, we hope that ``DataFrame.style`` accomplishes two goals:

- Provide an API that is pleasing to use interactively and is “good enough” for many tasks
- Provide the foundations for dedicated libraries to build on

If you build a great library on top of this, let us know and we’ll [link](http://pandas.pydata.org/pandas-docs/stable/ecosystem.html) to it. 

### Subclassing

If the default template doesn’t quite suit your needs, you can subclass Styler and extend or override the template. We’ll show an example of extending the default template to insert a custom header before each table.


``` python
from jinja2 import Environment, ChoiceLoader, FileSystemLoader
from IPython.display import HTML
from pandas.io.formats.style import Styler
```

We’ll use the following template:


``` python
with open("templates/myhtml.tpl") as f:
    print(f.read())
```

Now that we’ve created a template, we need to set up a subclass of ``Styler`` that knows about it.


``` python
class MyStyler(Styler):
    env = Environment(
        loader=ChoiceLoader([
            FileSystemLoader("templates"),  # contains ours
            Styler.loader,  # the default
        ])
    )
    template = env.get_template("myhtml.tpl")
```

Notice that we include the original loader in our environment’s loader. That’s because we extend the original template, so the Jinja environment needs to be able to find it.

Now we can use that custom styler. Its ``__init__`` takes a DataFrame.

![style31](https://static.pypandas.cn/public/static/images/style/user_guide_style_31.png)

Our custom template accepts a ``table_title`` keyword. We can provide the value in the ``.render`` method.

![style32](https://static.pypandas.cn/public/static/images/style/user_guide_style_32.png)

For convenience, we provide the ``Styler.from_custom_template`` method that does the same as the custom subclass.

![style33](https://static.pypandas.cn/public/static/images/style/user_guide_style_33.png)

Here’s the template structure:

![style34](https://static.pypandas.cn/public/static/images/style/user_guide_style_34.png)

See the template in the [GitHub repo](https://github.com/pandas-dev/pandas) for more details. 
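For reference, the ``templates/myhtml.tpl`` file used in this example looks roughly like the following (reproduced from the pandas documentation example; treat it as a sketch — the canonical file lives in the pandas repo):

```
{% extends "html.tpl" %}
{% block table %}
<h1>{{ table_title|default("My Table") }}</h1>
{{ super() }}
{% endblock table %}
```

It extends the default ``html.tpl`` and wraps the table block, which is why the subclass above must keep ``Styler.loader`` in its Jinja ``ChoiceLoader``.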
diff --git a/Python/pandas/user_guide/text.md b/Python/pandas/user_guide/text.md
new file mode 100644
index 00000000..f098fccf
--- /dev/null
+++ b/Python/pandas/user_guide/text.md
@@ -0,0 +1,1056 @@
---
meta:
  - name: keywords
    content: Working with text data in pandas
  - name: description
    content: Series and Index are equipped with a set of string processing methods that make it easy to operate on each element of the array. These methods exclude missing/NA values automatically, and are accessed via the str attribute.
---

# Working with text data

Series and Index are equipped with a set of string processing methods that make it easy to operate on each element of the array. Perhaps most importantly, these methods exclude missing/NA values automatically. They are accessed via the ``str`` attribute and generally have names matching the equivalent (scalar) built-in string methods:

``` python
In [1]: s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

In [2]: s.str.lower()
Out[2]:
0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object

In [3]: s.str.upper()
Out[3]:
0       A
1       B
2       C
3    AABA
4    BACA
5     NaN
6    CABA
7     DOG
8     CAT
dtype: object

In [4]: s.str.len()
Out[4]:
0    1.0
1    1.0
2    1.0
3    4.0
4    4.0
5    NaN
6    4.0
7    3.0
8    3.0
dtype: float64
```

``` python
In [5]: idx = pd.Index([' jack', 'jill ', ' jesse ', 'frank'])

In [6]: idx.str.strip()
Out[6]: Index(['jack', 'jill', 'jesse', 'frank'], dtype='object')

In [7]: idx.str.lstrip()
Out[7]: Index(['jack', 'jill ', 'jesse ', 'frank'], dtype='object')

In [8]: idx.str.rstrip()
Out[8]: Index([' jack', 'jill', ' jesse', 'frank'], dtype='object')
```

The string methods on Index are especially useful for cleaning up or transforming DataFrame columns. For instance, you may have columns with leading or trailing whitespace:

``` python
In [9]: df = pd.DataFrame(np.random.randn(3, 2),
   ...:                   columns=[' Column A ', ' Column B '], index=range(3))
   ...:

In [10]: df
Out[10]:
    Column A    Column B
0    0.469112   -0.282863
1   -1.509059   -1.135632
2    1.212112   -0.173215
```

Since ``df.columns`` is an Index object, we can use the ``.str`` accessor

``` python
In [11]: df.columns.str.strip()
Out[11]: Index(['Column A', 'Column B'], dtype='object')

In [12]: df.columns.str.lower()
Out[12]: Index([' column a ', ' column b '], dtype='object')
```

These string methods can then be used to clean up the columns as needed. Here we are removing leading and trailing whitespace, lower casing all names, and replacing any remaining whitespace with underscores:
``` python
In [13]: df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')

In [14]: df
Out[14]:
   column_a  column_b
0  0.469112 -0.282863
1 -1.509059 -1.135632
2  1.212112 -0.173215
```

::: tip Tip

If you have a ``Series`` where lots of elements are repeated (i.e. the number of unique elements is much smaller than the length of the ``Series``), it can be faster to convert the original ``Series`` to one of type ``category`` and then use ``.str.<method>`` or ``.dt.<property>`` on that. The performance difference comes from the fact that, for a ``Series`` of type ``category``, the string operations are done on the ``.categories`` and not on each element of the ``Series``.

Please note that a ``Series`` of type ``category`` with string ``.categories`` has some limitations in comparison to a ``Series`` of type string (e.g. you can’t add strings to each other: ``s + " " + s`` won’t work if ``s`` is a ``Series`` of type ``category``). Also, ``.str`` methods which operate on elements of type ``list`` are not available on such a ``Series``.

:::

::: danger Warning

Before v.0.25.0, the ``.str`` accessor did only the most rudimentary type checks. Starting with v.0.25.0, the type of the ``Series`` is inferred and the allowed types (i.e. strings) are enforced more rigorously.

Generally speaking, the ``.str`` accessor is intended to work only on strings. With very few exceptions, other uses are not supported, and may be disabled at a later point.

:::

## Splitting and replacing strings

Methods like ``split`` return a Series of lists:

``` python
In [15]: s2 = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])

In [16]: s2.str.split('_')
Out[16]:
0    [a, b, c]
1    [c, d, e]
2          NaN
3    [f, g, h]
dtype: object
```

Elements in the split lists can be accessed using ``get`` or ``[]`` notation:

``` python
In [17]: s2.str.split('_').str.get(1)
Out[17]:
0      b
1      d
2    NaN
3      g
dtype: object

In [18]: s2.str.split('_').str[1]
Out[18]:
0      b
1      d
2    NaN
3      g
dtype: object
```

It is easy to expand this to return a DataFrame using ``expand``. 

``` python
In [19]: s2.str.split('_', expand=True)
Out[19]:
     0    1    2
0    a    b    c
1    c    d    e
2  NaN  NaN  NaN
3    f    g    h
```

It is also possible to limit the number of splits:

``` python
In [20]: s2.str.split('_', expand=True, n=1)
Out[20]:
     0    1
0    a  b_c
1    c  d_e
2  NaN  NaN
3    f  g_h
```

``rsplit`` is similar to ``split`` except it works in the reverse direction, i.e., from the end of the string to the beginning of the string:

``` python
In [21]: s2.str.rsplit('_', expand=True, n=1)
Out[21]:
     0    1
0  a_b    c
1  c_d    e
2  NaN  NaN
3  f_g    h
```

``replace`` by default replaces [regular expressions](https://docs.python.org/3/library/re.html):

``` python
In [22]: s3 = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca',
   ....:                 '', np.nan, 'CABA', 'dog', 'cat'])
   ....:

In [23]: s3
Out[23]:
0       A
1       B
2       C
3    Aaba
4    Baca
5
6     NaN
7    CABA
8     dog
9     cat
dtype: object

In [24]: s3.str.replace('^.a|dog', 'XX-XX ', case=False)
Out[24]:
0           A
1           B
2           C
3    XX-XX ba
4    XX-XX ca
5
6         NaN
7    XX-XX BA
8      XX-XX
9     XX-XX t
dtype: object
```

Some caution must be taken when dealing with regular expressions! For example, the following code will cause trouble because of the regular expression meaning of ``$``:

``` python
# Consider the following badly formatted financial data
In [25]: dollars = pd.Series(['12', '-$10', '$10,000'])

# This does what you'd naively expect:
In [26]: dollars.str.replace('$', '')
Out[26]:
0        12
1       -10
2    10,000
dtype: object

# But this doesn't:
In [27]: dollars.str.replace('-$', '-')
Out[27]:
0         12
1       -$10
2    $10,000
dtype: object

# We need to escape the special character (for >1 len patterns)
In [28]: dollars.str.replace(r'-\$', '-')
Out[28]:
0         12
1        -10
2    $10,000
dtype: object
```

*New in version 0.23.0.*

If you do want literal replacement of a string (equivalent to Python’s [``str.replace()``](https://docs.python.org/3/library/stdtypes.html#str.replace)), you can set the optional ``regex`` parameter to ``False``, rather than escaping each character. In this case both ``pat`` and ``repl`` must be ordinary strings:

``` python
# These lines are equivalent
In [29]: dollars.str.replace(r'-\$', '-')
Out[29]:
0         12
1        -10
2    $10,000
dtype: object

In [30]: dollars.str.replace('-$', '-', regex=False)
Out[30]:
0         12
1        -10
2    $10,000
dtype: object
```

*New in version 0.20.0.*

``replace`` also accepts a callable as the replacement. It is called on every ``pat`` match using [``re.sub()``](https://docs.python.org/3/library/re.html#re.sub). The callable should expect one positional argument (a regex match object) and return a string.

``` python
# Reverse every lowercase alphabetic word
In [31]: pat = r'[a-z]+'

In [32]: def repl(m):
   ....:     return m.group(0)[::-1]
   ....:

In [33]: pd.Series(['foo 123', 'bar baz', np.nan]).str.replace(pat, repl)
Out[33]:
0    oof 123
1    rab zab
2        NaN
dtype: object

# Using regex groups
In [34]: pat = r"(?P<one>\w+) (?P<two>\w+) (?P<three>\w+)"

In [35]: def repl(m):
   ....:     return m.group('two').swapcase()
   ....:

In [36]: pd.Series(['Foo Bar Baz', np.nan]).str.replace(pat, repl)
Out[36]:
0    bAR
1    NaN
dtype: object
```

*New in version 0.20.0.*

``replace`` also accepts a compiled regular expression object from [``re.compile()``](https://docs.python.org/3/library/re.html#re.compile) as a pattern. All flags should be included in the compiled regular expression object.

``` python
In [37]: import re

In [38]: regex_pat = re.compile(r'^.a|dog', flags=re.IGNORECASE)

In [39]: s3.str.replace(regex_pat, 'XX-XX ')
Out[39]:
0           A
1           B
2           C
3    XX-XX ba
4    XX-XX ca
5
6         NaN
7    XX-XX BA
8      XX-XX
9     XX-XX t
dtype: object
```

Including a ``flags`` argument when calling ``replace`` with a compiled regular expression object will raise a ``ValueError``.

``` python
In [40]: s3.str.replace(regex_pat, 'XX-XX ', flags=re.IGNORECASE)
---------------------------------------------------------------------------
ValueError: case and flags cannot be set when pat is a compiled regex
```

## Concatenation

There are several ways to concatenate a ``Series`` or ``Index``, either with itself or others, all based on [``cat()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.cat.html#pandas.Series.str.cat), resp. ``Index.str.cat``. 

### Concatenating a single Series into a string

The content of a ``Series`` (or ``Index``) can be concatenated:

``` python
In [41]: s = pd.Series(['a', 'b', 'c', 'd'])

In [42]: s.str.cat(sep=',')
Out[42]: 'a,b,c,d'
```

If not specified, the keyword ``sep`` for the separator defaults to the empty string, ``sep=''``:

``` python
In [43]: s.str.cat()
Out[43]: 'abcd'
```

By default, missing values are ignored. Using ``na_rep``, they can be given a representation:

``` python
In [44]: t = pd.Series(['a', 'b', np.nan, 'd'])

In [45]: t.str.cat(sep=',')
Out[45]: 'a,b,d'

In [46]: t.str.cat(sep=',', na_rep='-')
Out[46]: 'a,b,-,d'
```

### Concatenating a Series and something list-like into a Series

The first argument to [``cat()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.cat.html#pandas.Series.str.cat) can be a list-like object, provided that it matches the length of the calling ``Series`` (or ``Index``).

``` python
In [47]: s.str.cat(['A', 'B', 'C', 'D'])
Out[47]:
0    aA
1    bB
2    cC
3    dD
dtype: object
```

Missing values on either side will result in missing values in the result as well, *unless* ``na_rep`` is specified:

``` python
In [48]: s.str.cat(t)
Out[48]:
0     aa
1     bb
2    NaN
3     dd
dtype: object

In [49]: s.str.cat(t, na_rep='-')
Out[49]:
0    aa
1    bb
2    c-
3    dd
dtype: object
```

### Concatenating a Series and something array-like into a Series

*New in version 0.23.0.*

The parameter ``others`` can also be two-dimensional. In this case, the number of rows must match the length of the calling ``Series`` (or ``Index``).

``` python
In [50]: d = pd.concat([t, s], axis=1)

In [51]: s
Out[51]:
0    a
1    b
2    c
3    d
dtype: object

In [52]: d
Out[52]:
     0  1
0    a  a
1    b  b
2  NaN  c
3    d  d

In [53]: s.str.cat(d, na_rep='-')
Out[53]:
0    aaa
1    bbb
2    c-c
3    ddd
dtype: object
```

### Concatenating a Series and an indexed object into a Series, with alignment

*New in version 0.23.0.*

For concatenation with a ``Series`` or ``DataFrame``, it is possible to align the indexes before concatenation by setting the ``join`` keyword.

``` python
In [54]: u = pd.Series(['b', 'd', 'a', 'c'], index=[1, 3, 0, 2])

In [55]: s
Out[55]:
0    a
1    b
2    c
3    d
dtype: object

In [56]: u
Out[56]:
1    b
3    d
0    a
2    c
dtype: object

In [57]: s.str.cat(u)
Out[57]:
0    ab
1    bd
2    ca
3    dc
dtype: object

In [58]: s.str.cat(u, join='left')
Out[58]:
0    aa
1    bb
2    cc
3    dd
dtype: object
```

::: danger Warning

If the ``join`` keyword is not passed, the method [``cat()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.cat.html#pandas.Series.str.cat) will currently fall back to the behavior before version 0.23.0 (i.e. no alignment), but a ``FutureWarning`` will be raised if any of the involved indexes differ, since this default will change to ``join='left'`` in a future version.

:::

The usual options are available for ``join`` (one of ``'left'``, ``'outer'``, ``'inner'``, ``'right'``). In particular, alignment also means that the lengths of the two objects do not need to coincide anymore.

``` python
In [59]: v = pd.Series(['z', 'a', 'b', 'd', 'e'], index=[-1, 0, 1, 3, 4])

In [60]: s
Out[60]:
0    a
1    b
2    c
3    d
dtype: object

In [61]: v
Out[61]:
-1    z
 0    a
 1    b
 3    d
 4    e
dtype: object

In [62]: s.str.cat(v, join='left', na_rep='-')
Out[62]:
0    aa
1    bb
2    c-
3    dd
dtype: object

In [63]: s.str.cat(v, join='outer', na_rep='-')
Out[63]:
-1    -z
 0    aa
 1    bb
 2    c-
 3    dd
 4    -e
dtype: object
```

The same alignment can be used when ``others`` is a ``DataFrame``:

``` python
In [64]: f = d.loc[[3, 2, 1, 0], :]

In [65]: s
Out[65]:
0    a
1    b
2    c
3    d
dtype: object

In [66]: f
Out[66]:
     0  1
3    d  d
2  NaN  c
1    b  b
0    a  a

In [67]: s.str.cat(f, join='left', na_rep='-')
Out[67]:
0    aaa
1    bbb
2    c-c
3    ddd
dtype: object
```

### Concatenating a Series and many objects into a Series

All one-dimensional list-likes can be combined in a list-like container (including iterators, ``dict``-views, etc.):

``` python
In [68]: s
Out[68]:
0    a
1    b
2    c
3    d
dtype: object

In [69]: u
Out[69]:
1    b
3    d
0    a
2    c
dtype: object

In [70]: s.str.cat([u, u.to_numpy()], join='left')
Out[70]:
0    aab
1    bbd
2    cca
3    ddc
dtype: object
```

All elements without an index (e.g. ``np.ndarray``) within the passed list-like must match in length to the calling ``Series`` (or ``Index``), but ``Series`` and ``Index`` may have arbitrary length, as long as alignment is not disabled with ``join=None``:

``` python
In [71]: v
Out[71]:
-1    z
 0    a
 1    b
 3    d
 4    e
dtype: object

In [72]: s.str.cat([v, u, u.to_numpy()], join='outer', na_rep='-')
Out[72]:
-1    -z--
 0    aaab
 1    bbbd
 2    c-ca
 3    dddc
 4    -e--
dtype: object
```

If using ``join='right'`` on a list-like of ``others`` that contains different indexes, the union of these indexes will be used as the basis for the final concatenation:

``` python
In [73]: u.loc[[3]]
Out[73]:
3    d
dtype: object

In [74]: v.loc[[-1, 0]]
Out[74]:
-1    z
 0    a
dtype: object

In [75]: s.str.cat([u.loc[[3]], v.loc[[-1, 0]]], join='right', na_rep='-')
Out[75]:
-1    --z
 0    a-a
 3    dd-
dtype: object
```

## Indexing with .str

You can use ``[]`` notation to directly index by position locations. If you index past the end of the string, the result will be ``NaN``.

``` python
In [76]: s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan,
   ....:                'CABA', 'dog', 'cat'])
   ....:

In [77]: s.str[0]
Out[77]:
0      A
1      B
2      C
3      A
4      B
5    NaN
6      C
7      d
8      c
dtype: object

In [78]: s.str[1]
Out[78]:
0    NaN
1    NaN
2    NaN
3      a
4      a
5    NaN
6      A
7      o
8      a
dtype: object
```

## Extracting substrings

### Extract first match in each subject (extract)

::: danger Warning

In version 0.18.0, ``extract`` gained the ``expand`` argument. When ``expand=False``, it returns a ``Series``, ``Index``, or ``DataFrame``, depending on the subject and the regular expression pattern (as before 0.18.0). When ``expand=True``, it always returns a ``DataFrame``, which is more consistent and less confusing for users. ``expand=True`` has been the default since version 0.23.0.

:::

The ``extract`` method accepts a [regular expression](https://docs.python.org/3/library/re.html) with at least one capture group.

Extracting a regular expression with more than one group returns a DataFrame with one column per group.

``` python
In [79]: pd.Series(['a1', 'b2', 'c3']).str.extract(r'([ab])(\d)', expand=False)
Out[79]:
     0    1
0    a    1
1    b    2
2  NaN  NaN
```

Elements that do not match return a row filled with ``NaN``. Thus, a Series of messy strings can be “converted” into a like-indexed Series or DataFrame of cleaned-up or more useful strings, without needing ``get()`` to access tuples or ``re.match`` objects. The dtype of the result is always object, even if no match is found and the result contains only ``NaN``.

Named groups like:

``` python
In [80]: pd.Series(['a1', 'b2', 'c3']).str.extract(r'(?P<letter>[ab])(?P<digit>\d)',
   ....:                                           expand=False)
   ....:
Out[80]:
  letter digit
0      a     1
1      b     2
2    NaN   NaN
```

and optional groups like:

``` python
In [81]: pd.Series(['a1', 'b2', '3']).str.extract(r'([ab])?(\d)', expand=False)
Out[81]:
     0  1
0    a  1
1    b  2
2  NaN  3
```

can also be used. Note that any capture group names in the regular expression will be used as column names; otherwise capture group numbers will be used.

Extracting a regular expression with one group returns a ``DataFrame`` with one column if ``expand=True``.

``` python
In [82]: pd.Series(['a1', 'b2', 'c3']).str.extract(r'[ab](\d)', expand=True)
Out[82]:
     0
0    1
1    2
2  NaN
```

It returns a ``Series`` if ``expand=False``.

``` python
In [83]: pd.Series(['a1', 'b2', 'c3']).str.extract(r'[ab](\d)', expand=False)
Out[83]:
0      1
1      2
2    NaN
dtype: object
```

Calling on an ``Index`` with a regex with exactly one capture group returns a ``DataFrame`` with one column if ``expand=True``.

``` python
In [84]: s = pd.Series(["a1", "b2", "c3"], ["A11", "B22", "C33"])

In [85]: s
Out[85]:
A11    a1
B22    b2
C33    c3
dtype: object

In [86]: s.index.str.extract("(?P<letter>[a-zA-Z])", expand=True)
Out[86]:
  letter
0      A
1      B
2      C
```

It returns an ``Index`` if ``expand=False``.

``` python
In [87]: s.index.str.extract("(?P<letter>[a-zA-Z])", expand=False)
Out[87]: Index(['A', 'B', 'C'], dtype='object', name='letter')
```

Calling on an ``Index`` with a regex with more than one capture group returns a ``DataFrame`` if ``expand=True``.

``` python
In [88]: s.index.str.extract("(?P<letter>[a-zA-Z])([0-9]+)", expand=True)
Out[88]:
  letter   1
0      A  11
1      B  22
2      C  33
```

It raises a ``ValueError`` if ``expand=False``.

``` python
>>> s.index.str.extract("(?P<letter>[a-zA-Z])([0-9]+)", expand=False)
ValueError: only one regex group is supported with Index
```

The table below summarizes the behavior of ``extract(expand=False)`` (input subject in first column, number of groups in regex in first row)

  | 1 group | >1 group
---|---|---
Index | Index | ValueError
Series | Series | DataFrame

### Extract all matches in each subject (extractall)

*New in version 0.18.0.*

Unlike ``extract`` (which returns only the first match),

``` python
In [89]: s = pd.Series(["a1a2", "b1", "c1"], index=["A", "B", "C"])

In [90]: s
Out[90]:
A    a1a2
B      b1
C      c1
dtype: object

In [91]: two_groups = '(?P<letter>[a-z])(?P<digit>[0-9])'

In [92]: s.str.extract(two_groups, expand=True)
Out[92]:
  letter digit
A      a     1
B      b     1
C      c     1
```

the ``extractall`` method returns every match. The result of ``extractall`` is always a ``DataFrame`` with a ``MultiIndex`` on its rows. The last level of the ``MultiIndex`` is named ``match`` and indicates the order of the match in the subject.

``` python
In [93]: s.str.extractall(two_groups)
Out[93]:
        letter digit
  match
A 0          a     1
  1          a     2
B 0          b     1
C 0          c     1
```

When each subject string in the Series has exactly one match,

``` python
In [94]: s = pd.Series(['a3', 'b3', 'c2'])

In [95]: s
Out[95]:
0    a3
1    b3
2    c2
dtype: object
```

then ``extractall(pat).xs(0, level='match')`` gives the same result as ``extract(pat)``.

``` python
In [96]: extract_result = s.str.extract(two_groups, expand=True)

In [97]: extract_result
Out[97]:
  letter digit
0      a     3
1      b     3
2      c     2

In [98]: extractall_result = s.str.extractall(two_groups)

In [99]: extractall_result
Out[99]:
        letter digit
  match
0 0          a     3
1 0          b     3
2 0          c     2

In [100]: extractall_result.xs(0, level="match")
Out[100]:
  letter digit
0      a     3
1      b     3
2      c     2
```

``Index`` also supports ``.str.extractall``. It returns a ``DataFrame`` with the same result as ``Series.str.extractall``, using a default index (starting from 0).

*New in version 0.19.0.*

``` python
In [101]: pd.Index(["a1a2", "b1", "c1"]).str.extractall(two_groups)
Out[101]:
        letter digit
  match
0 0          a     1
  1          a     2
1 0          b     1
2 0          c     1

In [102]: pd.Series(["a1a2", "b1", "c1"]).str.extractall(two_groups)
Out[102]:
        letter digit
  match
0 0          a     1
  1          a     2
1 0          b     1
2 0          c     1
```

## Testing for strings that match or contain a pattern

You can check whether elements contain a pattern:

``` python
In [103]: pattern = r'[0-9][a-z]'

In [104]: pd.Series(['1', '2', '3a', '3b', '03c']).str.contains(pattern)
Out[104]:
0    False
1    False
2     True
3     True
4     True
dtype: bool
```

Or whether elements match a pattern:

``` python
In [105]: pd.Series(['1', '2', '3a', '3b', '03c']).str.match(pattern)
Out[105]:
0    False
1    False
2     True
3     True
4    False
dtype: bool
```

The distinction between ``match`` and ``contains`` is strictness: ``match`` relies on strict ``re.match`` (anchored at the start of the string), while ``contains`` relies on ``re.search``.

Methods like ``match``, ``contains``, ``startswith``, and ``endswith`` take an extra ``na`` argument, so missing values can be considered ``True`` or ``False``:

``` python
In [106]: s4 = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])

In [107]: s4.str.contains('A', na=False)
Out[107]:
0     True
1    False
2    False
3     True
4    False
5    False
6     True
7    False
8    False
dtype: bool
```

## Creating indicator variables

You can extract dummy variables from string columns. For example, if the values are separated by a ``'|'``:

``` python
In [108]: s = pd.Series(['a', 'a|b', np.nan, 'a|c'])

In [109]: s.str.get_dummies(sep='|')
Out[109]:
   a  b  c
0  1  0  0
1  1  1  0
2  0  0  0
3  1  0  1
```

A string ``Index`` also supports ``get_dummies``, which returns a ``MultiIndex``:

*New in version 0.18.1.*

``` python
In [110]: idx = pd.Index(['a', 'a|b', np.nan, 'a|c'])

In [111]: idx.str.get_dummies(sep='|')
Out[111]:
MultiIndex([(1, 0, 0),
            (1, 1, 0),
            (0, 0, 0),
            (1, 0, 1)],
           names=['a', 'b', 'c'])
```

See also [``get_dummies()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html#pandas.get_dummies). 
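The ``match``/``contains`` distinction above boils down to ``re.match`` versus ``re.search``; a quick sketch confirming the equivalence:

``` python
import re
import pandas as pd

pattern = r'[0-9][a-z]'
s = pd.Series(['1', '3a', '03c'])

# contains ~ re.search: the pattern may occur anywhere in the string
assert s.str.contains(pattern).tolist() == [False, True, True]
# match ~ re.match: the pattern must occur at the start of the string
assert s.str.match(pattern).tolist() == [False, True, False]

# the same results via the re module directly
assert [bool(re.search(pattern, x)) for x in s] == [False, True, True]
assert [bool(re.match(pattern, x)) for x in s] == [False, True, False]
```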
+
+## 方法总览
+
+方法 | 描述
+---|---
+[cat()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.cat.html#pandas.Series.str.cat) | 拼接字符串
+[split()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.split.html#pandas.Series.str.split) | 基于分隔符切分字符串
+[rsplit()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.rsplit.html#pandas.Series.str.rsplit) | 基于分隔符,从字符串末尾逆向切分字符串
+[get()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.get.html#pandas.Series.str.get) | 索引每一个元素(返回第 i 个元素)
+[join()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.join.html#pandas.Series.str.join) | 使用传入的分隔符依次拼接每一个元素
+[get_dummies()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.get_dummies.html#pandas.Series.str.get_dummies) | 用分隔符切分字符串,并返回含有哑变量的数据表
+[contains()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html#pandas.Series.str.contains) | 返回布尔数组,表明每个元素是否包含指定的字符串或正则表达式
+[replace()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.replace.html#pandas.Series.str.replace) | 把匹配到的子串或正则表达式替换为另外的字符串,或一个可调用对象的返回值
+[repeat()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.repeat.html#pandas.Series.str.repeat) | 重复值(``s.str.repeat(3)`` 等价于 ``x * 3``)
+[pad()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.pad.html#pandas.Series.str.pad) | 在字符串的左端、右端或两端添加空白
+[center()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.center.html#pandas.Series.str.center) | 等价于 ``str.center``
+[ljust()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.ljust.html#pandas.Series.str.ljust) | 等价于 ``str.ljust``
+[rjust()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.rjust.html#pandas.Series.str.rjust) | 等价于 ``str.rjust``
+[zfill()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.zfill.html#pandas.Series.str.zfill) | 等价于 ``str.zfill``
+[wrap()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.wrap.html#pandas.Series.str.wrap) | 把长字符串折行为不超过给定长度的行
+[slice()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.slice.html#pandas.Series.str.slice) | 对序列中的每一个字符串切片
+[slice_replace()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.slice_replace.html#pandas.Series.str.slice_replace) | 用传入的值替换每一个字符串中的切片
+[count()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.count.html#pandas.Series.str.count) | 统计样式或正则出现的次数
+[startswith()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.startswith.html#pandas.Series.str.startswith) | 等价于 ``str.startswith(pat)``
+[endswith()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.endswith.html#pandas.Series.str.endswith) | 等价于 ``str.endswith(pat)``
+[findall()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.findall.html#pandas.Series.str.findall) | 返回每一个字符串中所有满足样式或正则的匹配
+[match()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.match.html#pandas.Series.str.match) | 对每一个元素调用 ``re.match``,并以列表形式返回匹配到的组
+[extract()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.extract.html#pandas.Series.str.extract) | 对每一个元素调用 ``re.search``,并以数据表的形式返回。行对应原有的每一个元素,列对应所有捕获的组
+[extractall()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.extractall.html#pandas.Series.str.extractall) | 对每一个元素调用 ``re.findall``,并以数据表的形式返回。行对应每一个匹配,列对应所有捕获的组
+[len()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.len.html#pandas.Series.str.len) | 计算字符串长度
+[strip()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.strip.html#pandas.Series.str.strip) | 等价于``str.strip`` +[rstrip()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.rstrip.html#pandas.Series.str.rstrip) | 等价于``str.rstrip`` +[lstrip()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.lstrip.html#pandas.Series.str.lstrip) | 等价于``str.lstrip`` +[partition()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.partition.html#pandas.Series.str.partition) | 等价于 ``str.partition`` +[rpartition()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.rpartition.html#pandas.Series.str.rpartition) | 等价于 ``str.rpartition`` +[lower()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.lower.html#pandas.Series.str.lower) | 等价于 ``str.lower`` +[casefold()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.casefold.html#pandas.Series.str.casefold) | 等价于 ``str.casefold`` +[upper()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.upper.html#pandas.Series.str.upper) | 等价于 ``str.upper`` +[find()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.find.html#pandas.Series.str.find) | 等价于``str.find`` +[rfind()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.rfind.html#pandas.Series.str.rfind) | 等价于 ``str.rfind`` +[index()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.index.html#pandas.Series.str.index) | 等价于 ``str.index`` +[rindex()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.rindex.html#pandas.Series.str.rindex) | 等价于 ``str.rindex`` +[capitalize()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.capitalize.html#pandas.Series.str.capitalize) | 等价于 ``str.capitalize`` 
+[swapcase()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.swapcase.html#pandas.Series.str.swapcase) | 等价于 ``str.swapcase``
+[normalize()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.normalize.html#pandas.Series.str.normalize) | 返回 Unicode 标准格式。等价于 ``unicodedata.normalize``
+[translate()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.translate.html#pandas.Series.str.translate) | 等价于 ``str.translate``
+[isalnum()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.isalnum.html#pandas.Series.str.isalnum) | 等价于 ``str.isalnum``
+[isalpha()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.isalpha.html#pandas.Series.str.isalpha) | 等价于 ``str.isalpha``
+[isdigit()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.isdigit.html#pandas.Series.str.isdigit) | 等价于 ``str.isdigit``
+[isspace()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.isspace.html#pandas.Series.str.isspace) | 等价于 ``str.isspace``
+[islower()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.islower.html#pandas.Series.str.islower) | 等价于 ``str.islower``
+[isupper()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.isupper.html#pandas.Series.str.isupper) | 等价于 ``str.isupper``
+[istitle()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.istitle.html#pandas.Series.str.istitle) | 等价于 ``str.istitle``
+[isnumeric()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.isnumeric.html#pandas.Series.str.isnumeric) | 等价于 ``str.isnumeric``
+[isdecimal()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.isdecimal.html#pandas.Series.str.isdecimal) | 等价于 ``str.isdecimal`` diff --git a/Python/pandas/user_guide/timedeltas.md b/Python/pandas/user_guide/timedeltas.md new file mode 100644 index
00000000..74f55ff2 --- /dev/null +++ b/Python/pandas/user_guide/timedeltas.md @@ -0,0 +1,832 @@ +# 时间差 + +`Timedelta`,时间差,即时间之间的差异,用 `日、时、分、秒` 等时间单位表示,时间单位可为正,也可为负。 + +`Timedelta` 是 `datetime.timedelta` 的子类,两者的操作方式相似,但 `Timedelta` 兼容 `np.timedelta64` 等数据类型,还支持自定义表示形式、能解析多种类型的数据,并支持自有属性。 + +## 解析数据,生成时间差 + +`Timedelta()` 支持用多种参数生成时间差: + +``` python +In [1]: import datetime + +# 字符串 +In [2]: pd.Timedelta('1 days') +Out[2]: Timedelta('1 days 00:00:00') + +In [3]: pd.Timedelta('1 days 00:00:00') +Out[3]: Timedelta('1 days 00:00:00') + +In [4]: pd.Timedelta('1 days 2 hours') +Out[4]: Timedelta('1 days 02:00:00') + +In [5]: pd.Timedelta('-1 days 2 min 3us') +Out[5]: Timedelta('-2 days +23:57:59.999997') + +# datetime.timedelta +# 注意:必须指定关键字参数 +In [6]: pd.Timedelta(days=1, seconds=1) +Out[6]: Timedelta('1 days 00:00:01') + +# 用整数与时间单位生成时间差 +In [7]: pd.Timedelta(1, unit='d') +Out[7]: Timedelta('1 days 00:00:00') + +# datetime.timedelta 与 np.timedelta64 +In [8]: pd.Timedelta(datetime.timedelta(days=1, seconds=1)) +Out[8]: Timedelta('1 days 00:00:01') + +In [9]: pd.Timedelta(np.timedelta64(1, 'ms')) +Out[9]: Timedelta('0 days 00:00:00.001000') + +# 用字符串表示负数时间差 +# 更接近 datetime.timedelta +In [10]: pd.Timedelta('-1us') +Out[10]: Timedelta('-1 days +23:59:59.999999') + +# 时间差缺失值 +In [11]: pd.Timedelta('nan') +Out[11]: NaT + +In [12]: pd.Timedelta('nat') +Out[12]: NaT + +# ISO8601 时间格式字符串 +In [13]: pd.Timedelta('P0DT0H1M0S') +Out[13]: Timedelta('0 days 00:01:00') + +In [14]: pd.Timedelta('P0DT0H0M0.000000123S') +Out[14]: Timedelta('0 days 00:00:00.000000') +``` + +*0.23.0 版新增*:增加了用 [ISO8601 时间格式](https://en.wikipedia.org/wiki/ISO_8601#Durations)生成时间差。 + +[DateOffsets](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offsets)(`Day`、`Hour`、`Minute`、`Second`、`Milli`、`Micro`、`Nano`)也可以用来生成时间差。 + +``` python +In [15]: pd.Timedelta(pd.offsets.Second(2)) +Out[15]: Timedelta('0 days 00:00:02') +``` + +标量运算生成的也是 `Timedelta` 标量。 + +``` python +In [16]: 
pd.Timedelta(pd.offsets.Day(2)) + pd.Timedelta(pd.offsets.Second(2)) +\ + ....: pd.Timedelta('00:00:00.000123') + ....: +Out[16]: Timedelta('2 days 00:00:02.000123') +``` + +### to_timedelta + +`pd.to_timedelta()` 可以把符合时间差格式的标量、数组、列表、序列等数据转换为`Timedelta`。输入数据是序列,输出的就是序列,输入数据是标量,输出的就是标量,其它形式的输入数据则输出 `TimedeltaIndex`。 + +`to_timedelta()` 可以解析单个字符串: + +``` python +In [17]: pd.to_timedelta('1 days 06:05:01.00003') +Out[17]: Timedelta('1 days 06:05:01.000030') + +In [18]: pd.to_timedelta('15.5us') +Out[18]: Timedelta('0 days 00:00:00.000015') +``` + +还能解析字符串列表或数组: + +``` python +In [19]: pd.to_timedelta(['1 days 06:05:01.00003', '15.5us', 'nan']) +Out[19]: TimedeltaIndex(['1 days 06:05:01.000030', '0 days 00:00:00.000015', NaT], dtype='timedelta64[ns]', freq=None) +``` + +`unit` 关键字参数指定时间差的单位: + +``` python +In [20]: pd.to_timedelta(np.arange(5), unit='s') +Out[20]: TimedeltaIndex(['00:00:00', '00:00:01', '00:00:02', '00:00:03', '00:00:04'], dtype='timedelta64[ns]', freq=None) + +In [21]: pd.to_timedelta(np.arange(5), unit='d') +Out[21]: TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None) +``` + +### 时间差界限 + +Pandas 时间差的纳秒解析度是 64 位整数,这就决定了 `Timedelta` 的上下限。 + +``` python +In [22]: pd.Timedelta.min +Out[22]: Timedelta('-106752 days +00:12:43.145224') + +In [23]: pd.Timedelta.max +Out[23]: Timedelta('106751 days 23:47:16.854775') +``` + +## 运算 + +以时间差为数据的 `Series` 与 `DataFrame` 支持各种运算,`datetime64 [ns]` 序列或 `Timestamps` 减法运算生成的是`timedelta64 [ns]` 序列。 + +``` python +In [24]: s = pd.Series(pd.date_range('2012-1-1', periods=3, freq='D')) + +In [25]: td = pd.Series([pd.Timedelta(days=i) for i in range(3)]) + +In [26]: df = pd.DataFrame({'A': s, 'B': td}) + +In [27]: df +Out[27]: + A B +0 2012-01-01 0 days +1 2012-01-02 1 days +2 2012-01-03 2 days + +In [28]: df['C'] = df['A'] + df['B'] + +In [29]: df +Out[29]: + A B C +0 2012-01-01 0 days 2012-01-01 +1 2012-01-02 1 days 2012-01-03 +2 2012-01-03 2 days 2012-01-05 + +In [30]: 
df.dtypes +Out[30]: +A datetime64[ns] +B timedelta64[ns] +C datetime64[ns] +dtype: object + +In [31]: s - s.max() +Out[31]: +0 -2 days +1 -1 days +2 0 days +dtype: timedelta64[ns] + +In [32]: s - datetime.datetime(2011, 1, 1, 3, 5) +Out[32]: +0 364 days 20:55:00 +1 365 days 20:55:00 +2 366 days 20:55:00 +dtype: timedelta64[ns] + +In [33]: s + datetime.timedelta(minutes=5) +Out[33]: +0 2012-01-01 00:05:00 +1 2012-01-02 00:05:00 +2 2012-01-03 00:05:00 +dtype: datetime64[ns] + +In [34]: s + pd.offsets.Minute(5) +Out[34]: +0 2012-01-01 00:05:00 +1 2012-01-02 00:05:00 +2 2012-01-03 00:05:00 +dtype: datetime64[ns] + +In [35]: s + pd.offsets.Minute(5) + pd.offsets.Milli(5) +Out[35]: +0 2012-01-01 00:05:00.005 +1 2012-01-02 00:05:00.005 +2 2012-01-03 00:05:00.005 +dtype: datetime64[ns] +``` + +`timedelta64 [ns]` 序列的标量运算: + +``` python +In [36]: y = s - s[0] + +In [37]: y +Out[37]: +0 0 days +1 1 days +2 2 days +dtype: timedelta64[ns] +``` + +时间差序列支持 `NaT` 值: + +``` python +In [38]: y = s - s.shift() + +In [39]: y +Out[39]: +0 NaT +1 1 days +2 1 days +dtype: timedelta64[ns] +``` + +与 `datetime` 类似,`np.nan` 把时间差设置为 `NaT`: + +``` python +In [40]: y[1] = np.nan + +In [41]: y +Out[41]: +0 NaT +1 NaT +2 1 days +dtype: timedelta64[ns] +``` + +运算符也可以显示为逆序(序列与单个对象的运算): + +``` python +In [42]: s.max() - s +Out[42]: +0 2 days +1 1 days +2 0 days +dtype: timedelta64[ns] + +In [43]: datetime.datetime(2011, 1, 1, 3, 5) - s +Out[43]: +0 -365 days +03:05:00 +1 -366 days +03:05:00 +2 -367 days +03:05:00 +dtype: timedelta64[ns] + +In [44]: datetime.timedelta(minutes=5) + s +Out[44]: +0 2012-01-01 00:05:00 +1 2012-01-02 00:05:00 +2 2012-01-03 00:05:00 +dtype: datetime64[ns] +``` + +`DataFrame` 支持 `min`、`max` 及 `idxmin`、`idxmax` 运算: + +``` python +In [45]: A = s - pd.Timestamp('20120101') - pd.Timedelta('00:05:05') + +In [46]: B = s - pd.Series(pd.date_range('2012-1-2', periods=3, freq='D')) + +In [47]: df = pd.DataFrame({'A': A, 'B': B}) + +In [48]: df +Out[48]: + A B +0 -1 days +23:54:55 -1 
days +1 0 days 23:54:55 -1 days +2 1 days 23:54:55 -1 days + +In [49]: df.min() +Out[49]: +A -1 days +23:54:55 +B -1 days +00:00:00 +dtype: timedelta64[ns] + +In [50]: df.min(axis=1) +Out[50]: +0 -1 days +1 -1 days +2 -1 days +dtype: timedelta64[ns] + +In [51]: df.idxmin() +Out[51]: +A 0 +B 0 +dtype: int64 + +In [52]: df.idxmax() +Out[52]: +A 2 +B 0 +dtype: int64 +``` + +`Series` 也支持`min`、`max` 及 `idxmin`、`idxmax` 运算。标量计算结果为 `Timedelta`。 + +``` python +In [53]: df.min().max() +Out[53]: Timedelta('-1 days +23:54:55') + +In [54]: df.min(axis=1).min() +Out[54]: Timedelta('-1 days +00:00:00') + +In [55]: df.min().idxmax() +Out[55]: 'A' + +In [56]: df.min(axis=1).idxmin() +Out[56]: 0 +``` + +时间差支持 `fillna` 函数,参数是 `Timedelta`,用于指定填充值。 + +``` python +In [57]: y.fillna(pd.Timedelta(0)) +Out[57]: +0 0 days +1 0 days +2 1 days +dtype: timedelta64[ns] + +In [58]: y.fillna(pd.Timedelta(10, unit='s')) +Out[58]: +0 0 days 00:00:10 +1 0 days 00:00:10 +2 1 days 00:00:00 +dtype: timedelta64[ns] + +In [59]: y.fillna(pd.Timedelta('-1 days, 00:00:05')) +Out[59]: +0 -1 days +00:00:05 +1 -1 days +00:00:05 +2 1 days 00:00:00 +dtype: timedelta64[ns] +``` + +`Timedelta` 还支持取反、乘法及绝对值(`Abs`)运算: + +``` python +In [60]: td1 = pd.Timedelta('-1 days 2 hours 3 seconds') + +In [61]: td1 +Out[61]: Timedelta('-2 days +21:59:57') + +In [62]: -1 * td1 +Out[62]: Timedelta('1 days 02:00:03') + +In [63]: - td1 +Out[63]: Timedelta('1 days 02:00:03') + +In [64]: abs(td1) +Out[64]: Timedelta('1 days 02:00:03') +``` + +## 归约 + +`timedelta64 [ns]` 数值归约运算返回的是 `Timedelta` 对象。 一般情况下,`NaT` 不计数。 + +``` python +In [65]: y2 = pd.Series(pd.to_timedelta(['-1 days +00:00:05', 'nat', + ....: '-1 days +00:00:05', '1 days'])) + ....: + +In [66]: y2 +Out[66]: +0 -1 days +00:00:05 +1 NaT +2 -1 days +00:00:05 +3 1 days 00:00:00 +dtype: timedelta64[ns] + +In [67]: y2.mean() +Out[67]: Timedelta('-1 days +16:00:03.333333') + +In [68]: y2.median() +Out[68]: Timedelta('-1 days +00:00:05') + +In [69]: y2.quantile(.1) +Out[69]: 
Timedelta('-1 days +00:00:05') + +In [70]: y2.sum() +Out[70]: Timedelta('-1 days +00:00:10') +``` + +## 频率转换 + +时间差除法把 `Timedelta` 序列、`TimedeltaIndex`、`Timedelta` 标量转换为其它“频率”,`astype` 也可以将之转换为指定的时间差。这些运算生成的是序列,并把 `NaT` 转换为 `nan`。 注意,NumPy 标量除法是真除法,`astype` 则等同于取底整除(Floor Division)。 + +::: tip 说明 + +Floor Division ,即两数的商为向下取整,如,9 / 2 = 4。又译作地板除或向下取整除,本文译作**取底整除**; + +扩展知识: + +Ceiling Division,即两数的商为向上取整,如,9 / 2 = 5。又译作屋顶除或向上取整除,本文译作**取顶整除**。 + +::: + +``` python +In [71]: december = pd.Series(pd.date_range('20121201', periods=4)) + +In [72]: january = pd.Series(pd.date_range('20130101', periods=4)) + +In [73]: td = january - december + +In [74]: td[2] += datetime.timedelta(minutes=5, seconds=3) + +In [75]: td[3] = np.nan + +In [76]: td +Out[76]: +0 31 days 00:00:00 +1 31 days 00:00:00 +2 31 days 00:05:03 +3 NaT +dtype: timedelta64[ns] + +# 转为日 +In [77]: td / np.timedelta64(1, 'D') +Out[77]: +0 31.000000 +1 31.000000 +2 31.003507 +3 NaN +dtype: float64 + +In [78]: td.astype('timedelta64[D]') +Out[78]: +0 31.0 +1 31.0 +2 31.0 +3 NaN +dtype: float64 + +# 转为秒 +In [79]: td / np.timedelta64(1, 's') +Out[79]: +0 2678400.0 +1 2678400.0 +2 2678703.0 +3 NaN +dtype: float64 + +In [80]: td.astype('timedelta64[s]') +Out[80]: +0 2678400.0 +1 2678400.0 +2 2678703.0 +3 NaN +dtype: float64 + +# 转为月 (此处用常量表示) +In [81]: td / np.timedelta64(1, 'M') +Out[81]: +0 1.018501 +1 1.018501 +2 1.018617 +3 NaN +dtype: float64 +``` + +`timedelta64 [ns]` 序列与整数或整数序列相乘或相除,生成的也是 `timedelta64 [ns]` 序列。 + +``` python +In [82]: td * -1 +Out[82]: +0 -31 days +00:00:00 +1 -31 days +00:00:00 +2 -32 days +23:54:57 +3 NaT +dtype: timedelta64[ns] + +In [83]: td * pd.Series([1, 2, 3, 4]) +Out[83]: +0 31 days 00:00:00 +1 62 days 00:00:00 +2 93 days 00:15:09 +3 NaT +dtype: timedelta64[ns] +``` + +`timedelta64 [ns]` 序列与 `Timedelta` 标量相除的结果为取底整除的整数序列。 + + +``` python +In [84]: td // pd.Timedelta(days=3, hours=4) +Out[84]: +0 9.0 +1 9.0 +2 9.0 +3 NaN +dtype: float64 + +In [85]: pd.Timedelta(days=3, hours=4) // td 
+Out[85]: +0 0.0 +1 0.0 +2 0.0 +3 NaN +dtype: float64 +``` + +`Timedelta` 的求余(`mod(%)`)与除余(`divmod`)运算,支持时间差与数值参数。 + +``` python +In [86]: pd.Timedelta(hours=37) % datetime.timedelta(hours=2) +Out[86]: Timedelta('0 days 01:00:00') + +# 除余运算的参数为时间差时,返回一对值(int, Timedelta) +In [87]: divmod(datetime.timedelta(hours=2), pd.Timedelta(minutes=11)) +Out[87]: (10, Timedelta('0 days 00:10:00')) + +# 除余运算的参数为数值时,也返回一对值(Timedelta, Timedelta) +In [88]: divmod(pd.Timedelta(hours=25), 86400000000000) +Out[88]: (Timedelta('0 days 00:00:00.000000'), Timedelta('0 days 01:00:00')) +``` + +## 属性 + +`Timedelta` 或 `TimedeltaIndex` 的组件可以直接访问 `days`、`seconds`、`microseconds`、`nanoseconds` 等属性。这些属性与`datetime.timedelta` 的返回值相同,例如,`.seconds` 属性表示大于等于 0 天且小于 1 天的秒数。带符号的 `Timedelta` 返回的值也带符号。 + +`Series` 的 `.dt` 属性也可以直接访问这些数据。 + +::: tip 注意 + +这些属性**不是** `Timedelta` 显示的值。`.components` 可以提取显示的值。 + +::: + +对于 `Series`: + +``` python +In [89]: td.dt.days +Out[89]: +0 31.0 +1 31.0 +2 31.0 +3 NaN +dtype: float64 + +In [90]: td.dt.seconds +Out[90]: +0 0.0 +1 0.0 +2 303.0 +3 NaN +dtype: float64 +``` + +直接访问 `Timedelta` 标量字段值。 + +``` python +In [91]: tds = pd.Timedelta('31 days 5 min 3 sec') + +In [92]: tds.days +Out[92]: 31 + +In [93]: tds.seconds +Out[93]: 303 + +In [94]: (-tds).seconds +Out[94]: 86097 +``` + +`.components` 属性可以快速访问时间差的组件,返回结果是 `DataFrame`。 下列代码输出 `Timedelta` 的显示值。 + +``` python +In [95]: td.dt.components +Out[95]: + days hours minutes seconds milliseconds microseconds nanoseconds +0 31.0 0.0 0.0 0.0 0.0 0.0 0.0 +1 31.0 0.0 0.0 0.0 0.0 0.0 0.0 +2 31.0 0.0 5.0 3.0 0.0 0.0 0.0 +3 NaN NaN NaN NaN NaN NaN NaN + +In [96]: td.dt.components.seconds +Out[96]: +0 0.0 +1 0.0 +2 3.0 +3 NaN +Name: seconds, dtype: float64 +``` + +`.isoformat` 方法可以把 `Timedelta` 转换为 [ISO8601 时间格式](https://en.wikipedia.org/wiki/ISO_8601#Durations)字符串。 + +*0.20.0 版新增。* + +``` python +In [97]: pd.Timedelta(days=6, minutes=50, seconds=3, + ....: milliseconds=10, microseconds=10, + ....: nanoseconds=12).isoformat() + 
....: +Out[97]: 'P6DT0H50M3.010010012S' +``` + +## TimedeltaIndex + +[`TimedeltaIndex`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.TimedeltaIndex.html#pandas.TimedeltaIndex) 或 [`timedelta_range()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.timedelta_range.html#pandas.timedelta_range) 可以生成时间差索引。 + +`TimedeltaIndex` 支持字符串型的 `Timedelta`、`timedelta` 或 `np.timedelta64`对象。 + +`np.nan`、`pd.NaT`、`nat` 代表缺失值。 + +``` python +In [98]: pd.TimedeltaIndex(['1 days', '1 days, 00:00:05', np.timedelta64(2, 'D'), + ....: datetime.timedelta(days=2, seconds=2)]) + ....: +Out[98]: +TimedeltaIndex(['1 days 00:00:00', '1 days 00:00:05', '2 days 00:00:00', + '2 days 00:00:02'], + dtype='timedelta64[ns]', freq=None) +``` + +`freq` 关键字参数为 `infer` 时,`TimedeltaIndex` 可以自行推断时间频率: + +``` python +In [99]: pd.TimedeltaIndex(['0 days', '10 days', '20 days'], freq='infer') +Out[99]: TimedeltaIndex(['0 days', '10 days', '20 days'], dtype='timedelta64[ns]', freq='10D') +``` + +### 生成时间差范围 + +与 [`date_range()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html#pandas.date_range) 相似,[`timedelta_range()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.timedelta_range.html#pandas.timedelta_range) 可以生成定频 `TimedeltaIndex`,`timedelta_range` 的默认频率是日历日: + +``` python +In [100]: pd.timedelta_range(start='1 days', periods=5) +Out[100]: TimedeltaIndex(['1 days', '2 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq='D') +``` + +`timedelta_range` 支持 `start`、`end`、`periods` 三个参数: + +``` python +In [101]: pd.timedelta_range(start='1 days', end='5 days') +Out[101]: TimedeltaIndex(['1 days', '2 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq='D') + +In [102]: pd.timedelta_range(end='10 days', periods=4) +Out[102]: TimedeltaIndex(['7 days', '8 days', '9 days', '10 days'], dtype='timedelta64[ns]', freq='D') +``` + +`freq` 
参数支持各种[频率别名](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases): + +``` python +In [103]: pd.timedelta_range(start='1 days', end='2 days', freq='30T') +Out[103]: +TimedeltaIndex(['1 days 00:00:00', '1 days 00:30:00', '1 days 01:00:00', + '1 days 01:30:00', '1 days 02:00:00', '1 days 02:30:00', + '1 days 03:00:00', '1 days 03:30:00', '1 days 04:00:00', + '1 days 04:30:00', '1 days 05:00:00', '1 days 05:30:00', + '1 days 06:00:00', '1 days 06:30:00', '1 days 07:00:00', + '1 days 07:30:00', '1 days 08:00:00', '1 days 08:30:00', + '1 days 09:00:00', '1 days 09:30:00', '1 days 10:00:00', + '1 days 10:30:00', '1 days 11:00:00', '1 days 11:30:00', + '1 days 12:00:00', '1 days 12:30:00', '1 days 13:00:00', + '1 days 13:30:00', '1 days 14:00:00', '1 days 14:30:00', + '1 days 15:00:00', '1 days 15:30:00', '1 days 16:00:00', + '1 days 16:30:00', '1 days 17:00:00', '1 days 17:30:00', + '1 days 18:00:00', '1 days 18:30:00', '1 days 19:00:00', + '1 days 19:30:00', '1 days 20:00:00', '1 days 20:30:00', + '1 days 21:00:00', '1 days 21:30:00', '1 days 22:00:00', + '1 days 22:30:00', '1 days 23:00:00', '1 days 23:30:00', + '2 days 00:00:00'], + dtype='timedelta64[ns]', freq='30T') + +In [104]: pd.timedelta_range(start='1 days', periods=5, freq='2D5H') +Out[104]: +TimedeltaIndex(['1 days 00:00:00', '3 days 05:00:00', '5 days 10:00:00', + '7 days 15:00:00', '9 days 20:00:00'], + dtype='timedelta64[ns]', freq='53H') +``` + +*0.23.0 版新增*。 + +用 `start`、`end`、`period` 可以生成等宽时间差范围,其中,`start` 与 `end`(含)是起止两端的时间,`periods` 为 `TimedeltaIndex` 里的元素数量: + +``` python +In [105]: pd.timedelta_range('0 days', '4 days', periods=5) +Out[105]: TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None) + +In [106]: pd.timedelta_range('0 days', '4 days', periods=10) +Out[106]: +TimedeltaIndex(['0 days 00:00:00', '0 days 10:40:00', '0 days 21:20:00', + '1 days 08:00:00', '1 days 18:40:00', '2 days 05:20:00', + '2 
days 16:00:00', '3 days 02:40:00', '3 days 13:20:00', + '4 days 00:00:00'], + dtype='timedelta64[ns]', freq=None) +``` + +### TimedeltaIndex 应用 + +与 `DatetimeIndex`、`PeriodIndex` 等 `datetime` 型索引类似,`TimedeltaIndex` 也可当作 pandas 对象的索引。 + +``` python +In [107]: s = pd.Series(np.arange(100), + .....: index=pd.timedelta_range('1 days', periods=100, freq='h')) + .....: + +In [108]: s +Out[108]: +1 days 00:00:00 0 +1 days 01:00:00 1 +1 days 02:00:00 2 +1 days 03:00:00 3 +1 days 04:00:00 4 + .. +4 days 23:00:00 95 +5 days 00:00:00 96 +5 days 01:00:00 97 +5 days 02:00:00 98 +5 days 03:00:00 99 +Freq: H, Length: 100, dtype: int64 +``` + +选择操作也差不多,可以强制转换字符串与切片: + +``` python +In [109]: s['1 day':'2 day'] +Out[109]: +1 days 00:00:00 0 +1 days 01:00:00 1 +1 days 02:00:00 2 +1 days 03:00:00 3 +1 days 04:00:00 4 + .. +2 days 19:00:00 43 +2 days 20:00:00 44 +2 days 21:00:00 45 +2 days 22:00:00 46 +2 days 23:00:00 47 +Freq: H, Length: 48, dtype: int64 + +In [110]: s['1 day 01:00:00'] +Out[110]: 1 + +In [111]: s[pd.Timedelta('1 day 1h')] +Out[111]: 1 +``` + +`TimedeltaIndex` 还支持局部字符串选择,并且可以推断选择范围: + +``` python +In [112]: s['1 day':'1 day 5 hours'] +Out[112]: +1 days 00:00:00 0 +1 days 01:00:00 1 +1 days 02:00:00 2 +1 days 03:00:00 3 +1 days 04:00:00 4 +1 days 05:00:00 5 +Freq: H, dtype: int64 +``` + +### TimedeltaIndex 运算 + +`TimedeltaIndex` 与 `DatetimeIndex` 运算可以保留 `NaT` 值: + +``` python +In [113]: tdi = pd.TimedeltaIndex(['1 days', pd.NaT, '2 days']) + +In [114]: tdi.to_list() +Out[114]: [Timedelta('1 days 00:00:00'), NaT, Timedelta('2 days 00:00:00')] + +In [115]: dti = pd.date_range('20130101', periods=3) + +In [116]: dti.to_list() +Out[116]: +[Timestamp('2013-01-01 00:00:00', freq='D'), + Timestamp('2013-01-02 00:00:00', freq='D'), + Timestamp('2013-01-03 00:00:00', freq='D')] + +In [117]: (dti + tdi).to_list() +Out[117]: [Timestamp('2013-01-02 00:00:00'), NaT, Timestamp('2013-01-05 00:00:00')] + +In [118]: (dti - tdi).to_list() +Out[118]: [Timestamp('2012-12-31 00:00:00'), 
NaT, Timestamp('2013-01-01 00:00:00')] +``` + +### 转换 + +与 `Series` 频率转换类似,可以把 `TimedeltaIndex` 转换为其它索引。 + +``` python +In [119]: tdi / np.timedelta64(1, 's') +Out[119]: Float64Index([86400.0, nan, 172800.0], dtype='float64') + +In [120]: tdi.astype('timedelta64[s]') +Out[120]: Float64Index([86400.0, nan, 172800.0], dtype='float64') +``` + +与标量操作类似,会返回**不同**类型的索引。 + +``` python +# 时间差与日期相加,结果为日期型索引(DatetimeIndex) +In [121]: tdi + pd.Timestamp('20130101') +Out[121]: DatetimeIndex(['2013-01-02', 'NaT', '2013-01-03'], dtype='datetime64[ns]', freq=None) + +# 日期与时间戳相减,结果为日期型数据(Timestamp) +# note that trying to subtract a date from a Timedelta will raise an exception +In [122]: (pd.Timestamp('20130101') - tdi).to_list() +Out[122]: [Timestamp('2012-12-31 00:00:00'), NaT, Timestamp('2012-12-30 00:00:00')] + +# 时间差与时间差相加,结果还是时间差索引 +In [123]: tdi + pd.Timedelta('10 days') +Out[123]: TimedeltaIndex(['11 days', NaT, '12 days'], dtype='timedelta64[ns]', freq=None) + +# 除数是整数,则结果为时间差索引 +In [124]: tdi / 2 +Out[124]: TimedeltaIndex(['0 days 12:00:00', NaT, '1 days 00:00:00'], dtype='timedelta64[ns]', freq=None) + +# 除数是时间差,则结果为 Float64Index +In [125]: tdi / tdi[0] +Out[125]: Float64Index([1.0, nan, 2.0], dtype='float64') +``` + +## 重采样 + +与[时间序列重采样](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-resampling)一样,`TimedeltaIndex` 也支持重采样。 + +``` python +In [126]: s.resample('D').mean() +Out[126]: +1 days 11.5 +2 days 35.5 +3 days 59.5 +4 days 83.5 +5 days 97.5 +Freq: D, dtype: float64 +``` \ No newline at end of file diff --git a/Python/pandas/user_guide/timeseries.md b/Python/pandas/user_guide/timeseries.md new file mode 100644 index 00000000..628768e8 --- /dev/null +++ b/Python/pandas/user_guide/timeseries.md @@ -0,0 +1,3442 @@ +# 时间序列与日期用法 + +依托 NumPy 的 `datetime64`、`timedelta64` 等数据类型,pandas 可以处理各种时间序列数据,还能调用 `scikits.timeseries` 等 Python 支持库的时间序列功能。 + +Pandas 支持以下操作: + +解析`时间格式字符串`、`np.datetime64`、`datetime.datetime` 等多种时间序列数据。 + +```python +In 
[1]: import datetime + +In [2]: dti = pd.to_datetime(['1/1/2018', np.datetime64('2018-01-01'), + ...: datetime.datetime(2018, 1, 1)]) + ...: + +In [3]: dti +Out[3]: DatetimeIndex(['2018-01-01', '2018-01-01', '2018-01-01'], dtype='datetime64[ns]', freq=None) +``` + +生成 ` DatetimeIndex `、`TimedeltaIndex `、` PeriodIndex ` 等定频日期与时间段序列。 + +```python +In [4]: dti = pd.date_range('2018-01-01', periods=3, freq='H') + +In [5]: dti +Out[5]: +DatetimeIndex(['2018-01-01 00:00:00', '2018-01-01 01:00:00', + '2018-01-01 02:00:00'], + dtype='datetime64[ns]', freq='H') +``` + +处理、转换带时区的日期时间数据。 + +```python +In [6]: dti = dti.tz_localize('UTC') + +In [7]: dti +Out[7]: +DatetimeIndex(['2018-01-01 00:00:00+00:00', '2018-01-01 01:00:00+00:00', + '2018-01-01 02:00:00+00:00'], + dtype='datetime64[ns, UTC]', freq='H') + +In [8]: dti.tz_convert('US/Pacific') +Out[8]: +DatetimeIndex(['2017-12-31 16:00:00-08:00', '2017-12-31 17:00:00-08:00', + '2017-12-31 18:00:00-08:00'], + dtype='datetime64[ns, US/Pacific]', freq='H') +``` + +按指定频率重采样,并转换为时间序列。 + +```python +In [9]: idx = pd.date_range('2018-01-01', periods=5, freq='H') + +In [10]: ts = pd.Series(range(len(idx)), index=idx) + +In [11]: ts +Out[11]: +2018-01-01 00:00:00 0 +2018-01-01 01:00:00 1 +2018-01-01 02:00:00 2 +2018-01-01 03:00:00 3 +2018-01-01 04:00:00 4 +Freq: H, dtype: int64 + +In [12]: ts.resample('2H').mean() +Out[12]: +2018-01-01 00:00:00 0.5 +2018-01-01 02:00:00 2.5 +2018-01-01 04:00:00 4.0 +Freq: 2H, dtype: float64 +``` + +用绝对或相对时间差计算日期与时间。 + +```python +In [13]: friday = pd.Timestamp('2018-01-05') + +In [14]: friday.day_name() +Out[14]: 'Friday' + +# 添加 1 个日历日 +In [15]: saturday = friday + pd.Timedelta('1 day') + +In [16]: saturday.day_name() +Out[16]: 'Saturday' + +# 添加 1 个工作日,从星期五跳到星期一 +In [17]: monday = friday + pd.offsets.BDay() + +In [18]: monday.day_name() +Out[18]: 'Monday' +``` + +pandas 提供了一组精悍、实用的工具集以完成上述操作。 + +## 纵览 + +pandas 支持 4 种常见时间概念: + +1. 日期时间(Datetime):带时区的日期时间,类似于标准库的 `datetime.datetime` 。 + +2. 
时间差(Timedelta):绝对时间周期,类似于标准库的 `datetime.timedelta`。 + +3. 时间段(Timespan):在某一时点以指定频率定义的时间跨度。 + +4. 日期偏移(Dateoffset):与日历运算对应的时间段,类似于 `dateutil` 的 `dateutil.relativedelta.relativedelta`。 + + +| 时间概念 | 标量类 | 数组类 | Pandas 数据类型 |主要构建方法 | +| :-----------: | :-----------: | :---------------: | :--------------------------------------: | :----------------------------------: | +| Date times | `Timestamp` | `DatetimeIndex` | `datetime64[ns]` 或 `datetime64[ns,tz]` | `to_datetime` 或 `date_range` | +| Time deltas | `Timedelta` | `TimedeltaIndex` | `timedelta64[ns]` | `to_timedelta` 或 `timedelta_range` | +| Time spans | `Period` | `PeriodIndex` | `period[freq]` | `Period` 或 `period_range` | +| Date offsets | `DateOffset` | `None` | `None` | `DateOffset` | + +一般情况下,时间序列主要是 [`Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series "pandas.Series") 或 [`DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame "pandas.DataFrame") 的时间型索引,可以用时间元素进行操控。 + +```python +In [19]: pd.Series(range(3), index=pd.date_range('2000', freq='D', periods=3)) +Out[19]: +2000-01-01 0 +2000-01-02 1 +2000-01-03 2 +Freq: D, dtype: int64 +``` + +当然,[`Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series "pandas.Series") 与 [`DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame "pandas.DataFrame") 也可以直接把时间序列当成数据。 + +```python +In [20]: pd.Series(pd.date_range('2000', freq='D', periods=3)) +Out[20]: +0 2000-01-01 +1 2000-01-02 +2 2000-01-03 +dtype: datetime64[ns] +``` + +[`Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series "pandas.Series") 与 [`DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame "pandas.DataFrame") 提供了 `datetime`、`timedelta` 、`Period` 扩展类型与专有用法,不过,`Dateoffset` 则保存为 `object`。 + +```python +In 
[21]: pd.Series(pd.period_range('1/1/2011', freq='M', periods=3)) +Out[21]: +0 2011-01 +1 2011-02 +2 2011-03 +dtype: period[M] + +In [22]: pd.Series([pd.DateOffset(1), pd.DateOffset(2)]) +Out[22]: +0 +1 <2 * DateOffsets> +dtype: object + +In [23]: pd.Series(pd.date_range('1/1/2011', freq='M', periods=3)) +Out[23]: +0 2011-01-31 +1 2011-02-28 +2 2011-03-31 +dtype: datetime64[ns] +``` + +Pandas 用 `NaT` 表示日期时间、时间差及时间段的空值,代表了缺失日期或空日期的值,类似于浮点数的 `np.nan`。 + +```python +In [24]: pd.Timestamp(pd.NaT) +Out[24]: NaT + +In [25]: pd.Timedelta(pd.NaT) +Out[25]: NaT + +In [26]: pd.Period(pd.NaT) +Out[26]: NaT + +# 与 np.nan 一样,pd.NaT 不等于 pd.NaT +In [27]: pd.NaT == pd.NaT +Out[27]: False +``` + +## 时间戳 vs. 时间段 + +时间戳是最基本的时间序列数据,用于把数值与时点关联在一起。Pandas 对象通过时间戳调用时点数据。 + +```python +In [28]: pd.Timestamp(datetime.datetime(2012, 5, 1)) +Out[28]: Timestamp('2012-05-01 00:00:00') + +In [29]: pd.Timestamp('2012-05-01') +Out[29]: Timestamp('2012-05-01 00:00:00') + +In [30]: pd.Timestamp(2012, 5, 1) +Out[30]: Timestamp('2012-05-01 00:00:00') +``` + +不过,大多数情况下,用时间段改变变量更自然。`Period` 表示的时间段更直观,还可以用日期时间格式的字符串进行推断。 + +示例如下: + +```python +In [31]: pd.Period('2011-01') +Out[31]: Period('2011-01', 'M') + +In [32]: pd.Period('2012-05', freq='D') +Out[32]: Period('2012-05-01', 'D') +``` + +[`Timestamp`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html#pandas.Timestamp "pandas.Timestamp") 与 [`Period`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Period.html#pandas.Period "pandas.Period") 可以用作索引。作为索引的 `Timestamp` 与 `Period` 列表则被强制转换为对应的 [`DatetimeIndex`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex "pandas.DatetimeIndex") 与 [`PeriodIndex`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.PeriodIndex.html#pandas.PeriodIndex "pandas.PeriodIndex")。 + +```python +In [33]: dates = [pd.Timestamp('2012-05-01'), + ....: pd.Timestamp('2012-05-02'), + ....: pd.Timestamp('2012-05-03')] + 
....: + +In [34]: ts = pd.Series(np.random.randn(3), dates) + +In [35]: type(ts.index) +Out[35]: pandas.core.indexes.datetimes.DatetimeIndex + +In [36]: ts.index +Out[36]: DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None) + +In [37]: ts +Out[37]: +2012-05-01 0.469112 +2012-05-02 -0.282863 +2012-05-03 -1.509059 +dtype: float64 + +In [38]: periods = [pd.Period('2012-01'), pd.Period('2012-02'), pd.Period('2012-03')] + +In [39]: ts = pd.Series(np.random.randn(3), periods) + +In [40]: type(ts.index) +Out[40]: pandas.core.indexes.period.PeriodIndex + +In [41]: ts.index +Out[41]: PeriodIndex(['2012-01', '2012-02', '2012-03'], dtype='period[M]', freq='M') + +In [42]: ts +Out[42]: +2012-01 -1.135632 +2012-02 1.212112 +2012-03 -0.173215 +Freq: M, dtype: float64 +``` + +Pandas 可以识别这两种表现形式,并在两者之间进行转化。Pandas 后台用 `Timestamp` 实例代表时间戳,用 `DatetimeIndex` 实例代表时间戳序列。pandas 用 `Period` 对象表示符合规律的时间段标量值,用 `PeriodIndex` 表示时间段序列。未来版本将支持用任意起止时间实现不规律时间间隔。 + +## 转换时间戳 + +`to_datetime` 函数用于转换字符串、纪元式及混合的日期 [`Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series "pandas.Series") 或日期列表。转换的是 `Series` 时,返回的是具有相同的索引的 `Series`,日期时间列表则会被转换为 `DatetimeIndex`: + +```python +In [43]: pd.to_datetime(pd.Series(['Jul 31, 2009', '2010-01-10', None])) +Out[43]: +0 2009-07-31 +1 2010-01-10 +2 NaT +dtype: datetime64[ns] + +In [44]: pd.to_datetime(['2005/11/23', '2010.12.31']) +Out[44]: DatetimeIndex(['2005-11-23', '2010-12-31'], dtype='datetime64[ns]', freq=None) +``` + +解析欧式日期(日-月-年),要用 `dayfirst` 关键字参数: + +```python +In [45]: pd.to_datetime(['04-01-2012 10:00'], dayfirst=True) +Out[45]: DatetimeIndex(['2012-01-04 10:00:00'], dtype='datetime64[ns]', freq=None) + +In [46]: pd.to_datetime(['14-01-2012', '01-14-2012'], dayfirst=True) +Out[46]: DatetimeIndex(['2012-01-14', '2012-01-14'], dtype='datetime64[ns]', freq=None) +``` + +::: danger 警告 + +从上例可以看出,`dayfirst` 并没有那么严苛,如果不能把第一个数解析为**日**,就会以 `dayfirst` 为 `False` 进行解析。 + +::: 
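+作为补充,下面的小例子(仅为示意,假设环境中已安装 pandas)演示了上述 `dayfirst` 的静默回退行为,以及如何用 `format` 参数强制按"日-月-年"解析:
+
+```python
+import pandas as pd
+
+# dayfirst=True 只是解析偏好:'14-01-2012' 按"日-月-年"解析;
+# 而 '01-14-2012' 的第二个数 14 无法当作月,pandas 会静默回退按"月-日-年"解析
+parsed = pd.to_datetime(['14-01-2012', '01-14-2012'], dayfirst=True)
+
+# 显式指定 format 才是严格解析:不符合"日-月-年"格式的输入会直接报错,而不是静默回退
+strict = pd.to_datetime('14-01-2012', format='%d-%m-%Y')
+
+print(list(parsed.strftime('%Y-%m-%d')))  # 两个字符串都被解析为 2012-01-14
+print(strict)
+```
+
+因此,当输入格式已知时,传递 `format` 比依赖 `dayfirst` 更可靠。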
+
+`to_datetime` 转换单个字符串时,返回的是单个 `Timestamp`。`Timestamp` 也支持字符串输入,但不支持 `dayfirst`、`format` 等字符串解析选项,如果要使用这些选项,就要用 `to_datetime`。
+
+```python
+In [47]: pd.to_datetime('2010/11/12')
+Out[47]: Timestamp('2010-11-12 00:00:00')
+
+In [48]: pd.Timestamp('2010/11/12')
+Out[48]: Timestamp('2010-11-12 00:00:00')
+```
+
+Pandas 还支持直接使用 `DatetimeIndex` 构建器:
+
+```python
+In [49]: pd.DatetimeIndex(['2018-01-01', '2018-01-03', '2018-01-05'])
+Out[49]: DatetimeIndex(['2018-01-01', '2018-01-03', '2018-01-05'], dtype='datetime64[ns]', freq=None)
+```
+
+创建 `DatetimeIndex` 时,给 `freq` 参数传递字符串 `infer` 即可推断索引的频率。
+
+```python
+In [50]: pd.DatetimeIndex(['2018-01-01', '2018-01-03', '2018-01-05'], freq='infer')
+Out[50]: DatetimeIndex(['2018-01-01', '2018-01-03', '2018-01-05'], dtype='datetime64[ns]', freq='2D')
+```
+
+### 提供格式参数
+
+要实现精准转换,除了传递 `datetime` 字符串,还可以指定 `format` 参数,指定此参数还可以显著加快转换速度。
+
+```python
+In [51]: pd.to_datetime('2010/11/12', format='%Y/%m/%d')
+Out[51]: Timestamp('2010-11-12 00:00:00')
+
+In [52]: pd.to_datetime('12-11-2010 00:00', format='%d-%m-%Y %H:%M')
+Out[52]: Timestamp('2010-11-12 00:00:00')
+```
+
+要了解更多 `format` 选项,请参阅 Python [日期时间文档](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior)。
+
+### 用多列组合日期时间
+
+*0.18.1 版新增。*
+
+pandas 还可以把 `DataFrame` 里的整数或字符串列组合成 `Timestamp Series`。
+
+```python
+In [53]: df = pd.DataFrame({'year': [2015, 2016],
+ ....: 'month': [2, 3],
+ ....: 'day': [4, 5],
+ ....: 'hour': [2, 3]})
+ ....:
+
+In [54]: pd.to_datetime(df)
+Out[54]:
+0 2015-02-04 02:00:00
+1 2016-03-05 03:00:00
+dtype: datetime64[ns]
+```
+
+只传递组合所需的列也可以。
+
+```python
+In [55]: pd.to_datetime(df[['year', 'month', 'day']])
+Out[55]:
+0 2015-02-04
+1 2016-03-05
+dtype: datetime64[ns]
+```
+
+`pd.to_datetime` 查找列名里日期时间组件的标准名称,包括:
+
+ * 必填:`year`、`month`、`day`
+ * 可选:`hour`、`minute`、`second`、`millisecond`、`microsecond`、`nanosecond`
+
+### 无效数据
+
+不可解析时,默认值 `errors='raise'` 会触发错误:
+
+```python
+In [2]: pd.to_datetime(['2009/07/31', 'asd'], 
errors='raise')
+ValueError: Unknown string format
+```
+
+`errors='ignore'` 返回原始输入:
+
+```python
+In [56]: pd.to_datetime(['2009/07/31', 'asd'], errors='ignore')
+Out[56]: Index(['2009/07/31', 'asd'], dtype='object')
+```
+
+`errors='coerce'` 把无法解析的数据转换为 `NaT`,即不是时间(Not a Time):
+
+```python
+In [57]: pd.to_datetime(['2009/07/31', 'asd'], errors='coerce')
+Out[57]: DatetimeIndex(['2009-07-31', 'NaT'], dtype='datetime64[ns]', freq=None)
+```
+
+### 纪元时间戳
+
+pandas 支持把整数或浮点数纪元时间转换为 `Timestamp` 与 `DatetimeIndex`。鉴于 `Timestamp` 对象的内部存储方式,这种转换的默认单位是纳秒。不过,纪元数据也经常以其它时间单位存储,此时可以用 `unit` 参数指定;纪元时间从 `origin` 参数指定的时点开始计算。
+
+```python
+In [58]: pd.to_datetime([1349720105, 1349806505, 1349892905,
+ ....: 1349979305, 1350065705], unit='s')
+ ....:
+Out[58]:
+DatetimeIndex(['2012-10-08 18:15:05', '2012-10-09 18:15:05',
+ '2012-10-10 18:15:05', '2012-10-11 18:15:05',
+ '2012-10-12 18:15:05'],
+ dtype='datetime64[ns]', freq=None)
+
+In [59]: pd.to_datetime([1349720105100, 1349720105200, 1349720105300,
+ ....: 1349720105400, 1349720105500], unit='ms')
+ ....:
+Out[59]:
+DatetimeIndex(['2012-10-08 18:15:05.100000', '2012-10-08 18:15:05.200000',
+ '2012-10-08 18:15:05.300000', '2012-10-08 18:15:05.400000',
+ '2012-10-08 18:15:05.500000'],
+ dtype='datetime64[ns]', freq=None)
+```
+
+用带 `tz` 参数的纪元时间戳创建 [`Timestamp`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html#pandas.Timestamp "pandas.Timestamp") 或 [`DatetimeIndex`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex "pandas.DatetimeIndex") 时,要先把纪元时间戳转化为 UTC,然后再把结果转换为指定时区。不过这种操作方式现在已经[废弃](https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.24.0.html#whatsnew-0240-deprecations-integer-tz)了,对于以其它时区 Wall Time 表示的纪元时间戳,建议先把纪元时间戳转换为无时区时间戳,然后再把时区本地化。
+
+```python
+In [60]: pd.Timestamp(1262347200000000000).tz_localize('US/Pacific')
+Out[60]: Timestamp('2010-01-01 12:00:00-0800', tz='US/Pacific')
+
+In [61]: 
pd.DatetimeIndex([1262347200000000000]).tz_localize('US/Pacific')
+Out[61]: DatetimeIndex(['2010-01-01 12:00:00-08:00'], dtype='datetime64[ns, US/Pacific]', freq=None)
+```
+
+::: tip 注意
+
+纪元时间取整到最近的纳秒。
+
+:::
+
+::: danger 警告
+
+[Python 浮点数](https://docs.python.org/3/tutorial/floatingpoint.html#tut-fp-issues "(in Python v3.7)")只精确到 15 位小数,因此,转换浮点纪元时间可能会导致不精准或失控的结果。转换过程中,免不了会对高精度 `Timestamp` 取整,只有用 `int64` 等定宽类型才有可能实现极其精准的效果。
+
+```python
+In [62]: pd.to_datetime([1490195805.433, 1490195805.433502912], unit='s')
+Out[62]: DatetimeIndex(['2017-03-22 15:16:45.433000088', '2017-03-22 15:16:45.433502913'], dtype='datetime64[ns]', freq=None)
+
+In [63]: pd.to_datetime(1490195805433502912, unit='ns')
+Out[63]: Timestamp('2017-03-22 15:16:45.433502912')
+```
+:::
+
+::: tip 参阅
+
+[应用 `origin` 参数](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-origin)
+
+:::
+
+### 把时间戳转换为纪元
+
+反转上述操作,把 `Timestamp` 转换为 `unix` 纪元:
+
+```python
+In [64]: stamps = pd.date_range('2012-10-08 18:15:05', periods=4, freq='D')
+
+In [65]: stamps
+Out[65]:
+DatetimeIndex(['2012-10-08 18:15:05', '2012-10-09 18:15:05',
+ '2012-10-10 18:15:05', '2012-10-11 18:15:05'],
+ dtype='datetime64[ns]', freq='D')
+```
+
+首先与纪元开始时点(1970 年 1 月 1 日午夜,UTC)相减,然后以 1 秒(`pd.Timedelta('1s')`)为时间单位取底整除。
+
+```python
+In [66]: (stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta('1s')
+Out[66]: Int64Index([1349720105, 1349806505, 1349892905, 1349979305], dtype='int64')
+```
+
+### 应用 `origin` 参数
+
+*0.20.0 版新增。*
+
+`origin` 参数可以指定 `DatetimeIndex` 的备选开始时点。例如,把 `1960-01-01` 作为开始日期:
+
+```python
+In [67]: pd.to_datetime([1, 2, 3], unit='D', origin=pd.Timestamp('1960-01-01'))
+Out[67]: DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'], dtype='datetime64[ns]', freq=None)
+```
+
+默认值为 `origin='unix'`,即 `1970-01-01 00:00:00`,一般把这个时点称为 `unix 纪元` 或 `POSIX` 时间。
+
+```python
+In [68]: pd.to_datetime([1, 2, 3], unit='D')
+Out[68]: DatetimeIndex(['1970-01-02', 
'1970-01-03', '1970-01-04'], dtype='datetime64[ns]', freq=None) +``` + +## 生成时间戳范围 + +`DatetimeIndex`、`Index` 构建器可以生成时间戳索引,此处要提供 `datetime` 对象列表。 + +```python +In [69]: dates = [datetime.datetime(2012, 5, 1), + ....: datetime.datetime(2012, 5, 2), + ....: datetime.datetime(2012, 5, 3)] + ....: + +# 注意频率信息 +In [70]: index = pd.DatetimeIndex(dates) + +In [71]: index +Out[71]: DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None) + +# 自动转换为 DatetimeIndex +In [72]: index = pd.Index(dates) + +In [73]: index +Out[73]: DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None) +``` + +实际工作中,经常要生成含大量时间戳的超长索引,一个个输入时间戳又枯燥,又低效。如果时间戳是定频的,用 [`date_range()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html#pandas.date_range "pandas.date_range") 与 [`bdate_range()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.bdate_range.html#pandas.bdate_range "pandas.bdate_range") 函数即可创建 `DatetimeIndex`。`date_range` 默认的频率是**日历日**,`bdate_range` 的默认频率是**工作日**: + +```python +In [74]: start = datetime.datetime(2011, 1, 1) + +In [75]: end = datetime.datetime(2012, 1, 1) + +In [76]: index = pd.date_range(start, end) + +In [77]: index +Out[77]: +DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04', + '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08', + '2011-01-09', '2011-01-10', + ... + '2011-12-23', '2011-12-24', '2011-12-25', '2011-12-26', + '2011-12-27', '2011-12-28', '2011-12-29', '2011-12-30', + '2011-12-31', '2012-01-01'], + dtype='datetime64[ns]', length=366, freq='D') + +In [78]: index = pd.bdate_range(start, end) + +In [79]: index +Out[79]: +DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06', + '2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12', + '2011-01-13', '2011-01-14', + ... 
+ '2011-12-19', '2011-12-20', '2011-12-21', '2011-12-22', + '2011-12-23', '2011-12-26', '2011-12-27', '2011-12-28', + '2011-12-29', '2011-12-30'], + dtype='datetime64[ns]', length=260, freq='B') +``` + +`date_range`、`bdate_range` 等便捷函数可以调用各种[频率别名](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases): + +```python +In [80]: pd.date_range(start, periods=1000, freq='M') +Out[80]: +DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-30', + '2011-05-31', '2011-06-30', '2011-07-31', '2011-08-31', + '2011-09-30', '2011-10-31', + ... + '2093-07-31', '2093-08-31', '2093-09-30', '2093-10-31', + '2093-11-30', '2093-12-31', '2094-01-31', '2094-02-28', + '2094-03-31', '2094-04-30'], + dtype='datetime64[ns]', length=1000, freq='M') + +In [81]: pd.bdate_range(start, periods=250, freq='BQS') +Out[81]: +DatetimeIndex(['2011-01-03', '2011-04-01', '2011-07-01', '2011-10-03', + '2012-01-02', '2012-04-02', '2012-07-02', '2012-10-01', + '2013-01-01', '2013-04-01', + ... 
+ '2071-01-01', '2071-04-01', '2071-07-01', '2071-10-01',
+ '2072-01-02', '2072-04-02', '2072-07-02', '2072-10-03',
+ '2073-01-02', '2073-04-03'],
+ dtype='datetime64[ns]', length=250, freq='BQS-JAN')
+```
+
+`date_range` 与 `bdate_range` 通过指定 `start`、`end`、`periods` 与 `freq` 等参数,简化了生成日期范围这项工作。开始与结束日期是严格包含在内的,因此,不会生成指定范围之外的日期。
+
+```python
+In [82]: pd.date_range(start, end, freq='BM')
+Out[82]:
+DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-29',
+ '2011-05-31', '2011-06-30', '2011-07-29', '2011-08-31',
+ '2011-09-30', '2011-10-31', '2011-11-30', '2011-12-30'],
+ dtype='datetime64[ns]', freq='BM')
+
+In [83]: pd.date_range(start, end, freq='W')
+Out[83]:
+DatetimeIndex(['2011-01-02', '2011-01-09', '2011-01-16', '2011-01-23',
+ '2011-01-30', '2011-02-06', '2011-02-13', '2011-02-20',
+ '2011-02-27', '2011-03-06', '2011-03-13', '2011-03-20',
+ '2011-03-27', '2011-04-03', '2011-04-10', '2011-04-17',
+ '2011-04-24', '2011-05-01', '2011-05-08', '2011-05-15',
+ '2011-05-22', '2011-05-29', '2011-06-05', '2011-06-12',
+ '2011-06-19', '2011-06-26', '2011-07-03', '2011-07-10',
+ '2011-07-17', '2011-07-24', '2011-07-31', '2011-08-07',
+ '2011-08-14', '2011-08-21', '2011-08-28', '2011-09-04',
+ '2011-09-11', '2011-09-18', '2011-09-25', '2011-10-02',
+ '2011-10-09', '2011-10-16', '2011-10-23', '2011-10-30',
+ '2011-11-06', '2011-11-13', '2011-11-20', '2011-11-27',
+ '2011-12-04', '2011-12-11', '2011-12-18', '2011-12-25',
+ '2012-01-01'],
+ dtype='datetime64[ns]', freq='W-SUN')
+
+In [84]: pd.bdate_range(end=end, periods=20)
+Out[84]:
+DatetimeIndex(['2011-12-05', '2011-12-06', '2011-12-07', '2011-12-08',
+ '2011-12-09', '2011-12-12', '2011-12-13', '2011-12-14',
+ '2011-12-15', '2011-12-16', '2011-12-19', '2011-12-20',
+ '2011-12-21', '2011-12-22', '2011-12-23', '2011-12-26',
+ '2011-12-27', '2011-12-28', '2011-12-29', '2011-12-30'],
+ dtype='datetime64[ns]', freq='B')
+
+In [85]: pd.bdate_range(start=start, periods=20)
+Out[85]:
+DatetimeIndex(['2011-01-03', 
'2011-01-04', '2011-01-05', '2011-01-06',
+ '2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12',
+ '2011-01-13', '2011-01-14', '2011-01-17', '2011-01-18',
+ '2011-01-19', '2011-01-20', '2011-01-21', '2011-01-24',
+ '2011-01-25', '2011-01-26', '2011-01-27', '2011-01-28'],
+ dtype='datetime64[ns]', freq='B')
+```
+
+*0.23.0 版新增。*
+
+指定 `start`、`end`、`periods` 即可生成从 `start` 开始至 `end` 结束的等距日期范围,这个日期范围包含了 `start` 与 `end`,生成的 `DatetimeIndex` 里的元素数量为 `periods` 的值。
+
+```python
+In [86]: pd.date_range('2018-01-01', '2018-01-05', periods=5)
+Out[86]:
+DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
+ '2018-01-05'],
+ dtype='datetime64[ns]', freq=None)
+
+In [87]: pd.date_range('2018-01-01', '2018-01-05', periods=10)
+Out[87]:
+DatetimeIndex(['2018-01-01 00:00:00', '2018-01-01 10:40:00',
+ '2018-01-01 21:20:00', '2018-01-02 08:00:00',
+ '2018-01-02 18:40:00', '2018-01-03 05:20:00',
+ '2018-01-03 16:00:00', '2018-01-04 02:40:00',
+ '2018-01-04 13:20:00', '2018-01-05 00:00:00'],
+ dtype='datetime64[ns]', freq=None)
+```
+
+### 自定义频率范围
+
+设定 `weekmask` 与 `holidays` 参数,`bdate_range` 还可以生成自定义频率的日期范围。这些参数只有在传递自定义频率字符串(如下例的 `C` 与 `CBMS`)时才会生效。
+
+```python
+In [88]: weekmask = 'Mon Wed Fri'
+
+In [89]: holidays = [datetime.datetime(2011, 1, 5), datetime.datetime(2011, 3, 14)]
+
+In [90]: pd.bdate_range(start, end, freq='C', weekmask=weekmask, holidays=holidays)
+Out[90]:
+DatetimeIndex(['2011-01-03', '2011-01-07', '2011-01-10', '2011-01-12',
+ '2011-01-14', '2011-01-17', '2011-01-19', '2011-01-21',
+ '2011-01-24', '2011-01-26',
+ ...
+ '2011-12-09', '2011-12-12', '2011-12-14', '2011-12-16',
+ '2011-12-19', '2011-12-21', '2011-12-23', '2011-12-26',
+ '2011-12-28', '2011-12-30'],
+ dtype='datetime64[ns]', length=154, freq='C')
+
+In [91]: pd.bdate_range(start, end, freq='CBMS', weekmask=weekmask)
+Out[91]:
+DatetimeIndex(['2011-01-03', '2011-02-02', '2011-03-02', '2011-04-01',
+ '2011-05-02', '2011-06-01', '2011-07-01', '2011-08-01',
+ '2011-09-02', '2011-10-03', '2011-11-02', '2011-12-02'],
+ dtype='datetime64[ns]', freq='CBMS')
+```
+
+::: tip 参阅
+
+[自定义工作日](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-custombusinessdays)
+
+:::
+
+## 时间戳的界限
+
+Pandas 时间戳的最低单位为纳秒,64 位整数能表示的时间跨度约为 584 年,这就是 `Timestamp` 的界限:
+
+```python
+In [92]: pd.Timestamp.min
+Out[92]: Timestamp('1677-09-21 00:12:43.145225')
+
+In [93]: pd.Timestamp.max
+Out[93]: Timestamp('2262-04-11 23:47:16.854775807')
+```
+
+::: tip 参阅
+
+ [时间段越界展示](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-oob)
+
+:::
+
+## 索引
+
+`DatetimeIndex` 主要用作 pandas 对象的索引。`DatetimeIndex` 类为时间序列做了很多优化:
+
+* 预计算了各种偏移量的日期范围,并在后台缓存,让后台生成后续日期范围的速度非常快(仅需抓取切片)。
+
+* 在 pandas 对象上使用 `shift` 与 `tshift` 方法进行快速偏移。
+
+* 合并具有相同频率的重叠 `DatetimeIndex` 对象的速度非常快(这点对快速数据对齐非常重要)。
+
+* 通过 `year`、`month` 等属性快速访问日期字段。
+
+* `snap` 等正则化函数与超快的 `asof` 逻辑。
+
+`DatetimeIndex` 对象支持全部常规 `Index` 对象的基本用法,及一系列简化频率处理的高级时间序列专有方法。
+
+::: tip 参阅
+
+[重置索引](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-reindexing)
+
+:::
+
+::: tip 注意
+
+Pandas 不强制排序日期索引,但如果日期没有排序,可能会引发可控范围之外的或不正确的操作。
+
+:::
+
+`DatetimeIndex` 可以当作常规索引,支持选择、切片等方法。
+
+```python
+In [94]: rng = pd.date_range(start, end, freq='BM')
+
+In [95]: ts = pd.Series(np.random.randn(len(rng)), index=rng)
+
+In [96]: ts.index
+Out[96]:
+DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-29',
+ '2011-05-31', '2011-06-30', '2011-07-29', '2011-08-31',
+ '2011-09-30', '2011-10-31', '2011-11-30', '2011-12-30'],
+ 
dtype='datetime64[ns]', freq='BM')
+
+In [97]: ts[:5].index
+Out[97]:
+DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31', '2011-04-29',
+ '2011-05-31'],
+ dtype='datetime64[ns]', freq='BM')
+
+In [98]: ts[::2].index
+Out[98]:
+DatetimeIndex(['2011-01-31', '2011-03-31', '2011-05-31', '2011-07-29',
+ '2011-09-30', '2011-11-30'],
+ dtype='datetime64[ns]', freq='2BM')
+```
+
+### 局部字符串索引
+
+能解析为时间戳的日期与字符串可以作为索引的参数:
+
+```python
+In [99]: ts['1/31/2011']
+Out[99]: 0.11920871129693428
+
+In [100]: ts[datetime.datetime(2011, 12, 25):]
+Out[100]:
+2011-12-30 0.56702
+Freq: BM, dtype: float64
+
+In [101]: ts['10/31/2011':'12/31/2011']
+Out[101]:
+2011-10-31 0.271860
+2011-11-30 -0.424972
+2011-12-30 0.567020
+Freq: BM, dtype: float64
+```
+
+pandas 为访问较长的时间序列提供了便捷方法,**年**、**年月**字符串均可:
+
+```python
+In [102]: ts['2011']
+Out[102]:
+2011-01-31 0.119209
+2011-02-28 -1.044236
+2011-03-31 -0.861849
+2011-04-29 -2.104569
+2011-05-31 -0.494929
+2011-06-30 1.071804
+2011-07-29 0.721555
+2011-08-31 -0.706771
+2011-09-30 -1.039575
+2011-10-31 0.271860
+2011-11-30 -0.424972
+2011-12-30 0.567020
+Freq: BM, dtype: float64
+
+In [103]: ts['2011-6']
+Out[103]:
+2011-06-30 1.071804
+Freq: BM, dtype: float64
+```
+
+带 `DatetimeIndex` 的 `DataFrame` 也支持这种切片方式。局部字符串是标签切片的一种形式,这种切片也**包含**截止时点,即,与日期匹配的时间也会包含在内:
+
+```python
+In [104]: dft = pd.DataFrame(np.random.randn(100000, 1), columns=['A'],
+ .....: index=pd.date_range('20130101', periods=100000, freq='T'))
+ .....:
+
+In [105]: dft
+Out[105]:
+ A
+2013-01-01 00:00:00 0.276232
+2013-01-01 00:01:00 -1.087401
+2013-01-01 00:02:00 -0.673690
+2013-01-01 00:03:00 0.113648
+2013-01-01 00:04:00 -1.478427
+... ...
+2013-03-11 10:35:00 -0.747967 +2013-03-11 10:36:00 -0.034523 +2013-03-11 10:37:00 -0.201754 +2013-03-11 10:38:00 -1.509067 +2013-03-11 10:39:00 -1.693043 + +[100000 rows x 1 columns] + +In [106]: dft['2013'] +Out[106]: + A +2013-01-01 00:00:00 0.276232 +2013-01-01 00:01:00 -1.087401 +2013-01-01 00:02:00 -0.673690 +2013-01-01 00:03:00 0.113648 +2013-01-01 00:04:00 -1.478427 +... ... +2013-03-11 10:35:00 -0.747967 +2013-03-11 10:36:00 -0.034523 +2013-03-11 10:37:00 -0.201754 +2013-03-11 10:38:00 -1.509067 +2013-03-11 10:39:00 -1.693043 + +[100000 rows x 1 columns] +``` + +下列代码截取了自 1 月 1 日凌晨起,至 2 月 28 日午夜的日期与时间。 + +```python +In [107]: dft['2013-1':'2013-2'] +Out[107]: + A +2013-01-01 00:00:00 0.276232 +2013-01-01 00:01:00 -1.087401 +2013-01-01 00:02:00 -0.673690 +2013-01-01 00:03:00 0.113648 +2013-01-01 00:04:00 -1.478427 +... ... +2013-02-28 23:55:00 0.850929 +2013-02-28 23:56:00 0.976712 +2013-02-28 23:57:00 -2.693884 +2013-02-28 23:58:00 -1.575535 +2013-02-28 23:59:00 -1.573517 + +[84960 rows x 1 columns] +``` + +下列代码截取了**包含截止日期及其时间在内**的日期与时间。 + +```python +In [108]: dft['2013-1':'2013-2-28'] +Out[108]: + A +2013-01-01 00:00:00 0.276232 +2013-01-01 00:01:00 -1.087401 +2013-01-01 00:02:00 -0.673690 +2013-01-01 00:03:00 0.113648 +2013-01-01 00:04:00 -1.478427 +... ... +2013-02-28 23:55:00 0.850929 +2013-02-28 23:56:00 0.976712 +2013-02-28 23:57:00 -2.693884 +2013-02-28 23:58:00 -1.575535 +2013-02-28 23:59:00 -1.573517 + +[84960 rows x 1 columns] +``` + +下列代码指定了精准的截止时间,注意此处的结果与上述截取结果的区别: + +```python +In [109]: dft['2013-1':'2013-2-28 00:00:00'] +Out[109]: + A +2013-01-01 00:00:00 0.276232 +2013-01-01 00:01:00 -1.087401 +2013-01-01 00:02:00 -0.673690 +2013-01-01 00:03:00 0.113648 +2013-01-01 00:04:00 -1.478427 +... ... 
+2013-02-27 23:56:00 1.197749 +2013-02-27 23:57:00 0.720521 +2013-02-27 23:58:00 -0.072718 +2013-02-27 23:59:00 -0.681192 +2013-02-28 00:00:00 -0.557501 + +[83521 rows x 1 columns] +``` + +截止时间是索引的一部分,包含在截取的内容之内: + +```python +In [110]: dft['2013-1-15':'2013-1-15 12:30:00'] +Out[110]: + A +2013-01-15 00:00:00 -0.984810 +2013-01-15 00:01:00 0.941451 +2013-01-15 00:02:00 1.559365 +2013-01-15 00:03:00 1.034374 +2013-01-15 00:04:00 -1.480656 +... ... +2013-01-15 12:26:00 0.371454 +2013-01-15 12:27:00 -0.930806 +2013-01-15 12:28:00 -0.069177 +2013-01-15 12:29:00 0.066510 +2013-01-15 12:30:00 -0.003945 + +[751 rows x 1 columns] +``` + +*0.18.0 版新增*。 + +`DatetimeIndex` 局部字符串索引还支持多重索引 `DataFrame`。 + +```python +In [111]: dft2 = pd.DataFrame(np.random.randn(20, 1), + .....: columns=['A'], + .....: index=pd.MultiIndex.from_product( + .....: [pd.date_range('20130101', periods=10, freq='12H'), + .....: ['a', 'b']])) + .....: + +In [112]: dft2 +Out[112]: + A +2013-01-01 00:00:00 a -0.298694 + b 0.823553 +2013-01-01 12:00:00 a 0.943285 + b -1.479399 +2013-01-02 00:00:00 a -1.643342 +... ... +2013-01-04 12:00:00 b 0.069036 +2013-01-05 00:00:00 a 0.122297 + b 1.422060 +2013-01-05 12:00:00 a 0.370079 + b 1.016331 + +[20 rows x 1 columns] + +In [113]: dft2.loc['2013-01-05'] +Out[113]: + A +2013-01-05 00:00:00 a 0.122297 + b 1.422060 +2013-01-05 12:00:00 a 0.370079 + b 1.016331 + +In [114]: idx = pd.IndexSlice + +In [115]: dft2 = dft2.swaplevel(0, 1).sort_index() + +In [116]: dft2.loc[idx[:, '2013-01-05'], :] +Out[116]: + A +a 2013-01-05 00:00:00 0.122297 + 2013-01-05 12:00:00 0.370079 +b 2013-01-05 00:00:00 1.422060 + 2013-01-05 12:00:00 1.016331 +``` + +*0.25.0 版新增*。 + +字符串索引切片支持 UTC 偏移。 + +```python +In [117]: df = pd.DataFrame([0], index=pd.DatetimeIndex(['2019-01-01'], tz='US/Pacific')) + +In [118]: df +Out[118]: + 0 +2019-01-01 00:00:00-08:00 0 + +In [119]: df['2019-01-01 12:00:00+04:00':'2019-01-01 13:00:00+04:00'] +Out[119]: + 0 +2019-01-01 00:00:00-08:00 0 +``` + +### 切片 vs. 
精准匹配
+
+*0.20.0 版新增。*
+
+基于索引的精度,字符串既可以当作切片,也可以当作精准匹配。字符串精度比索引精度低,就是切片;精度相同或更高,则是精准匹配。
+
+```python
+In [120]: series_minute = pd.Series([1, 2, 3],
+ .....: pd.DatetimeIndex(['2011-12-31 23:59:00',
+ .....: '2012-01-01 00:00:00',
+ .....: '2012-01-01 00:02:00']))
+ .....:
+
+In [121]: series_minute.index.resolution
+Out[121]: 'minute'
+```
+
+下例中的时间戳字符串没有 `series_minute` 索引的精度高:索引精确到**分**,时间戳字符串只到**时**,因此返回的是切片。
+
+```python
+In [122]: series_minute['2011-12-31 23']
+Out[122]:
+2011-12-31 23:59:00 1
+dtype: int64
+```
+
+精度为分钟(或更高精度)的时间戳字符串,给出的是标量,不会被当作切片。
+
+```python
+In [123]: series_minute['2011-12-31 23:59']
+Out[123]: 1
+
+In [124]: series_minute['2011-12-31 23:59:00']
+Out[124]: 1
+```
+
+索引的精度为秒时,精度为分钟的时间戳返回的是 `Series`。
+
+```python
+In [125]: series_second = pd.Series([1, 2, 3],
+ .....: pd.DatetimeIndex(['2011-12-31 23:59:59',
+ .....: '2012-01-01 00:00:00',
+ .....: '2012-01-01 00:00:01']))
+ .....:
+
+In [126]: series_second.index.resolution
+Out[126]: 'second'
+
+In [127]: series_second['2011-12-31 23:59']
+Out[127]:
+2011-12-31 23:59:59 1
+dtype: int64
+```
+
+用时间戳字符串切片时,还可以用 `[]` 索引 `DataFrame`。
+
+```python
+In [128]: dft_minute = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]},
+ .....: index=series_minute.index)
+ .....:
+
+In [129]: dft_minute['2011-12-31 23']
+Out[129]:
+ a b
+2011-12-31 23:59:00 1 4
+```
+
+::: danger 警告
+
+字符串执行精确匹配时,用 `[]` 按列,而不是按行截取 `DataFrame`,参阅[索引基础](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-basics)。如,`dft_minute['2011-12-31 23:59']` 会触发 `KeyError`,这是因为 `2011-12-31 23:59` 与索引的精度一样,但没有叫这个名字的列。
+
+为了实现精准切片,要用 `.loc` 对行进行切片或选择。
+
+```python
+In [130]: dft_minute.loc['2011-12-31 23:59']
+Out[130]:
+a 1
+b 4
+Name: 2011-12-31 23:59:00, dtype: int64
+```
+
+:::
+
+注意:`DatetimeIndex` 精度不能低于日。
+
+```python
+In [131]: series_monthly = pd.Series([1, 2, 3],
+ .....: pd.DatetimeIndex(['2011-12', '2012-01', '2012-02']))
+ .....:
+
+In [132]: series_monthly.index.resolution
+Out[132]: 'day'
+
+In [133]: 
series_monthly['2011-12'] # 返回的是 Series
+Out[133]:
+2011-12-01 1
+dtype: int64
+```
+
+### 精确索引
+
+正如上节所述,局部字符串依靠时间段的**精度**索引 `DatetimeIndex`,即时间间隔与索引精度相关。反之,用 `Timestamp` 或 `datetime` 对象索引则是精确索引,因为这些对象指定的时间是精确的。注意,精确索引遵循包含起止两个时点的语义。
+
+就算没有显式指定,`Timestamp` 与 `datetime` 对象也带有精确的 `hours`、`minutes`、`seconds`,默认值为 0。
+
+```python
+In [134]: dft[datetime.datetime(2013, 1, 1):datetime.datetime(2013, 2, 28)]
+Out[134]:
+ A
+2013-01-01 00:00:00 0.276232
+2013-01-01 00:01:00 -1.087401
+2013-01-01 00:02:00 -0.673690
+2013-01-01 00:03:00 0.113648
+2013-01-01 00:04:00 -1.478427
+... ...
+2013-02-27 23:56:00 1.197749
+2013-02-27 23:57:00 0.720521
+2013-02-27 23:58:00 -0.072718
+2013-02-27 23:59:00 -0.681192
+2013-02-28 00:00:00 -0.557501
+
+[83521 rows x 1 columns]
+```
+
+不用默认值的情形如下。
+
+```python
+In [135]: dft[datetime.datetime(2013, 1, 1, 10, 12, 0):
+ .....: datetime.datetime(2013, 2, 28, 10, 12, 0)]
+ .....:
+Out[135]:
+ A
+2013-01-01 10:12:00 0.565375
+2013-01-01 10:13:00 0.068184
+2013-01-01 10:14:00 0.788871
+2013-01-01 10:15:00 -0.280343
+2013-01-01 10:16:00 0.931536
+... ...
+2013-02-28 10:08:00 0.148098
+2013-02-28 10:09:00 -0.388138
+2013-02-28 10:10:00 0.139348
+2013-02-28 10:11:00 0.085288
+2013-02-28 10:12:00 0.950146
+
+[83521 rows x 1 columns]
+```
+
+### 截取与花式索引
+
+[`truncate()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.truncate.html#pandas.DataFrame.truncate "pandas.DataFrame.truncate") 便捷函数与切片类似。注意,与切片返回部分匹配日期不同,`truncate` 假设 `DatetimeIndex` 里未标明时间组件的值为 0。
+
+```python
+In [136]: rng2 = pd.date_range('2011-01-01', '2012-01-01', freq='W')
+
+In [137]: ts2 = pd.Series(np.random.randn(len(rng2)), index=rng2)
+
+In [138]: ts2.truncate(before='2011-11', after='2011-12')
+Out[138]:
+2011-11-06 0.437823
+2011-11-13 -0.293083
+2011-11-20 -0.059881
+2011-11-27 1.252450
+Freq: W-SUN, dtype: float64
+
+In [139]: ts2['2011-11':'2011-12']
+Out[139]:
+2011-11-06 0.437823
+2011-11-13 -0.293083
+2011-11-20 -0.059881
+2011-11-27 1.252450
+2011-12-04 0.046611
+2011-12-11 0.059478
+2011-12-18 -0.286539
+2011-12-25 0.841669
+Freq: W-SUN, dtype: float64
+```
+
+花式索引返回的是 `DatetimeIndex`,但因为打乱了 `DatetimeIndex` 的频率,所以频率信息没有了,见 `freq=None`:
+
+```python
+In [140]: ts2[[0, 2, 6]].index
+Out[140]: DatetimeIndex(['2011-01-02', '2011-01-16', '2011-02-13'], dtype='datetime64[ns]', freq=None)
+```
+
+## 日期/时间组件
+
+以下日期/时间属性可以访问 `Timestamp` 或 `DatetimeIndex`。
+
+| 属性 | 说明 |
+| :---------------: | :----------------------------------------------------: |
+| year | datetime 的年 |
+| month | datetime 的月 |
+| day | datetime 的日 |
+| hour | datetime 的小时 |
+| minute | datetime 的分钟 |
+| second | datetime 的秒 |
+| microsecond | datetime 的微秒 |
+| nanosecond | datetime 的纳秒 |
+| date | 返回 datetime.date(不包含时区信息) |
+| time | 返回 datetime.time(不包含时区信息) |
+| timetz | 返回带本地时区信息的 datetime.time |
+| dayofyear | 一年里的第几天 |
+| weekofyear | 一年里的第几周 |
+| week | 一年里的第几周 |
+| dayofweek | 一周里的第几天,Monday=0, Sunday=6 |
+| weekday | 一周里的第几天,Monday=0, Sunday=6 |
+| weekday_name | 这一天是星期几 (如,Friday) |
+| quarter | 日期所处的季度:Jan-Mar = 1,Apr-Jun = 2 等 |
+| 
days_in_month | 日期所在的月有多少天 |
+| is_month_start | 逻辑判断是不是月初(由频率定义) |
+| is_month_end | 逻辑判断是不是月末(由频率定义) |
+| is_quarter_start | 逻辑判断是不是季初(由频率定义) |
+| is_quarter_end | 逻辑判断是不是季末(由频率定义) |
+| is_year_start | 逻辑判断是不是年初(由频率定义) |
+| is_year_end | 逻辑判断是不是年末(由频率定义) |
+| is_leap_year | 逻辑判断日期所在年是不是闰年 |
+
+参照 [.dt 访问器](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-dt-accessors) 一节介绍的知识点,`Series` 的值为 `datetime` 时,还可以用 `.dt` 访问这些属性。
+
+## DateOffset 对象
+
+上例中,频率字符串(如,`D`)用于定义指定的频率:
+
+* 用 [`date_range()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html#pandas.date_range "pandas.date_range") 按指定频率分隔 [`DatetimeIndex`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex "pandas.DatetimeIndex") 里的日期与时间
+
+* [`Period`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Period.html#pandas.Period "pandas.Period") 或 [`PeriodIndex`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.PeriodIndex.html#pandas.PeriodIndex "pandas.PeriodIndex") 的频率
+
+频率字符串表示的是 `DateOffset` 对象及其子类。`DateOffset` 类似于时间差 [`Timedelta`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timedelta.html#pandas.Timedelta "pandas.Timedelta"),但遵循指定的日历日规则。例如,[`Timedelta`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timedelta.html#pandas.Timedelta "pandas.Timedelta") 表示的一天一直都是 24 小时,而 `DateOffset` 的一天则是偏移到下一天的同一时刻,在使用夏时制时,这样的一天有可能是 23 或 24 小时,甚至还有可能是 25 小时。不过,小时及更小时间单位的 `DateOffset` 子类(`Hour`、`Minute`、`Second`、`Milli`、`Micro`、`Nano`)的行为类似于 [`Timedelta`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timedelta.html#pandas.Timedelta "pandas.Timedelta"),遵循绝对时间。
+
+`DateOffset` 基础操作类似于 `dateutil.relativedelta`([relativedelta 文档](https://dateutil.readthedocs.io/en/stable/relativedelta.html)),可按指定的日历日时间段偏移日期时间。可用算数运算符(+)或 `apply` 方法执行日期偏移操作。
+
+```python
+# 指定包含夏时制变迁的某天
+In [141]: ts = 
pd.Timestamp('2016-10-30 00:00:00', tz='Europe/Helsinki') + +# 对应的绝对时间 +In [142]: ts + pd.Timedelta(days=1) +Out[142]: Timestamp('2016-10-30 23:00:00+0200', tz='Europe/Helsinki') + +# 对应的日历时间 +In [143]: ts + pd.DateOffset(days=1) +Out[143]: Timestamp('2016-10-31 00:00:00+0200', tz='Europe/Helsinki') + +In [144]: friday = pd.Timestamp('2018-01-05') + +In [145]: friday.day_name() +Out[145]: 'Friday' + +# 与两个工作日相加(星期五 --> 星期二) +In [146]: two_business_days = 2 * pd.offsets.BDay() + +In [147]: two_business_days.apply(friday) +Out[147]: Timestamp('2018-01-09 00:00:00') + +In [148]: friday + two_business_days +Out[148]: Timestamp('2018-01-09 00:00:00') + +In [149]: (friday + two_business_days).day_name() +Out[149]: 'Tuesday' +``` + +大多数 `DateOffset` 都支持频率字符串或偏移别名,可用作 `freq` 关键字参数。有效的日期偏移及频率字符串如下: + +| 日期偏移量 | 频率字符串 | 说明 | +| :-----------------------------------------------------------: | :----------------: | :-----------------------------------------: | +| [`DateOffset`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.DateOffset.html#pandas.tseries.offsets.DateOffset) | 无 | 通用偏移类,默认为一个日历日 | +| [`BDay`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.BDay.html#pandas.tseries.offsets.BDay) 或 [`BusinessDay`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.BusinessDay.html#pandas.tseries.offsets.BusinessDay) | `'B'` | 工作日 | +| [`CDay`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.CDay.html#pandas.tseries.offsets.CDay) 或 [`CustomBusinessDay`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.CustomBusinessDay.html#pandas.tseries.offsets.CustomBusinessDay) | `'C'` | 自定义工作日 | +| [`Week`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.Week.html#pandas.tseries.offsets.Week) | `'W'` | 一周,可选周内固定某日 | +| 
[`WeekOfMonth`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.WeekOfMonth.html#pandas.tseries.offsets.WeekOfMonth) | `'WOM'` | 每月第几周的第几天 | +| [`LastWeekOfMonth`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.LastWeekOfMonth.html#pandas.tseries.offsets.LastWeekOfMonth) | `'LWOM'` | 每月最后一周的第几天 | +| [`MonthEnd`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.MonthEnd.html#pandas.tseries.offsets.MonthEnd) | `'M'` | 日历日月末 | +| [`MonthBegin`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.MonthBegin.html#pandas.tseries.offsets.MonthBegin) | `'MS'` | 日历日月初 | +| [`BMonthEnd`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.BMonthEnd.html#pandas.tseries.offsets.BMonthEnd) 或 [`BusinessMonthEnd`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.BusinessMonthEnd.html#pandas.tseries.offsets.BusinessMonthEnd) | `'BM'` | 工作日月末 | +| [`BMonthBegin`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.BMonthBegin.html#pandas.tseries.offsets.BMonthBegin) 或 [`BusinessMonthBegin`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.BusinessMonthBegin.html#pandas.tseries.offsets.BusinessMonthBegin) | `'BMS'` | 工作日月初 | +| [`CBMonthEnd`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.CBMonthEnd.html#pandas.tseries.offsets.CBMonthEnd) 或 [`CustomBusinessMonthEnd`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.CustomBusinessMonthEnd.html#pandas.tseries.offsets.CustomBusinessMonthEnd) | `'CBM'` | 自定义工作日月末 | +| [`CBMonthBegin`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.CBMonthBegin.html#pandas.tseries.offsets.CBMonthBegin) 或 
[`CustomBusinessMonthBegin`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.CustomBusinessMonthBegin.html#pandas.tseries.offsets.CustomBusinessMonthBegin) | `'CBMS'` | 自定义工作日月初 |
+| [`SemiMonthEnd`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.SemiMonthEnd.html#pandas.tseries.offsets.SemiMonthEnd) | `'SM'` | 某月第 15 天(或其它半数日期)与日历日月末 |
+| [`SemiMonthBegin`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.SemiMonthBegin.html#pandas.tseries.offsets.SemiMonthBegin) | `'SMS'` | 日历日月初与第 15 天(或其它半数日期) |
+| [`QuarterEnd`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.QuarterEnd.html#pandas.tseries.offsets.QuarterEnd) | `'Q'` | 日历日季末 |
+| [`QuarterBegin`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.QuarterBegin.html#pandas.tseries.offsets.QuarterBegin) | `'QS'` | 日历日季初 |
+| [`BQuarterEnd`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.BQuarterEnd.html#pandas.tseries.offsets.BQuarterEnd) | `'BQ'` | 工作日季末 |
+| [`BQuarterBegin`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.BQuarterBegin.html#pandas.tseries.offsets.BQuarterBegin) | `'BQS'` | 工作日季初 |
+| [`FY5253Quarter`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.FY5253Quarter.html#pandas.tseries.offsets.FY5253Quarter) | `'REQ'` | 零售季,又名 52-53 周 |
+| [`YearEnd`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.YearEnd.html#pandas.tseries.offsets.YearEnd) | `'A'` | 日历日年末 |
+| [`YearBegin`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.YearBegin.html#pandas.tseries.offsets.YearBegin) | `'AS'` 或 `'YS'` | 日历日年初 |
+| [`BYearEnd`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.BYearEnd.html#pandas.tseries.offsets.BYearEnd) | `'BA'` | 工作日年末 |
+| 
[`BYearBegin`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.BYearBegin.html#pandas.tseries.offsets.BYearBegin) | `'BAS'` | 工作日年初 | +| [`FY5253`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.FY5253.html#pandas.tseries.offsets.FY5253) | `'RE'` | 零售年(又名 52-53 周) | +| [`Easter`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.Easter.html#pandas.tseries.offsets.Easter) | 无 | 复活节假日 | +| [`BusinessHour`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.BusinessHour.html#pandas.tseries.offsets.BusinessHour) | `'BH'` | 工作小时 | +| [`CustomBusinessHour`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.CustomBusinessHour.html#pandas.tseries.offsets.CustomBusinessHour) | `'CBH'` | 自定义工作小时 | +| [`Day`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.Day.html#pandas.tseries.offsets.Day) | `'D'` | 一天 | +| [`Hour`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.Hour.html#pandas.tseries.offsets.Hour) | `'H'` | 一小时 | +| [`Minute`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.Minute.html#pandas.tseries.offsets.Minute) | `'T'` 或 `'min'` | 一分钟 | +| [`Second`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.Second.html#pandas.tseries.offsets.Second) | `'S'` | 一秒 | +| [`Milli`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.Milli.html#pandas.tseries.offsets.Milli) | `'L'` 或 `'ms'` | 一毫秒 | +| [`Micro`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.Micro.html#pandas.tseries.offsets.Micro) | `'U'` 或 `'us'` | 一微秒 | +| [`Nano`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.Nano.html#pandas.tseries.offsets.Nano) | `'N'` | 一纳秒 | + +`DateOffset` 还支持 `rollforward()` 与 `rollback()` 
方法,按偏移量把某一日期**向前**或**向后**移动至有效偏移日期。例如,工作日偏移滚动日期时会跳过周末(即,星期六与星期日),直接到星期一,因为工作日偏移针对的是工作日。 + +```python +In [150]: ts = pd.Timestamp('2018-01-06 00:00:00') + +In [151]: ts.day_name() +Out[151]: 'Saturday' + +# 工作时间的有效偏移日期为星期一至星期五 +In [152]: offset = pd.offsets.BusinessHour(start='09:00') + +# 向前偏移到最近的工作日,即星期一 +In [153]: offset.rollforward(ts) +Out[153]: Timestamp('2018-01-08 09:00:00') + +# 向前偏移至最近的工作日,同时,小时也相应增加了 +In [154]: ts + offset +Out[154]: Timestamp('2018-01-08 10:00:00') +``` + +这些操作默认保存时间(小时、分钟等)信息。`normalize()` 可以把时间重置为午夜零点,是否应用此操作,取决于是否需要保留时间信息。 + +```python +In [155]: ts = pd.Timestamp('2014-01-01 09:00') + +In [156]: day = pd.offsets.Day() + +In [157]: day.apply(ts) +Out[157]: Timestamp('2014-01-02 09:00:00') + +In [158]: day.apply(ts).normalize() +Out[158]: Timestamp('2014-01-02 00:00:00') + +In [159]: ts = pd.Timestamp('2014-01-01 22:00') + +In [160]: hour = pd.offsets.Hour() + +In [161]: hour.apply(ts) +Out[161]: Timestamp('2014-01-01 23:00:00') + +In [162]: hour.apply(ts).normalize() +Out[162]: Timestamp('2014-01-01 00:00:00') + +In [163]: hour.apply(pd.Timestamp("2014-01-01 23:30")).normalize() +Out[163]: Timestamp('2014-01-02 00:00:00') +``` + +### 参数偏移 + +偏移量支持参数,可以让不同操作生成不同结果。例如,`Week` 偏移生成每周数据时支持 `weekday` 参数,生成日期始终位于一周中的指定日期。 + +```python +In [164]: d = datetime.datetime(2008, 8, 18, 9, 0) + +In [165]: d +Out[165]: datetime.datetime(2008, 8, 18, 9, 0) + +In [166]: d + pd.offsets.Week() +Out[166]: Timestamp('2008-08-25 09:00:00') + +In [167]: d + pd.offsets.Week(weekday=4) +Out[167]: Timestamp('2008-08-22 09:00:00') + +In [168]: (d + pd.offsets.Week(weekday=4)).weekday() +Out[168]: 4 + +In [169]: d - pd.offsets.Week() +Out[169]: Timestamp('2008-08-11 09:00:00') +``` + +加减法也支持 `normalize` 选项。 + +```python +In [170]: d + pd.offsets.Week(normalize=True) +Out[170]: Timestamp('2008-08-25 00:00:00') + +In [171]: d - pd.offsets.Week(normalize=True) +Out[171]: Timestamp('2008-08-11 00:00:00') +``` + +`YearEnd` 也支持参数,如 `month` 参数,用于指定月份 。 + +```python 
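# 补充示意(非原文内容):month 参数可以把年末锚点移到任意月份的月末。
# 下面沿用上文 In [164] 定义的 d;为保持可独立运行,这里重新导入并定义。
import datetime
import pandas as pd

d = datetime.datetime(2008, 8, 18, 9, 0)
# 以 3 月为年末(常见的财年设定):2008-08-18 之后的下一个年末是 2009-03-31
d + pd.offsets.YearEnd(month=3)  # Timestamp('2009-03-31 09:00:00')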
+In [172]: d + pd.offsets.YearEnd()
+Out[172]: Timestamp('2008-12-31 09:00:00')
+
+In [173]: d + pd.offsets.YearEnd(month=6)
+Out[173]: Timestamp('2009-06-30 09:00:00')
+```
+
+### `Series` 与 `DatetimeIndex` 偏移
+
+可以为 `Series` 或 `DatetimeIndex` 里的每个元素应用偏移。
+
+```python
+In [174]: rng = pd.date_range('2012-01-01', '2012-01-03')
+
+In [175]: s = pd.Series(rng)
+
+In [176]: rng
+Out[176]: DatetimeIndex(['2012-01-01', '2012-01-02', '2012-01-03'], dtype='datetime64[ns]', freq='D')
+
+In [177]: rng + pd.DateOffset(months=2)
+Out[177]: DatetimeIndex(['2012-03-01', '2012-03-02', '2012-03-03'], dtype='datetime64[ns]', freq='D')
+
+In [178]: s + pd.DateOffset(months=2)
+Out[178]: 
+0   2012-03-01
+1   2012-03-02
+2   2012-03-03
+dtype: datetime64[ns]
+
+In [179]: s - pd.DateOffset(months=2)
+Out[179]: 
+0   2011-11-01
+1   2011-11-02
+2   2011-11-03
+dtype: datetime64[ns]
+```
+
+如果偏移直接映射 `Timedelta`(`Day`、`Hour`、`Minute`、`Second`、`Micro`、`Milli`、`Nano`),则该偏移与 `Timedelta` 的使用方式完全一样。参阅[时间差 - Timedelta](https://pandas.pydata.org/pandas-docs/stable/user_guide/timedeltas.html#timedeltas-operations),查看更多示例。
+
+```python
+In [180]: s - pd.offsets.Day(2)
+Out[180]: 
+0   2011-12-30
+1   2011-12-31
+2   2012-01-01
+dtype: datetime64[ns]
+
+In [181]: td = s - pd.Series(pd.date_range('2011-12-29', '2011-12-31'))
+
+In [182]: td
+Out[182]: 
+0   3 days
+1   3 days
+2   3 days
+dtype: timedelta64[ns]
+
+In [183]: td + pd.offsets.Minute(15)
+Out[183]: 
+0   3 days 00:15:00
+1   3 days 00:15:00
+2   3 days 00:15:00
+dtype: timedelta64[ns]
+```
+
+注意,某些偏移量(如 `BQuarterEnd`)不支持矢量操作,即使可以执行运算,速度也非常慢,并可能显示 `PerformanceWarning`(性能警告)。
+
+```python
+In [184]: rng + pd.offsets.BQuarterEnd()
+Out[184]: DatetimeIndex(['2012-03-30', '2012-03-30', '2012-03-30'], dtype='datetime64[ns]', freq='D')
+```
+
+### 自定义工作日
+
+`CDay` 或 `CustomBusinessDay` 类是参数化的 `BusinessDay` 类,用于创建支持本地周末与传统节假日的自定义工作日历。
+
+例如,埃及的周末是星期五与星期六。
+
+```python
+In [185]: weekmask_egypt = 'Sun Mon Tue Wed Thu'
+
+
+# 下面是 2012 - 2014 年的五一劳动节
+In [186]: 
holidays = ['2012-05-01',
+   .....:             datetime.datetime(2013, 5, 1),
+   .....:             np.datetime64('2014-05-01')]
+   .....: 
+
+In [187]: bday_egypt = pd.offsets.CustomBusinessDay(holidays=holidays,
+   .....:                                           weekmask=weekmask_egypt)
+   .....: 
+
+In [188]: dt = datetime.datetime(2013, 4, 30)
+
+In [189]: dt + 2 * bday_egypt
+Out[189]: Timestamp('2013-05-05 00:00:00')
+```
+
+下列代码把日期映射为对应的星期名称。
+
+```python
+In [190]: dts = pd.date_range(dt, periods=5, freq=bday_egypt)
+
+In [191]: pd.Series(dts.weekday, dts).map(
+   .....:     pd.Series('Mon Tue Wed Thu Fri Sat Sun'.split()))
+   .....: 
+Out[191]: 
+2013-04-30    Tue
+2013-05-02    Thu
+2013-05-05    Sun
+2013-05-06    Mon
+2013-05-07    Tue
+Freq: C, dtype: object
+```
+
+也可以用节日日历来提供假日列表。更多信息,请参阅[节日日历](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-holiday)文档。
+
+```python
+In [192]: from pandas.tseries.holiday import USFederalHolidayCalendar
+
+In [193]: bday_us = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())
+
+# 马丁路德金纪念日前的星期五
+In [194]: dt = datetime.datetime(2014, 1, 17)
+
+# 马丁路德金纪念日后的星期二,因为星期一放假,所以跳过了
+In [195]: dt + bday_us
+Out[195]: Timestamp('2014-01-21 00:00:00')
+```
+
+遵循节日日历规则的月偏移可以用正常方式定义。
+
+```python
+In [196]: bmth_us = pd.offsets.CustomBusinessMonthBegin(
+   .....:     calendar=USFederalHolidayCalendar())
+   .....: 
+
+# 跳过新年
+In [197]: dt = datetime.datetime(2013, 12, 17)
+
+In [198]: dt + bmth_us
+Out[198]: Timestamp('2014-01-02 00:00:00')
+
+# 定义带自定义偏移的日期索引
+In [199]: pd.date_range(start='20100101', end='20120101', freq=bmth_us)
+Out[199]: 
+DatetimeIndex(['2010-01-04', '2010-02-01', '2010-03-01', '2010-04-01',
+               '2010-05-03', '2010-06-01', '2010-07-01', '2010-08-02',
+               '2010-09-01', '2010-10-01', '2010-11-01', '2010-12-01',
+               '2011-01-03', '2011-02-01', '2011-03-01', '2011-04-01',
+               '2011-05-02', '2011-06-01', '2011-07-01', '2011-08-01',
+               '2011-09-01', '2011-10-03', '2011-11-01', '2011-12-01'],
+              dtype='datetime64[ns]', freq='CBMS')
+```
+
+::: tip 注意
+
+频率字符串 'C' 表示使用了 `CustomBusinessDay` 偏移。注意,`CustomBusinessDay` 可以参数化,不同的 `CustomBusinessDay` 实例可能各不相同,而频率字符串 'C' 本身无法体现这种差异。因此,用户应确保在应用里频率字符串 'C' 的用法保持一致。
+
+:::
+
+### 工作时间
+
+`BusinessHour` 表示 `BusinessDay` 基础上的工作时间,用于指定开始与结束工作时间。
+
+`BusinessHour` 默认的工作时间是 9:00 - 17:00。`BusinessHour` 加法以小时频率增加 `Timestamp`。如果目标 `Timestamp` 超出了工作时间,则先移动到下一个工作小时,再行增加。如果超过了当日工作时间的范围,剩下的时间则添加到下一个工作日。
+
+```python
+In [200]: bh = pd.offsets.BusinessHour()
+
+In [201]: bh
+Out[201]: <BusinessHour: BH=09:00-17:00>
+
+# 2014 年 8 月 1 日是星期五
+In [202]: pd.Timestamp('2014-08-01 10:00').weekday()
+Out[202]: 4
+
+In [203]: pd.Timestamp('2014-08-01 10:00') + bh
+Out[203]: Timestamp('2014-08-01 11:00:00')
+
+# 下例等同于: pd.Timestamp('2014-08-01 09:00') + bh
+In [204]: pd.Timestamp('2014-08-01 08:00') + bh
+Out[204]: Timestamp('2014-08-01 10:00:00')
+
+# 如果计算结果为当日下班时间,则转移到下一个工作日的上班时间
+In [205]: pd.Timestamp('2014-08-01 16:00') + bh
+Out[205]: Timestamp('2014-08-04 09:00:00')
+
+# 剩下的时间也会添加到下一天
+In [206]: pd.Timestamp('2014-08-01 16:30') + bh
+Out[206]: Timestamp('2014-08-04 09:30:00')
+
+# 添加 2 个工作小时
+In [207]: pd.Timestamp('2014-08-01 10:00') + pd.offsets.BusinessHour(2)
+Out[207]: Timestamp('2014-08-01 12:00:00')
+
+# 减掉 3 个工作小时
+In [208]: pd.Timestamp('2014-08-01 10:00') + pd.offsets.BusinessHour(-3)
+Out[208]: Timestamp('2014-07-31 15:00:00')
+```
+
+还可以用关键字指定 `start` 与 `end` 时间。参数必须是 `hour:minute` 格式的字符串或 `datetime.time` 实例。在 `start` 与 `end` 里设置秒、微秒、纳秒会导致 `ValueError`。
+
+```python
+In [209]: bh = pd.offsets.BusinessHour(start='11:00', end=datetime.time(20, 0))
+
+In [210]: bh
+Out[210]: <BusinessHour: BH=11:00-20:00>
+
+In [211]: pd.Timestamp('2014-08-01 13:00') + bh
+Out[211]: Timestamp('2014-08-01 14:00:00')
+
+In [212]: pd.Timestamp('2014-08-01 09:00') + bh
+Out[212]: Timestamp('2014-08-01 12:00:00')
+
+In [213]: pd.Timestamp('2014-08-01 18:00') + bh
+Out[213]: Timestamp('2014-08-01 19:00:00')
+```
+
+`start` 时间晚于 `end` 时间表示夜班工作时间。此时,工作时间将从午夜延至第二天。工作时间是否有效取决于该时间是否开始于有效的 `BusinessDay`。
+
+```python
+In [214]: bh = pd.offsets.BusinessHour(start='17:00', end='09:00')
+
+In [215]: bh
+Out[215]: <BusinessHour: BH=17:00-09:00>
+ +In [216]: pd.Timestamp('2014-08-01 17:00') + bh +Out[216]: Timestamp('2014-08-01 18:00:00') + +In [217]: pd.Timestamp('2014-08-01 23:00') + bh +Out[217]: Timestamp('2014-08-02 00:00:00') + +# 虽然 2014 年 8 月 2 日是星期六, +# 但因为工作时间开始于星期五,因此,也是有效的 +In [218]: pd.Timestamp('2014-08-02 04:00') + bh +Out[218]: Timestamp('2014-08-02 05:00:00') + + +# 虽然 2014 年 8 月 4 日是星期一, +# 但开始时间是星期日,因此,超出了工作时间 +In [219]: pd.Timestamp('2014-08-04 04:00') + bh +Out[219]: Timestamp('2014-08-04 18:00:00') +``` + +`BusinessHour.rollforward` 与 `rollback` 操作将前滚至下一天的上班时间,或回滚至前一天的下班时间。与其它偏移量不同,`BusinessHour.rollforward` 输出与 `apply` 定义不同的结果。 + +这是因为一天工作时间的结束等同于第二天工作时间的开始。默认情况下,工作时间为 9:00 - 17:00,pandas 认为 `2014-08-01 17:00` 与 `2014-08-04 09:00` 之间的时间间隔为 0 分钟。 + +```python +# 把时间戳回滚到前一天的下班时间 +In [220]: pd.offsets.BusinessHour().rollback(pd.Timestamp('2014-08-02 15:00')) +Out[220]: Timestamp('2014-08-01 17:00:00') + +# 把时间戳前滚到下一个工作日的上班时间 +In [221]: pd.offsets.BusinessHour().rollforward(pd.Timestamp('2014-08-02 15:00')) +Out[221]: Timestamp('2014-08-04 09:00:00') + +# 等同于:BusinessHour().apply(pd.Timestamp('2014-08-01 17:00')) +# 与 BusinessHour().apply(pd.Timestamp('2014-08-04 09:00')) +In [222]: pd.offsets.BusinessHour().apply(pd.Timestamp('2014-08-02 15:00')) +Out[222]: Timestamp('2014-08-04 10:00:00') + +# 工作日的结果(仅供参考) +In [223]: pd.offsets.BusinessHour().rollforward(pd.Timestamp('2014-08-02')) +Out[223]: Timestamp('2014-08-04 09:00:00') + +# 等同于 BusinessDay().apply(pd.Timestamp('2014-08-01')) +# 等同于 rollforward 因为工作日不会重叠 +In [224]: pd.offsets.BusinessHour().apply(pd.Timestamp('2014-08-02')) +Out[224]: Timestamp('2014-08-04 10:00:00') +``` + +`BusinessHour` 把星期六与星期日当成假日。`CustomBusinessHour` 可以把节假日设为工作时间,详见下文。 + +### 自定义工作时间 + +*0.18.1 版新增*。 + +`CustomBusinessHour` 是 `BusinessHour` 和 `CustomBusinessDay` 的混合体,可以指定任意节假日。除了跳过自定义节假日之外,`CustomBusinessHour` 的运作方式与 `BusinessHour` 一样。 + +```python +In [225]: from pandas.tseries.holiday import USFederalHolidayCalendar + +In [226]: bhour_us = 
pd.offsets.CustomBusinessHour(calendar=USFederalHolidayCalendar())
+
+# 马丁路德金纪念日之前的星期五
+In [227]: dt = datetime.datetime(2014, 1, 17, 15)
+
+In [228]: dt + bhour_us
+Out[228]: Timestamp('2014-01-17 16:00:00')
+
+# 跳至马丁路德金纪念日之后的星期二,星期一过节,所以跳过了
+In [229]: dt + bhour_us * 2
+Out[229]: Timestamp('2014-01-21 09:00:00')
+```
+
+`CustomBusinessHour` 同时支持 `BusinessHour` 与 `CustomBusinessDay` 的关键字参数。
+
+```python
+In [230]: bhour_mon = pd.offsets.CustomBusinessHour(start='10:00',
+   .....:                                           weekmask='Tue Wed Thu Fri')
+   .....: 
+
+# 跳过了星期一,因为星期一过节,工作时间从 10 点开始
+In [231]: dt + bhour_mon * 2
+Out[231]: Timestamp('2014-01-21 10:00:00')
+```
+
+### 偏移量别名
+
+时间序列频率的字符串别名在这里叫**偏移量别名**。
+
+| 别名 | 说明 |
+| :-------: | :------------------------- |
+| B | 工作日频率 |
+| C | 自定义工作日频率 |
+| D | 日历日频率 |
+| W | 周频率 |
+| M | 月末频率 |
+| SM | 半月末频率(15 号与月末) |
+| BM | 工作日月末频率 |
+| CBM | 自定义工作日月末频率 |
+| MS | 月初频率 |
+| SMS | 半月初频率(1 号与 15 号) |
+| BMS | 工作日月初频率 |
+| CBMS | 自定义工作日月初频率 |
+| Q | 季末频率 |
+| BQ | 工作日季末频率 |
+| QS | 季初频率 |
+| BQS | 工作日季初频率 |
+| A, Y | 年末频率 |
+| BA, BY | 工作日年末频率 |
+| AS, YS | 年初频率 |
+| BAS, BYS | 工作日年初频率 |
+| BH | 工作时间频率 |
+| H | 小时频率 |
+| T, min | 分钟频率 |
+| S | 秒频率 |
+| L, ms | 毫秒 |
+| U, us | 微秒 |
+| N | 纳秒 |
+
+### 别名组合
+
+如前所述,别名与偏移量实例在绝大多数函数里可以互换:
+
+```python
+In [232]: pd.date_range(start, periods=5, freq='B')
+Out[232]: 
+DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
+               '2011-01-07'],
+              dtype='datetime64[ns]', freq='B')
+
+In [233]: pd.date_range(start, periods=5, freq=pd.offsets.BDay())
+Out[233]: 
+DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
+               '2011-01-07'],
+              dtype='datetime64[ns]', freq='B')
+```
+
+可以把日偏移量与日内偏移量组合在一起。
+
+```python
+In [234]: pd.date_range(start, periods=10, freq='2h20min')
+Out[234]: 
+DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 02:20:00',
+               '2011-01-01 04:40:00', '2011-01-01 07:00:00',
+               '2011-01-01 09:20:00', '2011-01-01 11:40:00',
+               '2011-01-01 14:00:00', '2011-01-01 16:20:00',
+               '2011-01-01 18:40:00', '2011-01-01 
21:00:00'], + dtype='datetime64[ns]', freq='140T') + +In [235]: pd.date_range(start, periods=10, freq='1D10U') +Out[235]: +DatetimeIndex([ '2011-01-01 00:00:00', '2011-01-02 00:00:00.000010', + '2011-01-03 00:00:00.000020', '2011-01-04 00:00:00.000030', + '2011-01-05 00:00:00.000040', '2011-01-06 00:00:00.000050', + '2011-01-07 00:00:00.000060', '2011-01-08 00:00:00.000070', + '2011-01-09 00:00:00.000080', '2011-01-10 00:00:00.000090'], + dtype='datetime64[ns]', freq='86400000010U') +``` + +### 锚定偏移量 + +可以指定某些频率的锚定后缀: + +| 别名 | 说明 | +| :----------: | :------------------------------------ | +| W-SUN | 周频率(星期日),与 “W” 相同 | +| W-MON | 周频率(星期一) | +| W-TUE | 周频率(星期二) | +| W-WED | 周频率(星期三) | +| W-THU | 周频率(星期四) | +| W-FRI | 周频率(星期五) | +| W-SAT | 周频率(星期六) | +| (B)Q(S)-DEC | 季频率,该年结束于十二月,与 “Q” 相同 | +| (B)Q(S)-JAN | 季频率,该年结束于一月 | +| (B)Q(S)-FEB | 季频率,该年结束于二月 | +| (B)Q(S)-MAR | 季频率,该年结束于三月 | +| (B)Q(S)-APR | 季频率,该年结束于四月 | +| (B)Q(S)-MAY | 季频率,该年结束于五月 | +| (B)Q(S)-JUN | 季频率,该年结束于六月 | +| (B)Q(S)-JUL | 季频率,该年结束于七月 | +| (B)Q(S)-AUG | 季频率,该年结束于八月 | +| (B)Q(S)-SEP | 季频率,该年结束于九月 | +| (B)Q(S)-OCT | 季频率,该年结束于十月 | +| (B)Q(S)-NOV | 季频率,该年结束于十一月 | +| (B)A(S)-DEC | 年频率,锚定结束于十二月,与 “A” 相同 | +| (B)A(S)-JAN | 年频率,锚定结束于一月 | +| (B)A(S)-FEB | 年频率,锚定结束于二月 | +| (B)A(S)-MAR | 年频率,锚定结束于三月 | +| (B)A(S)-APR | 年频率,锚定结束于四月 | +| (B)A(S)-MAY | 年频率,锚定结束于五月 | +| (B)A(S)-JUN | 年频率,锚定结束于六月 | +| (B)A(S)-JUL | 年频率,锚定结束于七月 | +| (B)A(S)-AUG | 年频率,锚定结束于八月 | +| (B)A(S)-SEP | 年频率,锚定结束于九月 | +| (B)A(S)-OCT | 年频率,锚定结束于十月 | +| (B)A(S)-NOV | 年频率,锚定结束于十一月 | + +这些别名可以用作 `date_range`、`bdate_range` 、`DatetimeIndex` 及其它时间序列函数的参数。 + +### 锚定偏移量的含义 + +对于偏移量锚定于开始或结束指定频率(`MonthEnd`、`MonthBegin`、`WeekEnd` 等)下列规则应用于前滚与后滚。 + +`n` 不为 0 时,如果给定日期不是锚定日期,将寻找下一个或上一个锚点,并向前或向后移动 `|n|-1 ` 步。 + +```python +In [236]: pd.Timestamp('2014-01-02') + pd.offsets.MonthBegin(n=1) +Out[236]: Timestamp('2014-02-01 00:00:00') + +In [237]: pd.Timestamp('2014-01-02') + pd.offsets.MonthEnd(n=1) +Out[237]: Timestamp('2014-01-31 00:00:00') + +In [238]: 
pd.Timestamp('2014-01-02') - pd.offsets.MonthBegin(n=1) +Out[238]: Timestamp('2014-01-01 00:00:00') + +In [239]: pd.Timestamp('2014-01-02') - pd.offsets.MonthEnd(n=1) +Out[239]: Timestamp('2013-12-31 00:00:00') + +In [240]: pd.Timestamp('2014-01-02') + pd.offsets.MonthBegin(n=4) +Out[240]: Timestamp('2014-05-01 00:00:00') + +In [241]: pd.Timestamp('2014-01-02') - pd.offsets.MonthBegin(n=4) +Out[241]: Timestamp('2013-10-01 00:00:00') +``` + +如果给定日期是锚定日期,则向前(或向后)移动 `|n|` 个点。 + +```python +In [242]: pd.Timestamp('2014-01-01') + pd.offsets.MonthBegin(n=1) +Out[242]: Timestamp('2014-02-01 00:00:00') + +In [243]: pd.Timestamp('2014-01-31') + pd.offsets.MonthEnd(n=1) +Out[243]: Timestamp('2014-02-28 00:00:00') + +In [244]: pd.Timestamp('2014-01-01') - pd.offsets.MonthBegin(n=1) +Out[244]: Timestamp('2013-12-01 00:00:00') + +In [245]: pd.Timestamp('2014-01-31') - pd.offsets.MonthEnd(n=1) +Out[245]: Timestamp('2013-12-31 00:00:00') + +In [246]: pd.Timestamp('2014-01-01') + pd.offsets.MonthBegin(n=4) +Out[246]: Timestamp('2014-05-01 00:00:00') + +In [247]: pd.Timestamp('2014-01-31') - pd.offsets.MonthBegin(n=4) +Out[247]: Timestamp('2013-10-01 00:00:00') +``` + +`n=0` 时,如果日期在锚点,则不移动,否则将前滚至下一个锚点。 + +```python +In [248]: pd.Timestamp('2014-01-02') + pd.offsets.MonthBegin(n=0) +Out[248]: Timestamp('2014-02-01 00:00:00') + +In [249]: pd.Timestamp('2014-01-02') + pd.offsets.MonthEnd(n=0) +Out[249]: Timestamp('2014-01-31 00:00:00') + +In [250]: pd.Timestamp('2014-01-01') + pd.offsets.MonthBegin(n=0) +Out[250]: Timestamp('2014-01-01 00:00:00') + +In [251]: pd.Timestamp('2014-01-31') + pd.offsets.MonthEnd(n=0) +Out[251]: Timestamp('2014-01-31 00:00:00') +``` + +### 假日与节日日历 + +用假日与日历可以轻松定义 `CustomBusinessDay` 假日规则,或其它分析所需的预设假日。`AbstractHolidayCalendar` 类支持所有返回假日列表的方法,并且仅需在指定假日日历类里定义 `rules` 。`start_date` 与 `end_date` 类属性决定了假日的范围。该操作会覆盖 `AbstractHolidayCalendar` 类,适用于所有日历子类。`USFederalHolidayCalendar` 是仅有的假日日历,主要用作开发其它日历的示例。 + +固定日期的假日,如美国阵亡将士纪念日或美国国庆日(7 月 4 日),取决于该假日是否是在周末,可以使用以下规则: + 
+| 规则 | 说明 |
+| :---------------------: | :------------------------------------: |
+| nearest_workday | 把星期六移至星期五,星期日移至星期一 |
+| sunday_to_monday | 把星期日移至下一个星期一 |
+| next_monday_or_tuesday | 把星期六移至星期一,并把星期日/星期一移至星期二 |
+| previous_friday | 把星期六与星期日移至上一个星期五 |
+| next_monday | 把星期六与星期日移至下一个星期一 |
+
+下例展示如何定义假日与假日日历:
+
+```python
+In [252]: from pandas.tseries.holiday import Holiday, USMemorialDay,\
+   .....:     AbstractHolidayCalendar, nearest_workday, MO
+   .....: 
+
+In [253]: class ExampleCalendar(AbstractHolidayCalendar):
+   .....:     rules = [
+   .....:         USMemorialDay,
+   .....:         Holiday('July 4th', month=7, day=4, observance=nearest_workday),
+   .....:         Holiday('Columbus Day', month=10, day=1,
+   .....:                 offset=pd.DateOffset(weekday=MO(2)))]
+   .....: 
+
+In [254]: cal = ExampleCalendar()
+
+In [255]: cal.holidays(datetime.datetime(2012, 1, 1), datetime.datetime(2012, 12, 31))
+Out[255]: DatetimeIndex(['2012-05-28', '2012-07-04', '2012-10-08'], dtype='datetime64[ns]', freq=None)
+```
+
+::: tip 提示
+
+`weekday=MO(2)` 与 `2 * Week(weekday=2)` 相同。
+
+:::
+
+用这个日历创建索引,或计算偏移量,将跳过周末与假日(如,纪念日与国庆节)。下列代码用 `ExampleCalendar` 设定自定义工作日偏移量。与其它偏移量一样,它可以用于创建 `DatetimeIndex`,或添加到 `datetime` 与 `Timestamp` 对象。
+
+```python
+In [256]: pd.date_range(start='7/1/2012', end='7/10/2012',
+   .....:               freq=pd.offsets.CDay(calendar=cal)).to_pydatetime()
+   .....: 
+Out[256]: 
+array([datetime.datetime(2012, 7, 2, 0, 0),
+       datetime.datetime(2012, 7, 3, 0, 0),
+       datetime.datetime(2012, 7, 5, 0, 0),
+       datetime.datetime(2012, 7, 6, 0, 0),
+       datetime.datetime(2012, 7, 9, 0, 0),
+       datetime.datetime(2012, 7, 10, 0, 0)], dtype=object)
+
+In [257]: offset = pd.offsets.CustomBusinessDay(calendar=cal)
+
+In [258]: datetime.datetime(2012, 5, 25) + offset
+Out[258]: Timestamp('2012-05-29 00:00:00')
+
+In [259]: datetime.datetime(2012, 7, 3) + offset
+Out[259]: Timestamp('2012-07-05 00:00:00')
+
+In [260]: datetime.datetime(2012, 7, 3) + 2 * offset
+Out[260]: Timestamp('2012-07-06 00:00:00')
+
+In [261]: datetime.datetime(2012, 7, 6) + 
offset +Out[261]: Timestamp('2012-07-09 00:00:00') +``` + +`AbstractHolidayCalendar` 的类属性 `start_date` 与 `end_date` 定义日期范围。默认值如下: + +```python +In [262]: AbstractHolidayCalendar.start_date +Out[262]: Timestamp('1970-01-01 00:00:00') + +In [263]: AbstractHolidayCalendar.end_date +Out[263]: Timestamp('2030-12-31 00:00:00') +``` + +这两个日期可以用 `datetime`、`Timestamp`、`字符串` 修改。 + +```python +In [264]: AbstractHolidayCalendar.start_date = datetime.datetime(2012, 1, 1) + +In [265]: AbstractHolidayCalendar.end_date = datetime.datetime(2012, 12, 31) + +In [266]: cal.holidays() +Out[266]: DatetimeIndex(['2012-05-28', '2012-07-04', '2012-10-08'], dtype='datetime64[ns]', freq=None) +``` + +`get_calender` 函数通过日历名称访问日历,返回的是日历实例。任意导入的日历都自动适用于此函数。同时,`HolidayCalendarFactory` 还提供了一个创建日历组合或含附加规则日历的简易接口。 + +```python +In [267]: from pandas.tseries.holiday import get_calendar, HolidayCalendarFactory,\ + .....: USLaborDay + .....: + +In [268]: cal = get_calendar('ExampleCalendar') + +In [269]: cal.rules +Out[269]: +[Holiday: Memorial Day (month=5, day=31, offset=), + Holiday: July 4th (month=7, day=4, observance=), + Holiday: Columbus Day (month=10, day=1, offset=)] + +In [270]: new_cal = HolidayCalendarFactory('NewExampleCalendar', cal, USLaborDay) + +In [271]: new_cal.rules +Out[271]: +[Holiday: Labor Day (month=9, day=1, offset=), + Holiday: Memorial Day (month=5, day=31, offset=), + Holiday: July 4th (month=7, day=4, observance=), + Holiday: Columbus Day (month=10, day=1, offset=)] +``` + +## 时间序列实例方法 + +### 移位与延迟 + +有时,需要整体向前或向后移动时间序列里的值,这就是移位与延迟。实现这一操作的方法是 [`shift()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.shift.html#pandas.Series.shift "pandas.Series.shift"),该方法适用于所有 pandas 对象。 + +```python +In [272]: ts = pd.Series(range(len(rng)), index=rng) + +In [273]: ts = ts[:5] + +In [274]: ts.shift(1) +Out[274]: +2012-01-01 NaN +2012-01-02 0.0 +2012-01-03 1.0 +Freq: D, dtype: float64 +``` + +`shift` 方法支持 `freq` 参数,可以把 `DateOffset`、`timedelta` 
对象、[`偏移量别名`](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases) 作为参数值: + +```python +In [275]: ts.shift(5, freq=pd.offsets.BDay()) +Out[275]: +2012-01-06 0 +2012-01-09 1 +2012-01-10 2 +Freq: B, dtype: int64 + +In [276]: ts.shift(5, freq='BM') +Out[276]: +2012-05-31 0 +2012-05-31 1 +2012-05-31 2 +Freq: D, dtype: int64 +``` + +除更改数据与索引的对齐方式外,`DataFrame` 与 `Series` 对象还提供了 [`tshift()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.tshift.html#pandas.Series.tshift "pandas.Series.tshift") 便捷方法,可以指定偏移量修改索引日期。 + +```python +In [277]: ts.tshift(5, freq='D') +Out[277]: +2012-01-06 0 +2012-01-07 1 +2012-01-08 2 +Freq: D, dtype: int64 +``` + +注意,使用 `tshift()` 时,因为数据没有重对齐,` NaN ` 不会排在前面。 + +### 频率转换 + +改变频率的函数主要是 [`asfreq()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.asfreq.html#pandas.Series.asfreq "pandas.Series.asfreq")。对于 `DatetimeIndex`,这就是一个调用 [`reindex()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.reindex.html#pandas.Series.reindex "pandas.Series.reindex"),并生成 `date_range` 的便捷打包器。 + +```python +In [278]: dr = pd.date_range('1/1/2010', periods=3, freq=3 * pd.offsets.BDay()) + +In [279]: ts = pd.Series(np.random.randn(3), index=dr) + +In [280]: ts +Out[280]: +2010-01-01 1.494522 +2010-01-06 -0.778425 +2010-01-11 -0.253355 +Freq: 3B, dtype: float64 + +In [281]: ts.asfreq(pd.offsets.BDay()) +Out[281]: +2010-01-01 1.494522 +2010-01-04 NaN +2010-01-05 NaN +2010-01-06 -0.778425 +2010-01-07 NaN +2010-01-08 NaN +2010-01-11 -0.253355 +Freq: B, dtype: float64 +``` + +`asfreq` 用起来很方便,可以为频率转化后出现的任意间隔指定插值方法。 + +```python +In [282]: ts.asfreq(pd.offsets.BDay(), method='pad') +Out[282]: +2010-01-01 1.494522 +2010-01-04 1.494522 +2010-01-05 1.494522 +2010-01-06 -0.778425 +2010-01-07 -0.778425 +2010-01-08 -0.778425 +2010-01-11 -0.253355 +Freq: B, dtype: float64 +``` + +### 向前与向后填充 + +与 `asfreq` 与 `reindex` 相关的是 
[`fillna()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.fillna.html#pandas.Series.fillna "pandas.Series.fillna"),有关文档请参阅[缺失值](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#missing-data-fillna)。 + +### 转换 Python 日期与时间 + +用 `to_datetime` 方法可以把`DatetimeIndex` 转换为 Python 原生 [`datetime.datetime`](https://docs.python.org/3/library/datetime.html#datetime.datetime "(in Python v3.7)") 对象数组。 + +## 重采样 + +::: danger 警告 + +0.18.0 版修改了 `.resample` 接口,现在的 `.resample` 更灵活,更像 groupby。参阅[更新文档](https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.18.0.html#whatsnew-0180-breaking-resample) ,对比新旧版本操作的区别。 + +::: + +Pandas 有一个虽然简单,但却强大、高效的功能,可在频率转换时执行重采样,如,将秒数据转换为 5 分钟数据,这种操作在金融等领域里的应用非常广泛。 + +[`resample()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.resample.html#pandas.Series.resample "pandas.Series.resample") 是基于时间的分组操作,每个组都遵循归纳方法。参阅 [Cookbook 示例](https://pandas.pydata.org/pandas-docs/stable/user_guide/cookbook.html#cookbook-resample)了解高级应用。 + +从 0.18.0 版开始,`resample()` 可以直接用于 `DataFrameGroupBy` 对象,参阅 [groupby 文档](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#groupby-transform-window-resample)。 + +::: tip 注意 + +`.resample()` 类似于基于时间偏移量的 [`rolling()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.rolling.html#pandas.Series.rolling "pandas.Series.rolling") 操作,请参阅[这里](https://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html#stats-moments-ts-versus-resampling)的讨论。 + +::: + +### 基础知识 + +```python +In [283]: rng = pd.date_range('1/1/2012', periods=100, freq='S') + +In [284]: ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng) + +In [285]: ts.resample('5Min').sum() +Out[285]: +2012-01-01 25103 +Freq: 5T, dtype: int64 +``` + +`resample` 函数非常灵活,可以指定多种频率转换与重采样参数。 + +任何支持[派送(dispatch)](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#groupby-dispatch)的函数都可用于 `resample` 返回对象,包括 
`sum`、`mean`、`std`、`sem`、`max`、`min`、`median`、`first`、`last`、`ohlc`:
+
+```python
+In [286]: ts.resample('5Min').mean()
+Out[286]: 
+2012-01-01    251.03
+Freq: 5T, dtype: float64
+
+In [287]: ts.resample('5Min').ohlc()
+Out[287]: 
+            open  high  low  close
+2012-01-01   308   460    9    205
+
+In [288]: ts.resample('5Min').max()
+Out[288]: 
+2012-01-01    460
+Freq: 5T, dtype: int64
+```
+
+对于下采样,`closed` 可以设置为 `left` 或 `right`,用于指定间隔的哪一端闭合:
+
+```python
+In [289]: ts.resample('5Min', closed='right').mean()
+Out[289]: 
+2011-12-31 23:55:00    308.000000
+2012-01-01 00:00:00    250.454545
+Freq: 5T, dtype: float64
+
+In [290]: ts.resample('5Min', closed='left').mean()
+Out[290]: 
+2012-01-01    251.03
+Freq: 5T, dtype: float64
+```
+
+`label`、`loffset` 等参数用于生成标签。`label` 指定把间隔的起点还是终点作为结果的标签。`loffset` 调整输出标签的时间。
+
+```python
+In [291]: ts.resample('5Min').mean()  # 默认为 label='left'
+Out[291]: 
+2012-01-01    251.03
+Freq: 5T, dtype: float64
+
+In [292]: ts.resample('5Min', label='left').mean()
+Out[292]: 
+2012-01-01    251.03
+Freq: 5T, dtype: float64
+
+In [293]: ts.resample('5Min', label='left', loffset='1s').mean()
+Out[293]: 
+2012-01-01 00:00:01    251.03
+dtype: float64
+```
+
+
+::: danger 警告
+
+除了 `M`、`A`、`Q`、`BM`、`BA`、`BQ`、`W` 的默认值是 `right` 外,其它频率偏移量的 `label` 与 `closed` 默认值都是 `left`。
+
+这种操作可能会导致时间回溯,即后面的时间会被拉回到前面的时间,如下例的 [`BusinessDay`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.tseries.offsets.BusinessDay.html#pandas.tseries.offsets.BusinessDay "pandas.tseries.offsets.BusinessDay") 频率所示。
+
+```python
+In [294]: s = pd.date_range('2000-01-01', '2000-01-05').to_series()
+
+In [295]: s.iloc[2] = pd.NaT
+
+In [296]: s.dt.weekday_name
+Out[296]: 
+2000-01-01     Saturday
+2000-01-02       Sunday
+2000-01-03          NaN
+2000-01-04      Tuesday
+2000-01-05    Wednesday
+Freq: D, dtype: object
+
+# 默认为:label='left', closed='left'
+In [297]: s.resample('B').last().dt.weekday_name
+Out[297]: 
+1999-12-31       Sunday
+2000-01-03          NaN
+2000-01-04      Tuesday
+2000-01-05    Wednesday
+Freq: B, dtype: object
+```
+
+注意,星期日被拉回到了上一个星期五。要想把星期日移至星期一,改用以下代码:
+
+```python
+In [298]: s.resample('B', label='right', closed='right').last().dt.weekday_name
+Out[298]: 
+2000-01-03       Sunday
+2000-01-04      Tuesday
+2000-01-05    Wednesday
+Freq: B, dtype: object
+```
+:::
+
+`axis` 参数的值为 `0` 或 `1`,用于指定 `DataFrame` 重采样的轴。
+
+`kind` 参数可以是 `timestamp` 或 `period`,把结果转换为时间戳或时间段形式的索引。`resample` 默认保留输入的日期时间形式。
+
+重采样 `period` 数据时(详情见下文),`convention` 可以设置为 `start` 或 `end`,用于指定低频时间段如何转换为高频时间段。
+
+### 上采样
+
+上采样可以指定上采样的方式,以及为生成的间隔插值的 `limit` 参数:
+
+```python
+# 从秒到每 250 毫秒
+In [299]: ts[:2].resample('250L').asfreq()
+Out[299]: 
+2012-01-01 00:00:00.000    308.0
+2012-01-01 00:00:00.250      NaN
+2012-01-01 00:00:00.500      NaN
+2012-01-01 00:00:00.750      NaN
+2012-01-01 00:00:01.000    204.0
+Freq: 250L, dtype: float64
+
+In [300]: ts[:2].resample('250L').ffill()
+Out[300]: 
+2012-01-01 00:00:00.000    308
+2012-01-01 00:00:00.250    308
+2012-01-01 00:00:00.500    308
+2012-01-01 00:00:00.750    308
+2012-01-01 00:00:01.000    204
+Freq: 250L, dtype: int64
+
+In [301]: ts[:2].resample('250L').ffill(limit=2)
+Out[301]: 
+2012-01-01 00:00:00.000    308.0
+2012-01-01 00:00:00.250    308.0
+2012-01-01 00:00:00.500    308.0
+2012-01-01 00:00:00.750      NaN
+2012-01-01 00:00:01.000    204.0
+Freq: 250L, dtype: float64
+```
+
+### 稀疏重采样
+
+相对于时间点总量,稀疏时间序列重采样的点要少很多。单纯上采样稀疏序列可能会生成很多中间值。未指定填充值,即 `fill_method` 是 `None` 时,中间值将填充为 `NaN`。
+
+鉴于 `resample` 是基于时间的分组操作,下面的方法可以只对非全 `NaN` 的分组进行高效重采样。
+
+```python
+In [302]: rng = pd.date_range('2014-1-1', periods=100, freq='D') + pd.Timedelta('1s')
+
+In [303]: ts = pd.Series(range(100), index=rng)
+```
+
+对 `Series` 全范围重采样。
+
+```python
+In [304]: ts.resample('3T').sum()
+Out[304]: 
+2014-01-01 00:00:00     0
+2014-01-01 00:03:00     0
+2014-01-01 00:06:00     0
+2014-01-01 00:09:00     0
+2014-01-01 00:12:00     0
+                       ..
+2014-04-09 23:48:00 0 +2014-04-09 23:51:00 0 +2014-04-09 23:54:00 0 +2014-04-09 23:57:00 0 +2014-04-10 00:00:00 99 +Freq: 3T, Length: 47521, dtype: int64 +``` + +对以下包含点的分组重采样: + +```python +In [305]: from functools import partial + +In [306]: from pandas.tseries.frequencies import to_offset + +In [307]: def round(t, freq): + .....: freq = to_offset(freq) + .....: return pd.Timestamp((t.value // freq.delta.value) * freq.delta.value) + .....: + +In [308]: ts.groupby(partial(round, freq='3T')).sum() +Out[308]: +2014-01-01 0 +2014-01-02 1 +2014-01-03 2 +2014-01-04 3 +2014-01-05 4 + .. +2014-04-06 95 +2014-04-07 96 +2014-04-08 97 +2014-04-09 98 +2014-04-10 99 +Length: 100, dtype: int64 +``` + +### 聚合 + +类似于[聚合 API](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-aggregate),[Groupby API](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#groupby-aggregate) 及[窗口函数 API](https://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html#stats-aggregate),`Resampler` 可以有选择地重采样。 + +`DataFrame` 重采样,默认用相同函数操作所有列。 + +```python +In [309]: df = pd.DataFrame(np.random.randn(1000, 3), + .....: index=pd.date_range('1/1/2012', freq='S', periods=1000), + .....: columns=['A', 'B', 'C']) + .....: + +In [310]: r = df.resample('3T') + +In [311]: r.mean() +Out[311]: + A B C +2012-01-01 00:00:00 -0.033823 -0.121514 -0.081447 +2012-01-01 00:03:00 0.056909 0.146731 -0.024320 +2012-01-01 00:06:00 -0.058837 0.047046 -0.052021 +2012-01-01 00:09:00 0.063123 -0.026158 -0.066533 +2012-01-01 00:12:00 0.186340 -0.003144 0.074752 +2012-01-01 00:15:00 -0.085954 -0.016287 -0.050046 +``` + +标准 `getitem` 操作可以指定的一列或多列。 + + + +```python +In [312]: r['A'].mean() +Out[312]: +2012-01-01 00:00:00 -0.033823 +2012-01-01 00:03:00 0.056909 +2012-01-01 00:06:00 -0.058837 +2012-01-01 00:09:00 0.063123 +2012-01-01 00:12:00 0.186340 +2012-01-01 00:15:00 -0.085954 +Freq: 3T, Name: A, dtype: float64 + +In [313]: r[['A', 'B']].mean() +Out[313]: + A B +2012-01-01 
00:00:00 -0.033823 -0.121514 +2012-01-01 00:03:00 0.056909 0.146731 +2012-01-01 00:06:00 -0.058837 0.047046 +2012-01-01 00:09:00 0.063123 -0.026158 +2012-01-01 00:12:00 0.186340 -0.003144 +2012-01-01 00:15:00 -0.085954 -0.016287 +``` + +聚合还支持函数列表与字典,输出的是 `DataFrame`。 + +```python +In [314]: r['A'].agg([np.sum, np.mean, np.std]) +Out[314]: + sum mean std +2012-01-01 00:00:00 -6.088060 -0.033823 1.043263 +2012-01-01 00:03:00 10.243678 0.056909 1.058534 +2012-01-01 00:06:00 -10.590584 -0.058837 0.949264 +2012-01-01 00:09:00 11.362228 0.063123 1.028096 +2012-01-01 00:12:00 33.541257 0.186340 0.884586 +2012-01-01 00:15:00 -8.595393 -0.085954 1.035476 +``` + +重采样后的 `DataFrame`,可以为每列指定函数列表,生成结构化索引的聚合结果: + +```python +In [315]: r.agg([np.sum, np.mean]) +Out[315]: + A B C + sum mean sum mean sum mean +2012-01-01 00:00:00 -6.088060 -0.033823 -21.872530 -0.121514 -14.660515 -0.081447 +2012-01-01 00:03:00 10.243678 0.056909 26.411633 0.146731 -4.377642 -0.024320 +2012-01-01 00:06:00 -10.590584 -0.058837 8.468289 0.047046 -9.363825 -0.052021 +2012-01-01 00:09:00 11.362228 0.063123 -4.708526 -0.026158 -11.975895 -0.066533 +2012-01-01 00:12:00 33.541257 0.186340 -0.565895 -0.003144 13.455299 0.074752 +2012-01-01 00:15:00 -8.595393 -0.085954 -1.628689 -0.016287 -5.004580 -0.050046 +``` + +把字典传递给 `aggregate`,可以为 `DataFrame` 里不同的列应用不同聚合函数。 + +```python +In [316]: r.agg({'A': np.sum, + .....: 'B': lambda x: np.std(x, ddof=1)}) + .....: +Out[316]: + A B +2012-01-01 00:00:00 -6.088060 1.001294 +2012-01-01 00:03:00 10.243678 1.074597 +2012-01-01 00:06:00 -10.590584 0.987309 +2012-01-01 00:09:00 11.362228 0.944953 +2012-01-01 00:12:00 33.541257 1.095025 +2012-01-01 00:15:00 -8.595393 1.035312 +``` + +还可以用字符串代替函数名。为了让字符串有效,必须在重采样对象上操作: + +```python +In [317]: r.agg({'A': 'sum', 'B': 'std'}) +Out[317]: + A B +2012-01-01 00:00:00 -6.088060 1.001294 +2012-01-01 00:03:00 10.243678 1.074597 +2012-01-01 00:06:00 -10.590584 0.987309 +2012-01-01 00:09:00 11.362228 0.944953 +2012-01-01 00:12:00 
33.541257 1.095025 +2012-01-01 00:15:00 -8.595393 1.035312 +``` + +甚至还可以为每列单独多个聚合函数。 + +```python +In [318]: r.agg({'A': ['sum', 'std'], 'B': ['mean', 'std']}) +Out[318]: + A B + sum std mean std +2012-01-01 00:00:00 -6.088060 1.043263 -0.121514 1.001294 +2012-01-01 00:03:00 10.243678 1.058534 0.146731 1.074597 +2012-01-01 00:06:00 -10.590584 0.949264 0.047046 0.987309 +2012-01-01 00:09:00 11.362228 1.028096 -0.026158 0.944953 +2012-01-01 00:12:00 33.541257 0.884586 -0.003144 1.095025 +2012-01-01 00:15:00 -8.595393 1.035476 -0.016287 1.035312 +``` + +如果 `DataFrame` 用的不是 `datetime` 型索引,则可以基于 `datetime` 数据列重采样,用关键字 `on` 控制。 + +```python +In [319]: df = pd.DataFrame({'date': pd.date_range('2015-01-01', freq='W', periods=5), + .....: 'a': np.arange(5)}, + .....: index=pd.MultiIndex.from_arrays([ + .....: [1, 2, 3, 4, 5], + .....: pd.date_range('2015-01-01', freq='W', periods=5)], + .....: names=['v', 'd'])) + .....: + +In [320]: df +Out[320]: + date a +v d +1 2015-01-04 2015-01-04 0 +2 2015-01-11 2015-01-11 1 +3 2015-01-18 2015-01-18 2 +4 2015-01-25 2015-01-25 3 +5 2015-02-01 2015-02-01 4 + +In [321]: df.resample('M', on='date').sum() +Out[321]: + a +date +2015-01-31 6 +2015-02-28 4 +``` + +同样,还可以对 `datetime MultiIndex` 重采样,通过关键字 `level` 传递名字与位置。 + +```python +In [322]: df.resample('M', level='d').sum() +Out[322]: + a +d +2015-01-31 6 +2015-02-28 4 +``` + +### 分组迭代 + +`Resampler`对象迭代分组数据的操作非常自然,类似于 [`itertools.groupby()`](https://docs.python.org/3/library/itertools.html#itertools.groupby "(in Python v3.7)"): + +```python +In [323]: small = pd.Series( + .....: range(6), + .....: index=pd.to_datetime(['2017-01-01T00:00:00', + .....: '2017-01-01T00:30:00', + .....: '2017-01-01T00:31:00', + .....: '2017-01-01T01:00:00', + .....: '2017-01-01T03:00:00', + .....: '2017-01-01T03:05:00']) + .....: ) + .....: + +In [324]: resampled = small.resample('H') + +In [325]: for name, group in resampled: + .....: print("Group: ", name) + .....: print("-" * 27) + .....: print(group, 
end="\n\n")
+ .....: 
+Group: 2017-01-01 00:00:00
+---------------------------
+2017-01-01 00:00:00 0
+2017-01-01 00:30:00 1
+2017-01-01 00:31:00 2
+dtype: int64
+
+Group: 2017-01-01 01:00:00
+---------------------------
+2017-01-01 01:00:00 3
+dtype: int64
+
+Group: 2017-01-01 02:00:00
+---------------------------
+Series([], dtype: int64)
+
+Group: 2017-01-01 03:00:00
+---------------------------
+2017-01-01 03:00:00 4
+2017-01-01 03:05:00 5
+dtype: int64
+```
+
+了解更多详情,请参阅[分组迭代](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#groupby-iterating-label)或 [`itertools.groupby()`](https://docs.python.org/3/library/itertools.html#itertools.groupby "(in Python v3.7)")。
+
+## 时间跨度表示
+
+规律的时间间隔可以用 pandas 的 `Period` 对象表示,`Period` 对象的序列叫做 `PeriodIndex`,可以用便捷函数 `period_range` 创建。
+
+### Period
+
+`Period` 表示时间跨度,即时间段,如年、季、月、日等。关键字 `freq` 与频率别名可以指定时间段。`freq` 表示的是 `Period` 的时间跨度,不能为负,如 `-3D`。
+
+```python
+In [326]: pd.Period('2012', freq='A-DEC')
+Out[326]: Period('2012', 'A-DEC')
+
+In [327]: pd.Period('2012-1-1', freq='D')
+Out[327]: Period('2012-01-01', 'D')
+
+In [328]: pd.Period('2012-1-1 19:00', freq='H')
+Out[328]: Period('2012-01-01 19:00', 'H')
+
+In [329]: pd.Period('2012-1-1 19:00', freq='5H')
+Out[329]: Period('2012-01-01 19:00', '5H')
+```
+
+时间段加减整数时,按自身频率位移。不同频率的时间段之间不可进行算术运算。
+
+```python
+In [330]: p = pd.Period('2012', freq='A-DEC')
+
+In [331]: p + 1
+Out[331]: Period('2013', 'A-DEC')
+
+In [332]: p - 3
+Out[332]: Period('2009', 'A-DEC')
+
+In [333]: p = pd.Period('2012-01', freq='2M')
+
+In [334]: p + 2
+Out[334]: Period('2012-05', '2M')
+
+In [335]: p - 1
+Out[335]: Period('2011-11', '2M')
+
+In [336]: p == pd.Period('2012-01', freq='3M')
+---------------------------------------------------------------------------
+IncompatibleFrequency Traceback (most recent call last)
+ in 
+----> 1 p == pd.Period('2012-01', freq='3M')
+
+/pandas/pandas/_libs/tslibs/period.pyx in pandas._libs.tslibs.period._Period.__richcmp__()
+
+IncompatibleFrequency: Input has different freq=3M from Period(freq=2M)
+```
+
+`Period` 的频率为日或更高(如 `D`、`H`、`T`、`S`、`L`、`U`、`N`)时,可以与频率相同的 `offsets` 及类 `timedelta` 对象相加。否则,会触发 `ValueError`。
+
+```python
+In [337]: p = pd.Period('2014-07-01 09:00', freq='H')
+
+In [338]: p + pd.offsets.Hour(2)
+Out[338]: Period('2014-07-01 11:00', 'H')
+
+In [339]: p + datetime.timedelta(minutes=120)
+Out[339]: Period('2014-07-01 11:00', 'H')
+
+In [340]: p + np.timedelta64(7200, 's')
+Out[340]: Period('2014-07-01 11:00', 'H')
+In [1]: p + pd.offsets.Minute(5)
+Traceback
+ ...
+ValueError: Input has different freq from Period(freq=H)
+```
+
+如果 `Period` 为其它频率,只有相同频率的 `offsets` 可以相加。否则,会触发 `ValueError`。
+
+```python
+In [341]: p = pd.Period('2014-07', freq='M')
+
+In [342]: p + pd.offsets.MonthEnd(3)
+Out[342]: Period('2014-10', 'M')
+In [1]: p + pd.offsets.MonthBegin(3)
+Traceback
+ ...
+ValueError: Input has different freq from Period(freq=M)
+```
+
+计算相同频率的两个 `Period` 实例之间的差,返回的是二者相隔的频率单元数量。
+
+```python
+In [343]: pd.Period('2012', freq='A-DEC') - pd.Period('2002', freq='A-DEC')
+Out[343]: <10 * YearEnds: month=12>
+```
+
+### PeriodIndex 与 period_range
+
+用便捷函数 `period_range` 可以创建有规律的 `Period` 对象序列,即 `PeriodIndex`。
+
+```python
+In [344]: prng = pd.period_range('1/1/2011', '1/1/2012', freq='M')
+
+In [345]: prng
+Out[345]: 
+PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06',
+ '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12',
+ '2012-01'],
+ dtype='period[M]', freq='M')
+```
+
+也可以直接用 `PeriodIndex` 创建:
+
+```python
+In [346]: pd.PeriodIndex(['2011-1', '2011-2', '2011-3'], freq='M')
+Out[346]: PeriodIndex(['2011-01', '2011-02', '2011-03'], dtype='period[M]', freq='M')
+```
+
+频率带倍数时,输出的 `Period` 序列具有相应倍数的时间跨度。
+
+```python
+In [347]: pd.period_range(start='2014-01', freq='3M', periods=4)
+Out[347]: PeriodIndex(['2014-01', '2014-04', '2014-07', '2014-10'], dtype='period[3M]', freq='3M')
+```
+
+如果 `start` 或 `end` 是 `Period` 对象,则会被当作 `PeriodIndex` 的锚定端点,其频率与 `PeriodIndex` 的频率一致。
+ 
+```python +In [348]: pd.period_range(start=pd.Period('2017Q1', freq='Q'), + .....: end=pd.Period('2017Q2', freq='Q'), freq='M') + .....: +Out[348]: PeriodIndex(['2017-03', '2017-04', '2017-05', '2017-06'], dtype='period[M]', freq='M') +``` + +和 `DatetimeIndex` 一样,`PeriodIndex` 也可以作为 pandas 对象的索引。 + +```python +In [349]: ps = pd.Series(np.random.randn(len(prng)), prng) + +In [350]: ps +Out[350]: +2011-01 -2.916901 +2011-02 0.514474 +2011-03 1.346470 +2011-04 0.816397 +2011-05 2.258648 +2011-06 0.494789 +2011-07 0.301239 +2011-08 0.464776 +2011-09 -1.393581 +2011-10 0.056780 +2011-11 0.197035 +2011-12 2.261385 +2012-01 -0.329583 +Freq: M, dtype: float64 +``` + +`PeriodIndex` 的加减法与 `Period` 一样。 + +```python +In [351]: idx = pd.period_range('2014-07-01 09:00', periods=5, freq='H') + +In [352]: idx +Out[352]: +PeriodIndex(['2014-07-01 09:00', '2014-07-01 10:00', '2014-07-01 11:00', + '2014-07-01 12:00', '2014-07-01 13:00'], + dtype='period[H]', freq='H') + +In [353]: idx + pd.offsets.Hour(2) +Out[353]: +PeriodIndex(['2014-07-01 11:00', '2014-07-01 12:00', '2014-07-01 13:00', + '2014-07-01 14:00', '2014-07-01 15:00'], + dtype='period[H]', freq='H') + +In [354]: idx = pd.period_range('2014-07', periods=5, freq='M') + +In [355]: idx +Out[355]: PeriodIndex(['2014-07', '2014-08', '2014-09', '2014-10', '2014-11'], dtype='period[M]', freq='M') + +In [356]: idx + pd.offsets.MonthEnd(3) +Out[356]: PeriodIndex(['2014-10', '2014-11', '2014-12', '2015-01', '2015-02'], dtype='period[M]', freq='M') +``` + +`PeriodIndex` 有自己的数据类型,即 `period`,请参阅 [Period 数据类型](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-period-dtype)。 + +### Period 数据类型 + +*0.19.0 版新增*。 + +`PeriodIndex` 的自定义数据类型是 `period`,是 pandas 扩展数据类型,类似于[带时区信息的数据类型](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-timezone-series)(`datetime64[ns, tz]`)。 + +`Period` 数据类型支持 `freq` 属性,还可以用 `period[freq]` 表示,如,`period[D]` 或 
`period[M]`,这里用的是[频率字符串](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases)。 + +```python +In [357]: pi = pd.period_range('2016-01-01', periods=3, freq='M') + +In [358]: pi +Out[358]: PeriodIndex(['2016-01', '2016-02', '2016-03'], dtype='period[M]', freq='M') + +In [359]: pi.dtype +Out[359]: period[M] +``` + +`period` 数据类型在 `.astype(...)` 里使用。允许改变 `PeriodIndex` 的 `freq`, 如 `.asfreq()`,并用 `to_period()` 把 `DatetimeIndex` 转化为 `PeriodIndex`: + +```python +# 把月频改为日频 +In [360]: pi.astype('period[D]') +Out[360]: PeriodIndex(['2016-01-31', '2016-02-29', '2016-03-31'], dtype='period[D]', freq='D') + +# 转换为 DatetimeIndex +In [361]: pi.astype('datetime64[ns]') +Out[361]: DatetimeIndex(['2016-01-01', '2016-02-01', '2016-03-01'], dtype='datetime64[ns]', freq='MS') + +# 转换为 PeriodIndex +In [362]: dti = pd.date_range('2011-01-01', freq='M', periods=3) + +In [363]: dti +Out[363]: DatetimeIndex(['2011-01-31', '2011-02-28', '2011-03-31'], dtype='datetime64[ns]', freq='M') + +In [364]: dti.astype('period[M]') +Out[364]: PeriodIndex(['2011-01', '2011-02', '2011-03'], dtype='period[M]', freq='M') +``` + +### PeriodIndex 局部字符串索引 + +与 `DatetimeIndex` 一样,`PeriodIndex` 可以把日期与字符串传递给 `Series` 与 `DataFrame`。详情请参阅 [DatetimeIndex 局部字符串索引](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-partialindexing)。 + +```python +In [365]: ps['2011-01'] +Out[365]: -2.9169013294054507 + +In [366]: ps[datetime.datetime(2011, 12, 25):] +Out[366]: +2011-12 2.261385 +2012-01 -0.329583 +Freq: M, dtype: float64 + +In [367]: ps['10/31/2011':'12/31/2011'] +Out[367]: +2011-10 0.056780 +2011-11 0.197035 +2011-12 2.261385 +Freq: M, dtype: float64 +``` + +传递比 `PeriodIndex` 更低频率的字符串会返回局部切片数据。 + +```python +In [368]: ps['2011'] +Out[368]: +2011-01 -2.916901 +2011-02 0.514474 +2011-03 1.346470 +2011-04 0.816397 +2011-05 2.258648 +2011-06 0.494789 +2011-07 0.301239 +2011-08 0.464776 +2011-09 -1.393581 +2011-10 0.056780 +2011-11 0.197035 
+2011-12 2.261385 +Freq: M, dtype: float64 + +In [369]: dfp = pd.DataFrame(np.random.randn(600, 1), + .....: columns=['A'], + .....: index=pd.period_range('2013-01-01 9:00', + .....: periods=600, + .....: freq='T')) + .....: + +In [370]: dfp +Out[370]: + A +2013-01-01 09:00 -0.538468 +2013-01-01 09:01 -1.365819 +2013-01-01 09:02 -0.969051 +2013-01-01 09:03 -0.331152 +2013-01-01 09:04 -0.245334 +... ... +2013-01-01 18:55 0.522460 +2013-01-01 18:56 0.118710 +2013-01-01 18:57 0.167517 +2013-01-01 18:58 0.922883 +2013-01-01 18:59 1.721104 + +[600 rows x 1 columns] + +In [371]: dfp['2013-01-01 10H'] +Out[371]: + A +2013-01-01 10:00 -0.308975 +2013-01-01 10:01 0.542520 +2013-01-01 10:02 1.061068 +2013-01-01 10:03 0.754005 +2013-01-01 10:04 0.352933 +... ... +2013-01-01 10:55 -0.865621 +2013-01-01 10:56 -1.167818 +2013-01-01 10:57 -2.081748 +2013-01-01 10:58 -0.527146 +2013-01-01 10:59 0.802298 + +[60 rows x 1 columns] +``` + +与 `DatetimeIndex` 一样,终点包含在结果范围之内。下例中的切片数据就是从 10:00 到 11:59。 + +```python +In [372]: dfp['2013-01-01 10H':'2013-01-01 11H'] +Out[372]: + A +2013-01-01 10:00 -0.308975 +2013-01-01 10:01 0.542520 +2013-01-01 10:02 1.061068 +2013-01-01 10:03 0.754005 +2013-01-01 10:04 0.352933 +... ... 
+2013-01-01 11:55 -0.590204 +2013-01-01 11:56 1.539990 +2013-01-01 11:57 -1.224826 +2013-01-01 11:58 0.578798 +2013-01-01 11:59 -0.685496 + +[120 rows x 1 columns] +``` + +### 频率转换与 `PeriodIndex` 重采样 + +`Period` 与 `PeriodIndex` 的频率可以用 `asfreq` 转换。下列代码开始于 2011 财年,结束时间为十二月: + +```python +In [373]: p = pd.Period('2011', freq='A-DEC') + +In [374]: p +Out[374]: Period('2011', 'A-DEC') +``` + +可以把它转换为月频。使用 `how` 参数,指定是否返回开始或结束月份。 + +```python +In [375]: p.asfreq('M', how='start') +Out[375]: Period('2011-01', 'M') + +In [376]: p.asfreq('M', how='end') +Out[376]: Period('2011-12', 'M') +``` + +简称 `s` 与 `e` 用起来更方便: + +```python +In [377]: p.asfreq('M', 's') +Out[377]: Period('2011-01', 'M') + +In [378]: p.asfreq('M', 'e') +Out[378]: Period('2011-12', 'M') +``` + +转换为“超级 period”,(如,年频就是季频的超级 period),自动返回包含输入时间段的超级 period: + +```python +In [379]: p = pd.Period('2011-12', freq='M') + +In [380]: p.asfreq('A-NOV') +Out[380]: Period('2012', 'A-NOV') +``` + +注意,因为转换年频是在十一月结束的,2011 年 12 月的月时间段实际上是 `2012 A-NOV` period。 + +用锚定频率转换时间段,对经济学、商业等领域里的各种季度数据特别有用。很多公司都依据其财年开始月与结束月定义季度。因此,2011 年第一个季度有可能 2010 年就开始了,也有可能 2011 年过了几个月才开始。通过锚定频率,pandas 可以处理所有从 `Q-JAN` 至 `Q-DEC`的季度频率。 + +`Q-DEC` 定义的是常规日历季度: + +```python +In [381]: p = pd.Period('2012Q1', freq='Q-DEC') + +In [382]: p.asfreq('D', 's') +Out[382]: Period('2012-01-01', 'D') + +In [383]: p.asfreq('D', 'e') +Out[383]: Period('2012-03-31', 'D') +``` + +`Q-MAR` 定义的是财年结束于三月: + +```python +In [384]: p = pd.Period('2011Q4', freq='Q-MAR') + +In [385]: p.asfreq('D', 's') +Out[385]: Period('2011-01-01', 'D') + +In [386]: p.asfreq('D', 'e') +Out[386]: Period('2011-03-31', 'D') +``` + +### 不同表现形式之间的转换 + +`to_period` 把时间戳转换为 `PeriodIndex`,`to_timestamp` 则执行反向操作。 + +```python +In [387]: rng = pd.date_range('1/1/2012', periods=5, freq='M') + +In [388]: ts = pd.Series(np.random.randn(len(rng)), index=rng) + +In [389]: ts +Out[389]: +2012-01-31 1.931253 +2012-02-29 -0.184594 +2012-03-31 0.249656 +2012-04-30 -0.978151 +2012-05-31 -0.873389 +Freq: M, 
dtype: float64
+
+In [390]: ps = ts.to_period()
+
+In [391]: ps
+Out[391]: 
+2012-01 1.931253
+2012-02 -0.184594
+2012-03 0.249656
+2012-04 -0.978151
+2012-05 -0.873389
+Freq: M, dtype: float64
+
+In [392]: ps.to_timestamp()
+Out[392]: 
+2012-01-01 1.931253
+2012-02-01 -0.184594
+2012-03-01 0.249656
+2012-04-01 -0.978151
+2012-05-01 -0.873389
+Freq: MS, dtype: float64
+```
+
+记住 `s` 与 `e` 返回 `period` 开始或结束的时间戳:
+
+```python
+In [393]: ps.to_timestamp('D', how='s')
+Out[393]: 
+2012-01-01 1.931253
+2012-02-01 -0.184594
+2012-03-01 0.249656
+2012-04-01 -0.978151
+2012-05-01 -0.873389
+Freq: MS, dtype: float64
+```
+
+在时间段与时间戳之间转换时,可以使用一些便捷的算术函数。下例中,把以 11 月为年度结束的季度频率,转换为季度结束后下一个月月末的上午 9 点:
+
+```python
+In [394]: prng = pd.period_range('1990Q1', '2000Q4', freq='Q-NOV')
+
+In [395]: ts = pd.Series(np.random.randn(len(prng)), prng)
+
+In [396]: ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
+
+In [397]: ts.head()
+Out[397]: 
+1990-03-01 09:00 -0.109291
+1990-06-01 09:00 -0.637235
+1990-09-01 09:00 -1.735925
+1990-12-01 09:00 2.096946
+1991-03-01 09:00 -1.039926
+Freq: H, dtype: float64
+```
+
+## 界外跨度表示
+
+如果数据超出了 `Timestamp` 的边界(参阅 [Timestamp 限制](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-timestamp-limits)),可以用 `PeriodIndex` 或由 `Period` 组成的 `Series` 执行计算。
+
+```python
+In [398]: span = pd.period_range('1215-01-01', '1381-01-01', freq='D')
+
+In [399]: span
+Out[399]: 
+PeriodIndex(['1215-01-01', '1215-01-02', '1215-01-03', '1215-01-04',
+ '1215-01-05', '1215-01-06', '1215-01-07', '1215-01-08',
+ '1215-01-09', '1215-01-10',
+ ...
+ '1380-12-23', '1380-12-24', '1380-12-25', '1380-12-26',
+ '1380-12-27', '1380-12-28', '1380-12-29', '1380-12-30',
+ '1380-12-31', '1381-01-01'],
+ dtype='period[D]', length=60632, freq='D')
+```
+
+从基于 `int64` 的 `YYYYMMDD` 表示形式转换。
+
+```python
+In [400]: s = pd.Series([20121231, 20141130, 99991231])
+
+In [401]: s
+Out[401]: 
+0 20121231
+1 20141130
+2 99991231
+dtype: int64
+
+In [402]: def conv(x):
+ .....: return pd.Period(year=x // 10000, month=x // 100 % 100,
+ .....: day=x % 100, freq='D')
+ .....: 
+
+In [403]: s.apply(conv)
+Out[403]: 
+0 2012-12-31
+1 2014-11-30
+2 9999-12-31
+dtype: period[D]
+
+In [404]: s.apply(conv)[2]
+Out[404]: Period('9999-12-31', 'D')
+```
+
+轻轻松松就可以把这些数据转换成 `PeriodIndex`:
+
+```python
+In [405]: span = pd.PeriodIndex(s.apply(conv))
+
+In [406]: span
+Out[406]: PeriodIndex(['2012-12-31', '2014-11-30', '9999-12-31'], dtype='period[D]', freq='D')
+```
+
+## 时区控制
+
+利用 `pytz` 与 `dateutil` 或标准库 `datetime.timezone` 对象,pandas 能以多种方式处理不同时区的时间戳。
+
+### 处理时区
+
+Pandas 对象默认不支持时区信息:
+
+```python
+In [407]: rng = pd.date_range('3/6/2012 00:00', periods=15, freq='D')
+
+In [408]: rng.tz is None
+Out[408]: True
+```
+
+用 [`date_range()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html#pandas.date_range "pandas.date_range")、[`Timestamp`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html#pandas.Timestamp "pandas.Timestamp")、[`DatetimeIndex`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex "pandas.DatetimeIndex") 的 `tz_localize` 方法或 `tz` 关键字参数,可以为这些日期加上本地时区,即,把指定时区分配给不带时区的日期。还可以传递 `pytz`、`dateutil` 时区对象或奥尔森时区数据库字符串。奥尔森时区字符串默认返回 `pytz` 时区对象。要返回 `dateutil` 时区对象,在字符串前加上 `dateutil/`。
+
+* 用 `from pytz import common_timezones, all_timezones` 在 `pytz` 里查找通用时区。
+
+* `dateutil` 使用操作系统时区,没有固定的列表,其通用时区名与 `pytz` 相同。
+
+```python
+In [409]: import dateutil
+
+# pytz
+In [410]: rng_pytz = pd.date_range('3/6/2012 00:00', periods=3, freq='D',
+ .....: tz='Europe/London')
+ .....: 
+
+In [411]: rng_pytz.tz
+Out[411]: <DstTzInfo 'Europe/London' LMT-1 day, 23:59:00 STD>
+
+# dateutil
+In [412]: rng_dateutil = pd.date_range('3/6/2012 00:00', periods=3, freq='D')
+
+In [413]: rng_dateutil = rng_dateutil.tz_localize('dateutil/Europe/London')
+
+In [414]: rng_dateutil.tz
+Out[414]: tzfile('/usr/share/zoneinfo/Europe/London')
+
+# dateutil - utc special case
+In [415]: rng_utc = pd.date_range('3/6/2012 00:00', periods=3, freq='D',
+ .....: tz=dateutil.tz.tzutc())
+ .....: 
+
+In [416]: rng_utc.tz
+Out[416]: tzutc()
+```
+
+*0.25.0 版新增。*
+
+```python
+# datetime.timezone
+In [417]: rng_utc = pd.date_range('3/6/2012 00:00', periods=3, freq='D',
+ .....: tz=datetime.timezone.utc)
+ .....: 
+
+In [418]: rng_utc.tz
+Out[418]: datetime.timezone.utc
+```
+
+注意,`dateutil` 的 `UTC` 时区是个特例,要显式地创建 `dateutil.tz.tzutc` 实例。也可以先显式地创建其它时区对象,再传入。
+
+```python
+In [419]: import pytz
+
+# pytz
+In [420]: tz_pytz = pytz.timezone('Europe/London')
+
+In [421]: rng_pytz = pd.date_range('3/6/2012 00:00', periods=3, freq='D')
+
+In [422]: rng_pytz = rng_pytz.tz_localize(tz_pytz)
+
+In [423]: rng_pytz.tz == tz_pytz
+Out[423]: True
+
+# dateutil
+In [424]: tz_dateutil = dateutil.tz.gettz('Europe/London')
+
+In [425]: rng_dateutil = pd.date_range('3/6/2012 00:00', periods=3, freq='D',
+ .....: tz=tz_dateutil)
+ .....: 
+
+In [426]: rng_dateutil.tz == tz_dateutil
+Out[426]: True
+```
+
+不同时区之间转换带时区的 pandas 对象时,用 `tz_convert` 方法。
+
+```python
+In [427]: rng_pytz.tz_convert('US/Eastern')
+Out[427]: 
+DatetimeIndex(['2012-03-05 19:00:00-05:00', '2012-03-06 19:00:00-05:00',
+ '2012-03-07 19:00:00-05:00'],
+ dtype='datetime64[ns, US/Eastern]', freq='D')
+```
+
+::: tip 注意
+
+使用 `pytz` 时区时,对于相同的输入时区,[`DatetimeIndex`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex "pandas.DatetimeIndex") 会构建一个与 [`Timestamp`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html#pandas.Timestamp "pandas.Timestamp") 不同的时区对象。[`DatetimeIndex`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex "pandas.DatetimeIndex") 可以容纳一组 [`Timestamp`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html#pandas.Timestamp "pandas.Timestamp") 对象,其 UTC 偏移量可能各不相同,不能用一个 `pytz` 时区实例简洁地表示;而一个 [`Timestamp`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html#pandas.Timestamp "pandas.Timestamp") 代表一个具有特定 UTC 偏移量的时点。
+
+```python
+In [428]: dti = pd.date_range('2019-01-01', periods=3, freq='D', tz='US/Pacific')
+
+In [429]: dti.tz
+Out[429]: <DstTzInfo 'US/Pacific' LMT-1 day, 16:07:00 STD>
+
+In [430]: ts = pd.Timestamp('2019-01-01', tz='US/Pacific')
+
+In [431]: ts.tz
+Out[431]: <DstTzInfo 'US/Pacific' PST-1 day, 16:00:00 STD>
+```
+
+:::
+
+::: danger 警告
+
+注意不同支持库之间的转换。对于一些时区,`pytz` 与 `dateutil` 的定义不一样。与 `US/Eastern` 等“标准”时区相比,那些更少见的时区的问题更严重。
+
+:::
+
+::: danger 警告
+
+注意不同版本时区支持库对时区的定义并不一致。在处理本地存储数据时使用一种版本的支持库,在运算时使用另一种版本的支持库,可能会引起问题。参阅[本文](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-hdf5-notes)了解如何处理这种问题。
+
+:::
+
+::: danger 警告
+
+对于 `pytz` 时区,直接把时区对象传递给 `datetime.datetime` 构造函数是不对的,如,`datetime.datetime(2011, 1, 1, tzinfo=pytz.timezone('US/Eastern'))`。正确的做法是在 `pytz` 时区对象上使用 `localize` 方法本地化 datetime。
+
+:::
+
+在后台,所有 Timestamp 都存储为 UTC。含时区信息的 [`DatetimeIndex`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.html#pandas.DatetimeIndex "pandas.DatetimeIndex") 或 [`Timestamp`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html#pandas.Timestamp "pandas.Timestamp") 的值有其自己的本地化时区字段(日、小时、分钟等)。不过,对于不同时区的时间戳,如果其 UTC 值相同,将被视作相等的时间。
+
+```python
+In [432]: rng_eastern = rng_utc.tz_convert('US/Eastern')
+
+In [433]: rng_berlin = rng_utc.tz_convert('Europe/Berlin')
+
+In [434]: rng_eastern[2]
+Out[434]: Timestamp('2012-03-07 19:00:00-0500', tz='US/Eastern', freq='D')
+
+In [435]: rng_berlin[2]
+Out[435]: Timestamp('2012-03-08 01:00:00+0100', tz='Europe/Berlin', freq='D')
+
+In [436]: rng_eastern[2] == rng_berlin[2]
+Out[436]: True
+``` + +不同时区 [`Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series "pandas.Series") 之间的操作生成的是与 UTC 时间戳数据对齐的 UTC [`Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series "pandas.Series")。 + +```python +In [437]: ts_utc = pd.Series(range(3), pd.date_range('20130101', periods=3, tz='UTC')) + +In [438]: eastern = ts_utc.tz_convert('US/Eastern') + +In [439]: berlin = ts_utc.tz_convert('Europe/Berlin') + +In [440]: result = eastern + berlin + +In [441]: result +Out[441]: +2013-01-01 00:00:00+00:00 0 +2013-01-02 00:00:00+00:00 2 +2013-01-03 00:00:00+00:00 4 +Freq: D, dtype: int64 + +In [442]: result.index +Out[442]: +DatetimeIndex(['2013-01-01 00:00:00+00:00', '2013-01-02 00:00:00+00:00', + '2013-01-03 00:00:00+00:00'], + dtype='datetime64[ns, UTC]', freq='D') +``` + +用 `tz_localize(None)` 或 `tz_convert(None)` 去掉时区信息。`tz_localize(None)` 去掉带本地时间表示的时区信息。`tz_convert(None)`先把时间戳转为 UTC 时间,再去掉时区信息。 + +```python +In [443]: didx = pd.date_range(start='2014-08-01 09:00', freq='H', + .....: periods=3, tz='US/Eastern') + .....: + +In [444]: didx +Out[444]: +DatetimeIndex(['2014-08-01 09:00:00-04:00', '2014-08-01 10:00:00-04:00', + '2014-08-01 11:00:00-04:00'], + dtype='datetime64[ns, US/Eastern]', freq='H') + +In [445]: didx.tz_localize(None) +Out[445]: +DatetimeIndex(['2014-08-01 09:00:00', '2014-08-01 10:00:00', + '2014-08-01 11:00:00'], + dtype='datetime64[ns]', freq='H') + +In [446]: didx.tz_convert(None) +Out[446]: +DatetimeIndex(['2014-08-01 13:00:00', '2014-08-01 14:00:00', + '2014-08-01 15:00:00'], + dtype='datetime64[ns]', freq='H') + +# tz_convert(None) 等同于 tz_convert('UTC').tz_localize(None) +In [447]: didx.tz_convert('UTC').tz_localize(None) +Out[447]: +DatetimeIndex(['2014-08-01 13:00:00', '2014-08-01 14:00:00', + '2014-08-01 15:00:00'], + dtype='datetime64[ns]', freq='H') +``` + +### 本地化导致的混淆时间 + +`tz_localize` 不能决定时间戳的 
UTC 偏移量,因为本地时区的夏时制(DST)会导致有些时间在一天内出现两次(“时钟回拨”)。下列选项是有效的:
+
+* `raise`:默认触发 `pytz.AmbiguousTimeError`
+* `infer`:依据时间戳的单调性,尝试推断正确的偏移量
+* `NaT`:用 `NaT` 替换混淆时间
+* `bool`:`True` 代表夏时制(DST)时间,`False` 代表正常时间。对于时间序列,支持传递数组型的 `bool` 值。
+
+```python
+In [448]: rng_hourly = pd.DatetimeIndex(['11/06/2011 00:00', '11/06/2011 01:00',
+ .....: '11/06/2011 01:00', '11/06/2011 02:00'])
+ .....: 
+```
+
+直接本地化会因混淆时间('11/06/2011 01:00')而失败:
+
+```python
+In [2]: rng_hourly.tz_localize('US/Eastern')
+AmbiguousTimeError: Cannot infer dst time from Timestamp('2011-11-06 01:00:00'), try using the 'ambiguous' argument
+```
+
+用上面列出的关键字可以控制混淆时间的处理方式。
+
+```python
+In [449]: rng_hourly.tz_localize('US/Eastern', ambiguous='infer')
+Out[449]: 
+DatetimeIndex(['2011-11-06 00:00:00-04:00', '2011-11-06 01:00:00-04:00',
+ '2011-11-06 01:00:00-05:00', '2011-11-06 02:00:00-05:00'],
+ dtype='datetime64[ns, US/Eastern]', freq=None)
+
+In [450]: rng_hourly.tz_localize('US/Eastern', ambiguous='NaT')
+Out[450]: 
+DatetimeIndex(['2011-11-06 00:00:00-04:00', 'NaT', 'NaT',
+ '2011-11-06 02:00:00-05:00'],
+ dtype='datetime64[ns, US/Eastern]', freq=None)
+
+In [451]: rng_hourly.tz_localize('US/Eastern', ambiguous=[True, True, False, False])
+Out[451]: 
+DatetimeIndex(['2011-11-06 00:00:00-04:00', '2011-11-06 01:00:00-04:00',
+ '2011-11-06 01:00:00-05:00', '2011-11-06 02:00:00-05:00'],
+ dtype='datetime64[ns, US/Eastern]', freq=None)
+```
+
+### 本地化时不存在的时间
+
+夏时制转换会把本地时间前移一个小时,从而产生不存在的本地时间(“时钟春季前拨”)。本地化操作遇到时间序列里这种不存在的时间时会出问题,可以用 `nonexistent` 参数解决。下列都是有效的选项:
+
+* `raise`:默认触发 `pytz.NonExistentTimeError`
+* `NaT`:用 `NaT` 替换不存在的时间
+* `shift_forward`:把不存在的时间前移至最近的真实时间
+* `shift_backward`:把不存在的时间后移至最近的真实时间
+* `Timedelta` 对象:用 `timedelta` 移位不存在的时间
+
+```python
+In [452]: dti = pd.date_range(start='2015-03-29 02:30:00', periods=3, freq='H')
+
+# 2:30 是不存在的时间
+```
+
+对不存在的时间进行本地化,默认会触发错误。
+
+```python
+In [2]: dti.tz_localize('Europe/Warsaw')
+NonExistentTimeError: 2015-03-29 02:30:00
+```
+
+把不存在的时间转换为 `NaT` 或移位时间:
+ 
+```python +In [453]: dti +Out[453]: +DatetimeIndex(['2015-03-29 02:30:00', '2015-03-29 03:30:00', + '2015-03-29 04:30:00'], + dtype='datetime64[ns]', freq='H') + +In [454]: dti.tz_localize('Europe/Warsaw', nonexistent='shift_forward') +Out[454]: +DatetimeIndex(['2015-03-29 03:00:00+02:00', '2015-03-29 03:30:00+02:00', + '2015-03-29 04:30:00+02:00'], + dtype='datetime64[ns, Europe/Warsaw]', freq='H') + +In [455]: dti.tz_localize('Europe/Warsaw', nonexistent='shift_backward') +Out[455]: +DatetimeIndex(['2015-03-29 01:59:59.999999999+01:00', + '2015-03-29 03:30:00+02:00', + '2015-03-29 04:30:00+02:00'], + dtype='datetime64[ns, Europe/Warsaw]', freq='H') + +In [456]: dti.tz_localize('Europe/Warsaw', nonexistent=pd.Timedelta(1, unit='H')) +Out[456]: +DatetimeIndex(['2015-03-29 03:30:00+02:00', '2015-03-29 03:30:00+02:00', + '2015-03-29 04:30:00+02:00'], + dtype='datetime64[ns, Europe/Warsaw]', freq='H') + +In [457]: dti.tz_localize('Europe/Warsaw', nonexistent='NaT') +Out[457]: +DatetimeIndex(['NaT', '2015-03-29 03:30:00+02:00', + '2015-03-29 04:30:00+02:00'], + dtype='datetime64[ns, Europe/Warsaw]', freq='H') +``` + +### 时区序列操作 + +无时区 [`Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series "pandas.Series") 值的数据类型是 datetime64[ns]。 + +```python +In [458]: s_naive = pd.Series(pd.date_range('20130101', periods=3)) + +In [459]: s_naive +Out[459]: +0 2013-01-01 +1 2013-01-02 +2 2013-01-03 +dtype: datetime64[ns] +``` + +有时区 [`Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series "pandas.Series") 值的数据类型是 datetime64[ns, tz],`tz` 指的是时区。 + +```python +In [460]: s_aware = pd.Series(pd.date_range('20130101', periods=3, tz='US/Eastern')) + +In [461]: s_aware +Out[461]: +0 2013-01-01 00:00:00-05:00 +1 2013-01-02 00:00:00-05:00 +2 2013-01-03 00:00:00-05:00 +dtype: datetime64[ns, US/Eastern] +``` + +这两种 
[`Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series "pandas.Series") 的时区信息都可以用 `.dt` 访问器操控,参阅 [dt 访问器](https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#basics-dt-accessors)。 + +例如,本地化与把无时区时间戳转换为有时区时间戳。 + +```python +In [462]: s_naive.dt.tz_localize('UTC').dt.tz_convert('US/Eastern') +Out[462]: +0 2012-12-31 19:00:00-05:00 +1 2013-01-01 19:00:00-05:00 +2 2013-01-02 19:00:00-05:00 +dtype: datetime64[ns, US/Eastern] +``` + +时区信息还可以用 `astype` 操控。这种方法可以本地化并转换无时区时间戳或转换有时区时间戳。 + +```python +# 本地化,并把无时区转换为有时区 +In [463]: s_naive.astype('datetime64[ns, US/Eastern]') +Out[463]: +0 2012-12-31 19:00:00-05:00 +1 2013-01-01 19:00:00-05:00 +2 2013-01-02 19:00:00-05:00 +dtype: datetime64[ns, US/Eastern] + +# 把有时区变为无时区 +In [464]: s_aware.astype('datetime64[ns]') +Out[464]: +0 2013-01-01 05:00:00 +1 2013-01-02 05:00:00 +2 2013-01-03 05:00:00 +dtype: datetime64[ns] + +# 转换为新的时区 +In [465]: s_aware.astype('datetime64[ns, CET]') +Out[465]: +0 2013-01-01 06:00:00+01:00 +1 2013-01-02 06:00:00+01:00 +2 2013-01-03 06:00:00+01:00 +dtype: datetime64[ns, CET] +``` + +::: tip 注意 + +在 `Series` 上应用 [`Series.to_numpy()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.to_numpy.html#pandas.Series.to_numpy "pandas.Series.to_numpy"),返回数据的 NumPy 数组。虽然 NumPy 可以**输出**本地时区!但其实它当前并不支持时区,因此,有时区时间戳数据返回的是时间戳对象数组: + +```python +In [466]: s_naive.to_numpy() +Out[466]: +array(['2013-01-01T00:00:00.000000000', '2013-01-02T00:00:00.000000000', + '2013-01-03T00:00:00.000000000'], dtype='datetime64[ns]') + +In [467]: s_aware.to_numpy() +Out[467]: +array([Timestamp('2013-01-01 00:00:00-0500', tz='US/Eastern', freq='D'), + Timestamp('2013-01-02 00:00:00-0500', tz='US/Eastern', freq='D'), + Timestamp('2013-01-03 00:00:00-0500', tz='US/Eastern', freq='D')], + dtype=object) +``` + +通过转换时间戳数组,保留时区信息。例如,转换回 `Series` 时: + +```python +In [468]: pd.Series(s_aware.to_numpy()) +Out[468]: +0 2013-01-01 00:00:00-05:00 +1 2013-01-02 
00:00:00-05:00 +2 2013-01-03 00:00:00-05:00 +dtype: datetime64[ns, US/Eastern] +``` + +如果需要 NumPy `datetime64[ns]` 数组(带已转为 UTC 的值)而不是对象数组,可以指定 `dtype` 参数: + +```python +In [469]: s_aware.to_numpy(dtype='datetime64[ns]') +Out[469]: +array(['2013-01-01T05:00:00.000000000', '2013-01-02T05:00:00.000000000', + '2013-01-03T05:00:00.000000000'], dtype='datetime64[ns]') +``` + +::: diff --git a/Python/pandas/user_guide/visualization.md b/Python/pandas/user_guide/visualization.md new file mode 100644 index 00000000..f1a08ad1 --- /dev/null +++ b/Python/pandas/user_guide/visualization.md @@ -0,0 +1,1344 @@ +# Visualization + +We use the standard convention for referencing the matplotlib API: + +``` python +In [1]: import matplotlib.pyplot as plt + +In [2]: plt.close('all') +``` + +We provide the basics in pandas to easily create decent looking plots. +See the [ecosystem](https://pandas.pydata.org/pandas-docs/stable/ecosystem.html#ecosystem-visualization) section for visualization +libraries that go beyond the basics documented here. + +::: tip Note + +All calls to ``np.random`` are seeded with 123456. + +::: + +## Basic plotting: ``plot`` + +We will demonstrate the basics, see the [cookbook](cookbook.html#cookbook-plotting) for +some advanced strategies. + +The ``plot`` method on Series and DataFrame is just a simple wrapper around +[``plt.plot()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.plot.html#matplotlib.axes.Axes.plot): + +``` python +In [3]: ts = pd.Series(np.random.randn(1000), + ...: index=pd.date_range('1/1/2000', periods=1000)) + ...: + +In [4]: ts = ts.cumsum() + +In [5]: ts.plot() +Out[5]: +``` + +![series_plot_basic](https://static.pypandas.cn/public/static/images/series_plot_basic.png) + +If the index consists of dates, it calls [``gcf().autofmt_xdate()``](https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.autofmt_xdate) +to try to format the x-axis nicely as per above. 
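As a small self-contained sketch of the behavior above (the data and seed are purely illustrative, and the ``Agg`` backend is assumed so it runs without a display), ``Series.plot()`` returns a matplotlib ``Axes`` object that can be customized further with the regular matplotlib API:

``` python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no interactive display)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(123456)
ts = pd.Series(np.random.randn(100),
               index=pd.date_range("1/1/2000", periods=100))

ax = ts.cumsum().plot()   # returns a matplotlib Axes
ax.set_ylabel("cumulative sum")   # further customization via matplotlib
n_points = len(ax.lines[0].get_xdata())   # one line artist with 100 points
plt.close("all")
```

Because the index holds dates, pandas also triggers the ``autofmt_xdate()`` rotation of the tick labels shown in the figure above.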
+ +On DataFrame, [``plot()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html#pandas.DataFrame.plot) is a convenience to plot all of the columns with labels: + +``` python +In [6]: df = pd.DataFrame(np.random.randn(1000, 4), + ...: index=ts.index, columns=list('ABCD')) + ...: + +In [7]: df = df.cumsum() + +In [8]: plt.figure(); + +In [9]: df.plot(); +``` + +![frame_plot_basic](https://static.pypandas.cn/public/static/images/frame_plot_basic.png) + +You can plot one column versus another using the *x* and *y* keywords in +[``plot()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html#pandas.DataFrame.plot): + +``` python +In [10]: df3 = pd.DataFrame(np.random.randn(1000, 2), columns=['B', 'C']).cumsum() + +In [11]: df3['A'] = pd.Series(list(range(len(df)))) + +In [12]: df3.plot(x='A', y='B') +Out[12]: +``` + +![df_plot_xy](https://static.pypandas.cn/public/static/images/df_plot_xy.png) + +::: tip Note + +For more formatting and styling options, see +[formatting](#visualization-formatting) below. + +::: + +## Other plots + +Plotting methods allow for a handful of plot styles other than the +default line plot. 
These methods can be provided as the ``kind`` keyword argument to [``plot()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html#pandas.DataFrame.plot), and include:

- [‘bar’](#visualization-barplot) or [‘barh’](#visualization-barplot) for bar plots
- [‘hist’](#visualization-hist) for histograms
- [‘box’](#visualization-box) for boxplots
- [‘kde’](#visualization-kde) or [‘density’](#visualization-kde) for density plots
- [‘area’](#visualization-area-plot) for area plots
- [‘scatter’](#visualization-scatter) for scatter plots
- [‘hexbin’](#visualization-hexbin) for hexagonal bin plots
- [‘pie’](#visualization-pie) for pie plots

For example, a bar plot can be created the following way:

``` python
In [13]: plt.figure();

In [14]: df.iloc[5].plot(kind='bar');
```

![bar_plot_ex](https://static.pypandas.cn/public/static/images/bar_plot_ex.png)

You can also create these other plots using the methods ``DataFrame.plot.<kind>`` instead of providing the ``kind`` keyword argument. This makes it easier to discover plot methods and the specific arguments they use:

``` python
In [15]: df = pd.DataFrame()

In [16]: df.plot.<TAB>  # noqa: E225, E999
df.plot.area     df.plot.barh     df.plot.density  df.plot.hist     df.plot.line     df.plot.scatter
df.plot.bar      df.plot.box      df.plot.hexbin   df.plot.kde      df.plot.pie
```

In addition to these ``kind``s, there are the [DataFrame.hist()](#visualization-hist) and [DataFrame.boxplot()](#visualization-box) methods, which use a separate interface.

Finally, there are several [plotting functions](#visualization-tools) in ``pandas.plotting`` that take a [``Series``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series) or [``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) as an argument.
These +include: + +- [Scatter Matrix](#visualization-scatter-matrix) +- [Andrews Curves](#visualization-andrews-curves) +- [Parallel Coordinates](#visualization-parallel-coordinates) +- [Lag Plot](#visualization-lag) +- [Autocorrelation Plot](#visualization-autocorrelation) +- [Bootstrap Plot](#visualization-bootstrap) +- [RadViz](#visualization-radviz) + +Plots may also be adorned with [errorbars](#visualization-errorbars) +or [tables](#visualization-table). + +### Bar plots + +For labeled, non-time series data, you may wish to produce a bar plot: + +``` python +In [17]: plt.figure(); + +In [18]: df.iloc[5].plot.bar() +Out[18]: + +In [19]: plt.axhline(0, color='k'); +``` + +![bar_plot_ex](https://static.pypandas.cn/public/static/images/bar_plot_ex.png) + +Calling a DataFrame’s [``plot.bar()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.bar.html#pandas.DataFrame.plot.bar) method produces a multiple +bar plot: + +``` python +In [20]: df2 = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd']) + +In [21]: df2.plot.bar(); +``` + +![bar_plot_multi_ex](https://static.pypandas.cn/public/static/images/bar_plot_multi_ex.png) + +To produce a stacked bar plot, pass ``stacked=True``: + +``` python +In [22]: df2.plot.bar(stacked=True); +``` + +![bar_plot_stacked_ex](https://static.pypandas.cn/public/static/images/bar_plot_stacked_ex.png) + +To get horizontal bar plots, use the ``barh`` method: + +``` python +In [23]: df2.plot.barh(stacked=True); +``` + +![barh_plot_stacked_ex](https://static.pypandas.cn/public/static/images/barh_plot_stacked_ex.png) + +### Histograms + +Histograms can be drawn by using the [``DataFrame.plot.hist()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.hist.html#pandas.DataFrame.plot.hist) and [``Series.plot.hist()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.plot.hist.html#pandas.Series.plot.hist) methods. 
+ +``` python +In [24]: df4 = pd.DataFrame({'a': np.random.randn(1000) + 1, 'b': np.random.randn(1000), + ....: 'c': np.random.randn(1000) - 1}, columns=['a', 'b', 'c']) + ....: + +In [25]: plt.figure(); + +In [26]: df4.plot.hist(alpha=0.5) +Out[26]: +``` + +![hist_new](https://static.pypandas.cn/public/static/images/hist_new.png) + +A histogram can be stacked using ``stacked=True``. Bin size can be changed +using the ``bins`` keyword. + +``` python +In [27]: plt.figure(); + +In [28]: df4.plot.hist(stacked=True, bins=20) +Out[28]: +``` + +![hist_new_stacked](https://static.pypandas.cn/public/static/images/hist_new_stacked.png) + +You can pass other keywords supported by matplotlib ``hist``. For example, +horizontal and cumulative histograms can be drawn by +``orientation='horizontal'`` and ``cumulative=True``. + +``` python +In [29]: plt.figure(); + +In [30]: df4['a'].plot.hist(orientation='horizontal', cumulative=True) +Out[30]: +``` + +![hist_new_kwargs](https://static.pypandas.cn/public/static/images/hist_new_kwargs.png) + +See the [``hist``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.hist.html#matplotlib.axes.Axes.hist) method and the +[matplotlib hist documentation](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hist) for more. + +The existing interface ``DataFrame.hist`` to plot histogram still can be used. + +``` python +In [31]: plt.figure(); + +In [32]: df['A'].diff().hist() +Out[32]: +``` + +![hist_plot_ex](https://static.pypandas.cn/public/static/images/hist_plot_ex.png) + +[``DataFrame.hist()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.hist.html#pandas.DataFrame.hist) plots the histograms of the columns on multiple +subplots: + +``` python +In [33]: plt.figure() +Out[33]:
In [34]: df.diff().hist(color='k', alpha=0.5, bins=50)
Out[34]:
array([[<matplotlib.axes._subplots.AxesSubplot object at ...>,
        <matplotlib.axes._subplots.AxesSubplot object at ...>],
       [<matplotlib.axes._subplots.AxesSubplot object at ...>,
        <matplotlib.axes._subplots.AxesSubplot object at ...>]],
      dtype=object)
```

![frame_hist_ex](https://static.pypandas.cn/public/static/images/frame_hist_ex.png)

The ``by`` keyword can be specified to plot grouped histograms:

``` python
In [35]: data = pd.Series(np.random.randn(1000))

In [36]: data.hist(by=np.random.randint(0, 4, 1000), figsize=(6, 4))
Out[36]:
array([[<matplotlib.axes._subplots.AxesSubplot object at ...>,
        <matplotlib.axes._subplots.AxesSubplot object at ...>],
       [<matplotlib.axes._subplots.AxesSubplot object at ...>,
        <matplotlib.axes._subplots.AxesSubplot object at ...>]],
      dtype=object)
```

![grouped_hist](https://static.pypandas.cn/public/static/images/grouped_hist.png)

### Box plots

Boxplots can be drawn by calling [``Series.plot.box()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.plot.box.html#pandas.Series.plot.box) and [``DataFrame.plot.box()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.box.html#pandas.DataFrame.plot.box), or [``DataFrame.boxplot()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.boxplot.html#pandas.DataFrame.boxplot), to visualize the distribution of values within each column.

For instance, here is a boxplot representing five trials of 10 observations of a uniform random variable on [0,1).

``` python
In [37]: df = pd.DataFrame(np.random.rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])

In [38]: df.plot.box()
Out[38]:
```

![box_plot_new](https://static.pypandas.cn/public/static/images/box_plot_new.png)

A boxplot can be colorized by passing the ``color`` keyword. You can pass a ``dict`` whose keys are ``boxes``, ``whiskers``, ``medians`` and ``caps``. If some keys are missing from the ``dict``, default colors are used for the corresponding artists. Boxplot also has a ``sym`` keyword to specify the flier (outlier) style.

When you pass any other type of argument via the ``color`` keyword, it is passed directly to matplotlib and used to colorize all of the ``boxes``, ``whiskers``, ``medians`` and ``caps``.

The colors are applied to every box to be drawn.
If you want more complicated colorization, you can get each drawn artist by passing [return_type](#visualization-box-return).

``` python
In [39]: color = {'boxes': 'DarkGreen', 'whiskers': 'DarkOrange',
   ....:          'medians': 'DarkBlue', 'caps': 'Gray'}
   ....:

In [40]: df.plot.box(color=color, sym='r+')
Out[40]:
```

![box_new_colorize](https://static.pypandas.cn/public/static/images/box_new_colorize.png)

You can also pass other keywords supported by matplotlib ``boxplot``. For example, horizontal and custom-positioned boxplots can be drawn with the ``vert=False`` and ``positions`` keywords.

``` python
In [41]: df.plot.box(vert=False, positions=[1, 4, 5, 6, 8])
Out[41]:
```

![box_new_kwargs](https://static.pypandas.cn/public/static/images/box_new_kwargs.png)

See the [``boxplot``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.boxplot.html#matplotlib.axes.Axes.boxplot) method and the [matplotlib boxplot documentation](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.boxplot) for more.

The existing ``DataFrame.boxplot`` interface can still be used to plot boxplots.

``` python
In [42]: df = pd.DataFrame(np.random.rand(10, 5))

In [43]: plt.figure();

In [44]: bp = df.boxplot()
```

![box_plot_ex](https://static.pypandas.cn/public/static/images/box_plot_ex.png)

You can create a stratified boxplot using the ``by`` keyword argument to create groupings.
For instance,

``` python
In [45]: df = pd.DataFrame(np.random.rand(10, 2), columns=['Col1', 'Col2'])

In [46]: df['X'] = pd.Series(['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'])

In [47]: plt.figure();

In [48]: bp = df.boxplot(by='X')
```

![box_plot_ex2](https://static.pypandas.cn/public/static/images/box_plot_ex2.png)

You can also pass a subset of columns to plot, as well as group by multiple columns:

``` python
In [49]: df = pd.DataFrame(np.random.rand(10, 3), columns=['Col1', 'Col2', 'Col3'])

In [50]: df['X'] = pd.Series(['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'])

In [51]: df['Y'] = pd.Series(['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'])

In [52]: plt.figure();

In [53]: bp = df.boxplot(column=['Col1', 'Col2'], by=['X', 'Y'])
```

![box_plot_ex3](https://static.pypandas.cn/public/static/images/box_plot_ex3.png)

::: danger Warning

The default value of ``return_type`` changed from ``'dict'`` to ``'axes'`` in version 0.19.0.

:::

In ``boxplot``, the return type can be controlled by the ``return_type`` keyword. The valid choices are ``{"axes", "dict", "both", None}``. Faceting, created by ``DataFrame.boxplot`` with the ``by`` keyword, will affect the output type as well:

return_type= | Faceted | Output type
---|---|---
None | No | axes
None | Yes | 2-D ndarray of axes
'axes' | No | axes
'axes' | Yes | Series of axes
'dict' | No | dict of artists
'dict' | Yes | Series of dicts of artists
'both' | No | namedtuple
'both' | Yes | Series of namedtuples

``Groupby.boxplot`` always returns a ``Series`` of ``return_type``.
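As a sketch of the non-faceted rows in the table above (the frame and the restyling step here are illustrative, not from the pandas docs): with ``return_type='both'``, the result is a namedtuple whose ``ax`` field holds the axes and whose ``lines`` field holds the dict of drawn artists, so individual artists can be restyled after plotting:

``` python
import matplotlib
matplotlib.use('Agg')  # headless backend so the snippet runs anywhere
import numpy as np
import pandas as pd

np.random.seed(123456)
df = pd.DataFrame(np.random.rand(10, 5), columns=list('ABCDE'))

# Non-faceted boxplot with return_type='both': a namedtuple of
# (ax, lines), where lines maps artist names to matplotlib artists.
result = df.boxplot(return_type='both')
result.lines['medians'][0].set_color('red')  # restyle a single artist
```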
+ +``` python +In [54]: np.random.seed(1234) + +In [55]: df_box = pd.DataFrame(np.random.randn(50, 2)) + +In [56]: df_box['g'] = np.random.choice(['A', 'B'], size=50) + +In [57]: df_box.loc[df_box['g'] == 'B', 1] += 3 + +In [58]: bp = df_box.boxplot(by='g') +``` + +![boxplot_groupby](https://static.pypandas.cn/public/static/images/boxplot_groupby.png) + +The subplots above are split by the numeric columns first, then the value of +the ``g`` column. Below the subplots are first split by the value of ``g``, +then by the numeric columns. + +``` python +In [59]: bp = df_box.groupby('g').boxplot() +``` + +![groupby_boxplot_vis](https://static.pypandas.cn/public/static/images/groupby_boxplot_vis.png) + +### Area plot + +You can create area plots with [``Series.plot.area()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.plot.area.html#pandas.Series.plot.area) and [``DataFrame.plot.area()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.area.html#pandas.DataFrame.plot.area). +Area plots are stacked by default. To produce stacked area plot, each column must be either all positive or all negative values. + +When input data contains *NaN*, it will be automatically filled by 0. If you want to drop or fill by different values, use ``dataframe.dropna()`` or ``dataframe.fillna()`` before calling *plot*. + +``` python +In [60]: df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd']) + +In [61]: df.plot.area(); +``` + +![area_plot_stacked](https://static.pypandas.cn/public/static/images/area_plot_stacked.png) + +To produce an unstacked plot, pass ``stacked=False``. 
The alpha value is set to 0.5 unless otherwise specified:

``` python
In [62]: df.plot.area(stacked=False);
```

![area_plot_unstacked](https://static.pypandas.cn/public/static/images/area_plot_unstacked.png)

### Scatter plot

Scatter plots can be drawn using the [``DataFrame.plot.scatter()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.scatter.html#pandas.DataFrame.plot.scatter) method. A scatter plot requires numeric columns for the x and y axes. These can be specified by the ``x`` and ``y`` keywords.

``` python
In [63]: df = pd.DataFrame(np.random.rand(50, 4), columns=['a', 'b', 'c', 'd'])

In [64]: df.plot.scatter(x='a', y='b');
```

![scatter_plot](https://static.pypandas.cn/public/static/images/scatter_plot.png)

To plot multiple column groups on a single axes, repeat the ``plot`` method, specifying the target ``ax``. It is recommended to specify the ``color`` and ``label`` keywords to distinguish each group.

``` python
In [65]: ax = df.plot.scatter(x='a', y='b', color='DarkBlue', label='Group 1');

In [66]: df.plot.scatter(x='c', y='d', color='DarkGreen', label='Group 2', ax=ax);
```

![scatter_plot_repeated](https://static.pypandas.cn/public/static/images/scatter_plot_repeated.png)

The keyword ``c`` may be given as the name of a column to provide colors for each point:

``` python
In [67]: df.plot.scatter(x='a', y='b', c='c', s=50);
```

![scatter_plot_colored](https://static.pypandas.cn/public/static/images/scatter_plot_colored.png)

You can pass other keywords supported by matplotlib [``scatter``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.scatter.html#matplotlib.axes.Axes.scatter). The example below shows a bubble chart using a column of the ``DataFrame`` as the bubble size.
+ +``` python +In [68]: df.plot.scatter(x='a', y='b', s=df['c'] * 200); +``` + +![scatter_plot_bubble](https://static.pypandas.cn/public/static/images/scatter_plot_bubble.png) + +See the [``scatter``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.scatter.html#matplotlib.axes.Axes.scatter) method and the +[matplotlib scatter documentation](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.scatter) for more. + +### Hexagonal bin plot + +You can create hexagonal bin plots with [``DataFrame.plot.hexbin()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.hexbin.html#pandas.DataFrame.plot.hexbin). +Hexbin plots can be a useful alternative to scatter plots if your data are +too dense to plot each point individually. + +``` python +In [69]: df = pd.DataFrame(np.random.randn(1000, 2), columns=['a', 'b']) + +In [70]: df['b'] = df['b'] + np.arange(1000) + +In [71]: df.plot.hexbin(x='a', y='b', gridsize=25) +Out[71]: +``` + +![hexbin_plot](https://static.pypandas.cn/public/static/images/hexbin_plot.png) + +A useful keyword argument is ``gridsize``; it controls the number of hexagons +in the x-direction, and defaults to 100. A larger ``gridsize`` means more, smaller +bins. + +By default, a histogram of the counts around each ``(x, y)`` point is computed. +You can specify alternative aggregations by passing values to the ``C`` and +``reduce_C_function`` arguments. ``C`` specifies the value at each ``(x, y)`` point +and ``reduce_C_function`` is a function of one argument that reduces all the +values in a bin to a single number (e.g. ``mean``, ``max``, ``sum``, ``std``). In this +example the positions are given by columns ``a`` and ``b``, while the value is +given by column ``z``. The bins are aggregated with NumPy’s ``max`` function. 
+ +``` python +In [72]: df = pd.DataFrame(np.random.randn(1000, 2), columns=['a', 'b']) + +In [73]: df['b'] = df['b'] = df['b'] + np.arange(1000) + +In [74]: df['z'] = np.random.uniform(0, 3, 1000) + +In [75]: df.plot.hexbin(x='a', y='b', C='z', reduce_C_function=np.max, gridsize=25) +Out[75]: +``` + +![hexbin_plot_agg](https://static.pypandas.cn/public/static/images/hexbin_plot_agg.png) + +See the [``hexbin``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.hexbin.html#matplotlib.axes.Axes.hexbin) method and the +[matplotlib hexbin documentation](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hexbin) for more. + +### Pie plot + +You can create a pie plot with [``DataFrame.plot.pie()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.pie.html#pandas.DataFrame.plot.pie) or [``Series.plot.pie()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.plot.pie.html#pandas.Series.plot.pie). +If your data includes any ``NaN``, they will be automatically filled with 0. +A ``ValueError`` will be raised if there are any negative values in your data. + +``` python +In [76]: series = pd.Series(3 * np.random.rand(4), + ....: index=['a', 'b', 'c', 'd'], name='series') + ....: + +In [77]: series.plot.pie(figsize=(6, 6)) +Out[77]: +``` + +![series_pie_plot](https://static.pypandas.cn/public/static/images/series_pie_plot.png) + +For pie plots it’s best to use square figures, i.e. a figure aspect ratio 1. +You can create the figure with equal width and height, or force the aspect ratio +to be equal after plotting by calling ``ax.set_aspect('equal')`` on the returned +``axes`` object. + +Note that pie plot with [``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) requires that you either specify a +target column by the ``y`` argument or ``subplots=True``. When ``y`` is +specified, pie plot of selected column will be drawn. 
If ``subplots=True`` is specified, pie plots for each column are drawn as subplots. A legend will be drawn on each pie plot by default; specify ``legend=False`` to hide it.

``` python
In [78]: df = pd.DataFrame(3 * np.random.rand(4, 2),
   ....:                   index=['a', 'b', 'c', 'd'], columns=['x', 'y'])
   ....:

In [79]: df.plot.pie(subplots=True, figsize=(8, 4))
Out[79]:
array([<matplotlib.axes._subplots.AxesSubplot object at ...>,
       <matplotlib.axes._subplots.AxesSubplot object at ...>],
      dtype=object)
```

![df_pie_plot](https://static.pypandas.cn/public/static/images/df_pie_plot.png)

You can use the ``labels`` and ``colors`` keywords to specify the labels and colors of each wedge.

::: danger Warning

Most pandas plots use the ``label`` and ``color`` arguments (note the lack of “s” on those). To be consistent with [``matplotlib.pyplot.pie()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.pie.html#matplotlib.pyplot.pie) you must use ``labels`` and ``colors``.

:::

If you want to hide wedge labels, specify ``labels=None``. If ``fontsize`` is specified, the value will be applied to the wedge labels. Other keywords supported by [``matplotlib.pyplot.pie()``](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.pie.html#matplotlib.pyplot.pie) can also be used.

``` python
In [80]: series.plot.pie(labels=['AA', 'BB', 'CC', 'DD'], colors=['r', 'g', 'b', 'c'],
   ....:                 autopct='%.2f', fontsize=20, figsize=(6, 6))
   ....:
Out[80]:
```

![series_pie_plot_options](https://static.pypandas.cn/public/static/images/series_pie_plot_options.png)

If you pass values whose sum total is less than 1.0, matplotlib draws a semicircle.

``` python
In [81]: series = pd.Series([0.1] * 4, index=['a', 'b', 'c', 'd'], name='series2')

In [82]: series.plot.pie(figsize=(6, 6))
Out[82]:
```

![series_pie_plot_semi](https://static.pypandas.cn/public/static/images/series_pie_plot_semi.png)

See the [matplotlib pie documentation](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.pie) for more.
+ +## Plotting with missing data + +Pandas tries to be pragmatic about plotting ``DataFrames`` or ``Series`` +that contain missing data. Missing values are dropped, left out, or filled +depending on the plot type. + +Plot Type | NaN Handling +---|--- +Line | Leave gaps at NaNs +Line (stacked) | Fill 0’s +Bar | Fill 0’s +Scatter | Drop NaNs +Histogram | Drop NaNs (column-wise) +Box | Drop NaNs (column-wise) +Area | Fill 0’s +KDE | Drop NaNs (column-wise) +Hexbin | Drop NaNs +Pie | Fill 0’s + +If any of these defaults are not what you want, or if you want to be +explicit about how missing values are handled, consider using +[``fillna()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html#pandas.DataFrame.fillna) or [``dropna()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html#pandas.DataFrame.dropna) +before plotting. + +## Plotting Tools + +These functions can be imported from ``pandas.plotting`` +and take a [``Series``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series) or [``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) as an argument. 
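All of the tools covered below are plain functions in ``pandas.plotting``; as a quick orientation, they can be imported together:

``` python
# Each helper takes a Series or DataFrame as its first argument
# and draws onto a matplotlib figure.
from pandas.plotting import (
    scatter_matrix,
    andrews_curves,
    parallel_coordinates,
    lag_plot,
    autocorrelation_plot,
    bootstrap_plot,
    radviz,
)
```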
+ +### Scatter matrix plot + +You can create a scatter plot matrix using the +``scatter_matrix`` method in ``pandas.plotting``: + +``` python +In [83]: from pandas.plotting import scatter_matrix + +In [84]: df = pd.DataFrame(np.random.randn(1000, 4), columns=['a', 'b', 'c', 'd']) + +In [85]: scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal='kde') +Out[85]: +array([[, + , + , + ], + [, + , + , + ], + [, + , + , + ], + [, + , + , + ]], + dtype=object) +``` + +![scatter_matrix_kde](https://static.pypandas.cn/public/static/images/scatter_matrix_kde.png) + +### Density plot + +You can create density plots using the [``Series.plot.kde()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.plot.kde.html#pandas.Series.plot.kde) and [``DataFrame.plot.kde()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.kde.html#pandas.DataFrame.plot.kde) methods. + +``` python +In [86]: ser = pd.Series(np.random.randn(1000)) + +In [87]: ser.plot.kde() +Out[87]: +``` + +![kde_plot](https://static.pypandas.cn/public/static/images/kde_plot.png) + +### Andrews curves + +Andrews curves allow one to plot multivariate data as a large number +of curves that are created using the attributes of samples as coefficients +for Fourier series, see the [Wikipedia entry](https://en.wikipedia.org/wiki/Andrews_plot) +for more information. By coloring these curves differently for each class +it is possible to visualize data clustering. Curves belonging to samples +of the same class will usually be closer together and form larger structures. + +**Note**: The “Iris” dataset is available [here](https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/iris.csv). + +``` python +In [88]: from pandas.plotting import andrews_curves + +In [89]: data = pd.read_csv('data/iris.data') + +In [90]: plt.figure() +Out[90]:
+ +In [91]: andrews_curves(data, 'Name') +Out[91]: +``` + +![andrews_curves](https://static.pypandas.cn/public/static/images/andrews_curves.png) + +### Parallel coordinates + +Parallel coordinates is a plotting technique for plotting multivariate data, +see the [Wikipedia entry](https://en.wikipedia.org/wiki/Parallel_coordinates) +for an introduction. +Parallel coordinates allows one to see clusters in data and to estimate other statistics visually. +Using parallel coordinates points are represented as connected line segments. +Each vertical line represents one attribute. One set of connected line segments +represents one data point. Points that tend to cluster will appear closer together. + +``` python +In [92]: from pandas.plotting import parallel_coordinates + +In [93]: data = pd.read_csv('data/iris.data') + +In [94]: plt.figure() +Out[94]:
+ +In [95]: parallel_coordinates(data, 'Name') +Out[95]: +``` + +![parallel_coordinates](https://static.pypandas.cn/public/static/images/parallel_coordinates.png) + +### Lag plot + +Lag plots are used to check if a data set or time series is random. Random +data should not exhibit any structure in the lag plot. Non-random structure +implies that the underlying data are not random. The ``lag`` argument may +be passed, and when ``lag=1`` the plot is essentially ``data[:-1]`` vs. +``data[1:]``. + +``` python +In [96]: from pandas.plotting import lag_plot + +In [97]: plt.figure() +Out[97]:
+ +In [98]: spacing = np.linspace(-99 * np.pi, 99 * np.pi, num=1000) + +In [99]: data = pd.Series(0.1 * np.random.rand(1000) + 0.9 * np.sin(spacing)) + +In [100]: lag_plot(data) +Out[100]: +``` + +![lag_plot](https://static.pypandas.cn/public/static/images/lag_plot.png) + +### Autocorrelation plot + +Autocorrelation plots are often used for checking randomness in time series. +This is done by computing autocorrelations for data values at varying time lags. +If time series is random, such autocorrelations should be near zero for any and +all time-lag separations. If time series is non-random then one or more of the +autocorrelations will be significantly non-zero. The horizontal lines displayed +in the plot correspond to 95% and 99% confidence bands. The dashed line is 99% +confidence band. See the +[Wikipedia entry](https://en.wikipedia.org/wiki/Correlogram) for more about +autocorrelation plots. + +``` python +In [101]: from pandas.plotting import autocorrelation_plot + +In [102]: plt.figure() +Out[102]:
In [103]: spacing = np.linspace(-9 * np.pi, 9 * np.pi, num=1000)

In [104]: data = pd.Series(0.7 * np.random.rand(1000) + 0.3 * np.sin(spacing))

In [105]: autocorrelation_plot(data)
Out[105]:
```

![autocorrelation_plot](https://static.pypandas.cn/public/static/images/autocorrelation_plot.png)

### Bootstrap plot

Bootstrap plots are used to visually assess the uncertainty of a statistic, such as the mean, median or midrange. A random subset of a specified size is selected from a data set, the statistic in question is computed for this subset, and the process is repeated a specified number of times. The resulting plots and histograms are what constitute the bootstrap plot.

``` python
In [106]: from pandas.plotting import bootstrap_plot

In [107]: data = pd.Series(np.random.rand(1000))

In [108]: bootstrap_plot(data, size=50, samples=500, color='grey')
Out[108]:
```

![bootstrap_plot](https://static.pypandas.cn/public/static/images/bootstrap_plot.png)

### RadViz

RadViz is a way of visualizing multivariate data. It is based on a simple spring tension minimization algorithm. Basically, you set up a bunch of points in a plane. In our case they are equally spaced on a unit circle. Each point represents a single attribute. You then pretend that each sample in the data set is attached to each of these points by a spring, the stiffness of which is proportional to the numerical value of that attribute (the values are normalized to the unit interval). The point in the plane where the sample settles (where the forces acting on it are at an equilibrium) is where a dot representing that sample will be drawn. Depending on which class the sample belongs to, it will be colored differently. See the R package [Radviz](https://cran.r-project.org/package=Radviz/) for more information.

**Note**: The “Iris” dataset is available [here](https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/iris.csv).

``` python
In [109]: from pandas.plotting import radviz

In [110]: data = pd.read_csv('data/iris.data')

In [111]: plt.figure()
Out[111]:
In [112]: radviz(data, 'Name')
Out[112]:
```

![radviz](https://static.pypandas.cn/public/static/images/radviz.png)

## Plot Formatting

### Setting the plot style

From version 1.5 and up, matplotlib offers a range of pre-configured plotting styles. Setting the style can be used to easily give plots the general look that you want. Setting the style is as easy as calling ``matplotlib.style.use(my_plot_style)`` before creating your plot. For example, you could write ``matplotlib.style.use('ggplot')`` for ggplot-style plots.

You can see the various available style names at ``matplotlib.style.available``, and it is very easy to try them out.

### General plot style arguments

Most plotting methods have a set of keyword arguments that control the layout and formatting of the returned plot:

``` python
In [113]: plt.figure();

In [114]: ts.plot(style='k--', label='Series');
```

![series_plot_basic2](https://static.pypandas.cn/public/static/images/series_plot_basic2.png)

For each kind of plot (e.g. *line*, *bar*, *scatter*), any additional keyword arguments are passed along to the corresponding matplotlib function ([``ax.plot()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.plot.html#matplotlib.axes.Axes.plot), [``ax.bar()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.bar.html#matplotlib.axes.Axes.bar), [``ax.scatter()``](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.scatter.html#matplotlib.axes.Axes.scatter)). These can be used to control additional styling, beyond what pandas provides.

### Controlling the legend

You may set the ``legend`` argument to ``False`` to hide the legend, which is shown by default.
+ +``` python +In [115]: df = pd.DataFrame(np.random.randn(1000, 4), + .....: index=ts.index, columns=list('ABCD')) + .....: + +In [116]: df = df.cumsum() + +In [117]: df.plot(legend=False) +Out[117]: +``` + +![frame_plot_basic_noleg](https://static.pypandas.cn/public/static/images/frame_plot_basic_noleg.png) + +### Scales + +You may pass ``logy`` to get a log-scale Y axis. + +``` python +In [118]: ts = pd.Series(np.random.randn(1000), + .....: index=pd.date_range('1/1/2000', periods=1000)) + .....: + +In [119]: ts = np.exp(ts.cumsum()) + +In [120]: ts.plot(logy=True) +Out[120]: +``` + +![series_plot_logy](https://static.pypandas.cn/public/static/images/series_plot_logy.png) + +See also the ``logx`` and ``loglog`` keyword arguments. + +### Plotting on a secondary y-axis + +To plot data on a secondary y-axis, use the ``secondary_y`` keyword: + +``` python +In [121]: df.A.plot() +Out[121]: + +In [122]: df.B.plot(secondary_y=True, style='g') +Out[122]: +``` + +![series_plot_secondary_y](https://static.pypandas.cn/public/static/images/series_plot_secondary_y.png) + +To plot some columns in a ``DataFrame``, give the column names to the ``secondary_y`` +keyword: + +``` python +In [123]: plt.figure() +Out[123]:
In [124]: ax = df.plot(secondary_y=['A', 'B'])

In [125]: ax.set_ylabel('CD scale')
Out[125]: Text(0, 0.5, 'CD scale')

In [126]: ax.right_ax.set_ylabel('AB scale')
Out[126]: Text(0, 0.5, 'AB scale')
```

![frame_plot_secondary_y](https://static.pypandas.cn/public/static/images/frame_plot_secondary_y.png)

Note that the columns plotted on the secondary y-axis are automatically marked with “(right)” in the legend. To turn off the automatic marking, use the ``mark_right=False`` keyword:

``` python
In [127]: plt.figure()
Out[127]:
+

In [128]: df.plot(secondary_y=['A', 'B'], mark_right=False)
Out[128]:
```

![frame_plot_secondary_y_no_right](https://static.pypandas.cn/public/static/images/frame_plot_secondary_y_no_right.png)

### Suppressing tick resolution adjustment

pandas includes automatic tick resolution adjustment for regular frequency
time-series data. For limited cases where pandas cannot infer the frequency
information (e.g., in an externally created ``twinx``), you can choose to
suppress this behavior for alignment purposes.

Here is the default behavior; notice how the x-axis tick labeling is performed:

``` python
In [129]: plt.figure()
Out[129]:
+ +In [130]: df.A.plot() +Out[130]: +``` + +![ser_plot_suppress](https://static.pypandas.cn/public/static/images/ser_plot_suppress.png) + +Using the ``x_compat`` parameter, you can suppress this behavior: + +``` python +In [131]: plt.figure() +Out[131]:
+ +In [132]: df.A.plot(x_compat=True) +Out[132]: +``` + +![ser_plot_suppress_parm](https://static.pypandas.cn/public/static/images/ser_plot_suppress_parm.png) + +If you have more than one plot that needs to be suppressed, the ``use`` method +in ``pandas.plotting.plot_params`` can be used in a *with statement*: + +``` python +In [133]: plt.figure() +Out[133]:
+

In [134]: with pd.plotting.plot_params.use('x_compat', True):
   .....:     df.A.plot(color='r')
   .....:     df.B.plot(color='g')
   .....:     df.C.plot(color='b')
   .....: 
```

![ser_plot_suppress_context](https://static.pypandas.cn/public/static/images/ser_plot_suppress_context.png)

### Automatic date tick adjustment

*New in version 0.20.0.*

Now that ``TimedeltaIndex`` uses the native matplotlib
tick locator methods, it is useful to call the automatic
date tick adjustment from matplotlib for figures whose tick labels overlap.

See the ``autofmt_xdate`` method and the
[matplotlib documentation](http://matplotlib.org/users/recipes.html#fixing-common-date-annoyances) for more.

### Subplots

Each ``Series`` in a ``DataFrame`` can be plotted on a different axis
with the ``subplots`` keyword:

``` python
In [135]: df.plot(subplots=True, figsize=(6, 6));
```

![frame_plot_subplots](https://static.pypandas.cn/public/static/images/frame_plot_subplots.png)

### Using layout and targeting multiple axes

The layout of subplots can be specified by the ``layout`` keyword. It accepts a
``(rows, columns)`` tuple. The ``layout`` keyword can also be used in
``hist`` and ``boxplot``. If the input is invalid, a ``ValueError`` will be raised.

The number of axes in the ``rows x columns`` grid specified by ``layout`` must be
at least the number of required subplots. If the layout can contain more axes than required,
the unused axes are not drawn. Similar to a NumPy array’s ``reshape`` method, you
can use ``-1`` for one dimension to automatically calculate the number of rows
or columns needed, given the other.
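The ``-1`` inference described above can be sketched as a ceiling division. The helper below is hypothetical (it is not pandas' actual implementation), but it mirrors the documented rule: a ``-1`` dimension is computed from the other so the grid holds all required subplots.

``` python
import math

def infer_layout(nplots, rows=-1, cols=-1):
    # Hypothetical helper mirroring the rule above: a -1 dimension is
    # computed from the other so that rows * cols can hold all subplots.
    if rows == -1 and cols == -1:
        raise ValueError("at least one of rows/cols must be given")
    if rows == -1:
        rows = math.ceil(nplots / cols)
    elif cols == -1:
        cols = math.ceil(nplots / rows)
    if rows * cols < nplots:
        raise ValueError("layout of {}x{} is too small for {} subplots"
                         .format(rows, cols, nplots))
    return rows, cols

print(infer_layout(6, rows=2))   # 6 subplots in 2 rows -> (2, 3)
```

An invalid layout, such as ``infer_layout(10, rows=2, cols=2)``, raises ``ValueError``, matching the behavior the text describes for the ``layout`` keyword.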
+

``` python
In [136]: df.plot(subplots=True, layout=(2, 3), figsize=(6, 6), sharex=False);
```

![frame_plot_subplots_layout](https://static.pypandas.cn/public/static/images/frame_plot_subplots_layout.png)

The above example is identical to using:

``` python
In [137]: df.plot(subplots=True, layout=(2, -1), figsize=(6, 6), sharex=False);
```

The required number of columns (3) is inferred from the number of series to plot
and the given number of rows (2).

You can pass multiple axes created beforehand as a list-like via the ``ax`` keyword.
This allows more complicated layouts.
The number of axes passed must match the number of subplots being drawn.

When multiple axes are passed via the ``ax`` keyword, the ``layout``, ``sharex`` and ``sharey`` keywords
do not affect the output. You should explicitly pass ``sharex=False`` and ``sharey=False``,
otherwise you will see a warning.

``` python
In [138]: fig, axes = plt.subplots(4, 4, figsize=(6, 6))

In [139]: plt.subplots_adjust(wspace=0.5, hspace=0.5)

In [140]: target1 = [axes[0][0], axes[1][1], axes[2][2], axes[3][3]]

In [141]: target2 = [axes[3][0], axes[2][1], axes[1][2], axes[0][3]]

In [142]: df.plot(subplots=True, ax=target1, legend=False, sharex=False, sharey=False);

In [143]: (-df).plot(subplots=True, ax=target2, legend=False,
   .....:            sharex=False, sharey=False);
   .....: 
```

![frame_plot_subplots_multi_ax](https://static.pypandas.cn/public/static/images/frame_plot_subplots_multi_ax.png)

Another option is passing an ``ax`` argument to [``Series.plot()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.plot.html#pandas.Series.plot) to plot on a particular axis:

``` python
In [144]: fig, axes = plt.subplots(nrows=2, ncols=2)

In [145]: df['A'].plot(ax=axes[0, 0]);

In [146]: axes[0, 0].set_title('A');

In [147]: df['B'].plot(ax=axes[0, 1]);

In [148]: axes[0, 1].set_title('B');

In [149]: df['C'].plot(ax=axes[1, 0]);

In [150]: axes[1, 
0].set_title('C');

In [151]: df['D'].plot(ax=axes[1, 1]);

In [152]: axes[1, 1].set_title('D');
```

![series_plot_multi](https://static.pypandas.cn/public/static/images/series_plot_multi.png)

### Plotting with error bars

Plotting with error bars is supported in [``DataFrame.plot()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html#pandas.DataFrame.plot) and [``Series.plot()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.plot.html#pandas.Series.plot).

Horizontal and vertical error bars can be supplied to the ``xerr`` and ``yerr`` keyword arguments to [``plot()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html#pandas.DataFrame.plot). The error values can be specified using a variety of formats:

- As a [``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) or ``dict`` of errors with column names matching the ``columns`` attribute of the plotting [``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) or matching the ``name`` attribute of the [``Series``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series).
- As a ``str`` indicating which of the columns of the plotting [``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) contain the error values.
- As raw values (``list``, ``tuple``, or ``np.ndarray``). Must be the same length as the plotting [``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame)/[``Series``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series).

Asymmetrical error bars are also supported; however, raw error values must be provided in this case.
For an ``M``-length [``Series``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series), an ``Mx2`` array should be provided indicating lower and upper (or left and right) errors. For an ``MxN`` [``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame), asymmetrical errors should be in an ``Mx2xN`` array.

Here is an example of one way to easily plot group means with standard deviations from the raw data.

``` python
# Generate the data
In [153]: ix3 = pd.MultiIndex.from_arrays([
   .....:     ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
   .....:     ['foo', 'foo', 'bar', 'bar', 'foo', 'foo', 'bar', 'bar']],
   .....:     names=['letter', 'word'])
   .....: 

In [154]: df3 = pd.DataFrame({'data1': [3, 2, 4, 3, 2, 4, 3, 2],
   .....:                     'data2': [6, 5, 7, 5, 4, 5, 6, 5]}, index=ix3)
   .....: 

# Group by index labels and take the means and standard deviations
# for each group
In [155]: gp3 = df3.groupby(level=('letter', 'word'))

In [156]: means = gp3.mean()

In [157]: errors = gp3.std()

In [158]: means
Out[158]: 
             data1  data2
letter word              
a      bar     3.5    6.0
       foo     2.5    5.5
b      bar     2.5    5.5
       foo     3.0    4.5

In [159]: errors
Out[159]: 
                data1     data2
letter word                    
a      bar   0.707107  1.414214
       foo   0.707107  0.707107
b      bar   0.707107  0.707107
       foo   1.414214  0.707107

# Plot
In [160]: fig, ax = plt.subplots()

In [161]: means.plot.bar(yerr=errors, ax=ax, capsize=4)
Out[161]:
```

![errorbar_example](https://static.pypandas.cn/public/static/images/errorbar_example.png)

### Plotting tables

Plotting with matplotlib table is now supported in [``DataFrame.plot()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html#pandas.DataFrame.plot) and [``Series.plot()``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.plot.html#pandas.Series.plot) with a ``table`` keyword.
The ``table`` keyword can accept ``bool``, [``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) or [``Series``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series). The simplest way to draw a table is to specify ``table=True``. Data will be transposed to meet matplotlib’s default layout.

``` python
In [162]: fig, ax = plt.subplots(1, 1)

In [163]: df = pd.DataFrame(np.random.rand(5, 3), columns=['a', 'b', 'c'])

In [164]: ax.get_xaxis().set_visible(False)   # Hide Ticks

In [165]: df.plot(table=True, ax=ax)
Out[165]:
```

![line_plot_table_true](https://static.pypandas.cn/public/static/images/line_plot_table_true.png)

Also, you can pass a different [``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) or [``Series``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series) to the
``table`` keyword. The data will be drawn as displayed by the print method
(not transposed automatically). If required, it should be transposed manually
as seen in the example below.

``` python
In [166]: fig, ax = plt.subplots(1, 1)

In [167]: ax.get_xaxis().set_visible(False)   # Hide Ticks

In [168]: df.plot(table=np.round(df.T, 2), ax=ax)
Out[168]:
```

![line_plot_table_data](https://static.pypandas.cn/public/static/images/line_plot_table_data.png)

There also exists a helper function ``pandas.plotting.table``, which creates a
table from a [``DataFrame``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame) or [``Series``](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series), and adds it to a
``matplotlib.Axes`` instance. This function accepts the keywords of the
matplotlib [table](http://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes.table).
+

``` python
In [169]: from pandas.plotting import table

In [170]: fig, ax = plt.subplots(1, 1)

In [171]: table(ax, np.round(df.describe(), 2),
   .....:       loc='upper right', colWidths=[0.2, 0.2, 0.2])
   .....: 
Out[171]:

In [172]: df.plot(ax=ax, ylim=(0, 2), legend=None)
Out[172]:
```

![line_plot_table_describe](https://static.pypandas.cn/public/static/images/line_plot_table_describe.png)

**Note**: You can get table instances on the axes using the ``axes.tables`` property for further decorations. See the [matplotlib table documentation](http://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes.table) for more.

### Colormaps

A potential issue when plotting a large number of columns is that it can be
difficult to distinguish some series due to repetition in the default colors. To
remedy this, ``DataFrame`` plotting supports the use of the ``colormap`` argument,
which accepts either a Matplotlib [colormap](http://matplotlib.org/api/cm_api.html)
or a string that is a name of a colormap registered with Matplotlib. A
visualization of the default matplotlib colormaps is available [here](https://matplotlib.org/examples/color/colormaps_reference.html).

As matplotlib does not directly support colormaps for line-based plots, the
colors are selected based on an even spacing determined by the number of columns
in the ``DataFrame``. There is no consideration made for background color, so some
colormaps will produce lines that are not easily visible.

To use the cubehelix colormap, we can pass ``colormap='cubehelix'``.

``` python
In [173]: df = pd.DataFrame(np.random.randn(1000, 10), index=ts.index)

In [174]: df = df.cumsum()

In [175]: plt.figure()
Out[175]:
+ +In [176]: df.plot(colormap='cubehelix') +Out[176]: +``` + +![cubehelix](https://static.pypandas.cn/public/static/images/cubehelix.png) + +Alternatively, we can pass the colormap itself: + +``` python +In [177]: from matplotlib import cm + +In [178]: plt.figure() +Out[178]:
+

In [179]: df.plot(colormap=cm.cubehelix)
Out[179]:
```

![cubehelix_cm](https://static.pypandas.cn/public/static/images/cubehelix_cm.png)

Colormaps can also be used in other plot types, like bar charts:

``` python
In [180]: dd = pd.DataFrame(np.random.randn(10, 10)).applymap(abs)

In [181]: dd = dd.cumsum()

In [182]: plt.figure()
Out[182]:
+ +In [183]: dd.plot.bar(colormap='Greens') +Out[183]: +``` + +![greens](https://static.pypandas.cn/public/static/images/greens.png) + +Parallel coordinates charts: + +``` python +In [184]: plt.figure() +Out[184]:
+ +In [185]: parallel_coordinates(data, 'Name', colormap='gist_rainbow') +Out[185]: +``` + +![parallel_gist_rainbow](https://static.pypandas.cn/public/static/images/parallel_gist_rainbow.png) + +Andrews curves charts: + +``` python +In [186]: plt.figure() +Out[186]:
+ +In [187]: andrews_curves(data, 'Name', colormap='winter') +Out[187]: +``` + +![andrews_curve_winter](https://static.pypandas.cn/public/static/images/andrews_curve_winter.png) + +## Plotting directly with matplotlib + +In some situations it may still be preferable or necessary to prepare plots +directly with matplotlib, for instance when a certain type of plot or +customization is not (yet) supported by pandas. ``Series`` and ``DataFrame`` +objects behave like arrays and can therefore be passed directly to +matplotlib functions without explicit casts. + +pandas also automatically registers formatters and locators that recognize date +indices, thereby extending date and time support to practically all plot types +available in matplotlib. Although this formatting does not provide the same +level of refinement you would get when plotting via pandas, it can be faster +when plotting a large number of points. + +``` python +In [188]: price = pd.Series(np.random.randn(150).cumsum(), + .....: index=pd.date_range('2000-1-1', periods=150, freq='B')) + .....: + +In [189]: ma = price.rolling(20).mean() + +In [190]: mstd = price.rolling(20).std() + +In [191]: plt.figure() +Out[191]:
+

In [192]: plt.plot(price.index, price, 'k')
Out[192]: []

In [193]: plt.plot(ma.index, ma, 'b')
Out[193]: []

In [194]: plt.fill_between(mstd.index, ma - 2 * mstd, ma + 2 * mstd,
   .....:                  color='b', alpha=0.2)
   .....: 
Out[194]:
```

![bollinger](https://static.pypandas.cn/public/static/images/bollinger.png)

## Trellis plotting interface

::: danger Warning

The ``rplot`` trellis plotting interface has been **removed**. Please use
external packages like [seaborn](https://github.com/mwaskom/seaborn) for
similar but more refined functionality, and refer to our 0.18.1 documentation
[here](http://pandas.pydata.org/pandas-docs/version/0.18.1/visualization.html)
for guidance on converting existing code.

:::