Add New Notes

This commit is contained in:
geekard
2012-08-08 14:26:04 +08:00
commit 5ef7c20052
2374 changed files with 276187 additions and 0 deletions


@@ -0,0 +1,302 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-20T11:40:50+08:00
====== CSRF Attack Methods ======
Created Thursday 20 October 2011
http://hi.baidu.com/studyit1314/blog/item/5304ddb6f0f2051019d81f97.html
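===== Basic Concepts and Characteristics of CSRF =====
Cross-site request forgery (CSRF) is a common vulnerability in Web applications. It is highly damaging yet very stealthy: especially with the widespread use of Web 2.0 techniques, a CSRF attack can be launched without the user noticing anything. There is no single internationally agreed definition of CSRF, and the ways of launching an attack are numerous; they are described in detail below. What can be said is that every attack works by forging a user request. The user never intended to send the request, yet to the server or service it is a completely legitimate request, and it performs an operation the attacker wants, such as adding a user to the administrators group or transferring a user's money to another account. Developers often misunderstand CSRF in three respects: how the attack is carried out, where the harm actually lies, and what kind of prevention amounts to a complete solution. This article addresses these basic questions in detail and gives effective methods for detection.
===== Examples of the Harm Caused by CSRF =====
Most websites guard strictly against script injection, but their protection against CSRF is much weaker.
Example 1: Suppose premium members of a website enjoy certain special privileges, and once an ordinary user has paid, an administrator upgrades that user to premium member. Suppose the request the administrator uses to upgrade an ordinary user is: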
http://www.mysite.com/promoteUser.jsp?username=aaaaa
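Suppose further that ordinary users have **permission to post topics** in one of the site's forums. An ordinary user can then post this URL in a topic and use what we call social engineering to lure an administrator into clicking the link. When the administrator clicks it, the request is sent from the browser to the back-end server and the upgrade is carried out. In a real attack there are of course many ways to send such a request **without a click** from the administrator, for example by setting this URL as the __source of an image__.
Example 2: Take a second-hand flea market as an example. A commercial trading site has two registered users, Hacker01 and Customer01. Hacker01 lists a BMW in 90% condition on the trading channel with a bid price of $20000, and also lists a scrap car priced at $1000. The site allows images to be uploaded to show a car's condition, so the BMW owner can upload an image and so can the scrap-car owner.
BMW image URL: http://myrepository/BMW.jpg, car id 100000001
Scrap car image URL: http://myrepository/oldCar.jpg, car id 100000002
The auction site decides a car's final price by bidding. Suppose that when a bidder clicks the buy button for the BMW, the browser submits the bid by sending a **GET request** to http://e-bussiness-car/bid?value=20000$&carid=100000001. Hacker01 can then __change the image__ of the scrap car to http://e-bussiness-car/bid?value=20000$&carid=100000001 (or use some other number for the value parameter).
The result is this: when Customer01 visits the BMW listing, the correct image is shown and nothing is wrong. When Customer01 visits the scrap car listing, the image appears as a **broken image**, but while the browser was trying to load it, it already sent a bid request for the BMW. A legitimate request has thus been sent outside the user's control and accepted by the server. Hacker01 can sell the BMW without Customer01 being aware of it. This example shows how serious the harm of CSRF can be.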
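===== Basic Paths and Methods of CSRF Attacks =====
Among the methods HTTP defines, GET, POST, PUT and DELETE are the four basic operations. As label 1 in Figure 1 shows, GET and POST are used by practically every website or server, whereas PUT and DELETE, although powerful, were not widely used in earlier applications. With Web 2.0, **Ajax** led to PUT and DELETE being used heavily in REST-style frameworks, which also added one more avenue for CSRF attacks. This article uses the common GET and POST as examples; these are the main means by which browsers **exchange data** with servers. Attacks under Ajax frameworks are also covered.
CSRF attack methods are many and varied, and understanding them makes it easier to check for CSRF or to build CSRF protection into the product design, lowering the overall cost of development. By the manner of attack, they divide into **explicit and implicit attacks**. An explicit attack is one the user can notice, for example a link sent to the victim by some means. An implicit attack is hard to notice; it usually happens when the victim visits a vulnerable page or a malicious page. Implicit attacks are used more often because they are more practical. All the attack methods described below can be carried out implicitly. Note that whether the user's website has a **script injection** vulnerability does not affect CSRF: an attack can equally be mounted through a third-party site that has security weaknesses.
Figure 1 in brief: label 1 is a legitimate user accessing the user site and performing legitimate, valid operations; label 2 is an attack on the user through the mail system; label 3 is the malicious user, who makes use of Web sites, including the site the user operates on, ordinary sites and hacker sites; labels 4, 5 and 6 are the three ways in which the malicious user (label 3) attacks the victim.
Figure 1. Diagram of CSRF attacks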
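===== Attacking CSRF Vulnerabilities in GET Requests =====
GET requests are the most frequently used. Implicit GET requests can be issued with elements such as <img>, <script>, <frame> and <iframe>: introducing one of these elements into a page and setting its __SRC__ attribute sends a GET request to the target site without the user's knowledge.
Taking the IMG tag as an example, an attacker can launch the attack through the paths labeled 5, 6, 2 and 4 in Figure 1. The attack gives no visible sign, yet a complete, legitimate user request has already been sent.
<img src="http://UserSite/admin/deletepage?id=74NBCDSEFG"/>
For a site that relies heavily on GET requests, a GET request issued __implicitly through an HTML tag__ can be fatal.
Concrete, executable scenarios are described in the section on detection.
===== Attacking CSRF Vulnerabilities in POST Requests =====
One common belief about CSRF is that changing GET to POST prevents the attack. This is a misconception: an implicit CSRF attack can be carried out just as well with an __<iframe>__. The script is written as follows.
Listing 1. The Frame1.html script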
<script>
// Build a form from the given fields, attach it to the page and submit it.
function post(url, fields) {
  var p = document.createElement('form');
  p.action = url;
  p.innerHTML = fields;
  p.target = '_self';
  p.enctype = 'multipart/form-data';
  p.method = 'post';
  document.body.appendChild(p);
  p.submit();
}
function csrf_hack() {
  // Start from an empty string; otherwise "undefined" is prepended by +=.
  var fields = "";
  // Single quotes delimit the JavaScript string so the XML's double quotes stay intact.
  var csrf = '<addMember ' +
    'dnName="CN=manager 9/OU=Managers/OU=Users/O=QDSVT/DC=CN/DC=IBM/DC=COM" ' +
    'accessLevel="Author" isPerson="1" isLocal="0"/>';
  fields += "<input type='' name='action' value='" + csrf + "'>";
  post('http://usersite:80/dm/services/DocumentService?do401=true', fields);
  alert("csrf_end");
}
csrf_hack();
alert('end');
</script>
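Listing 2. IFrame.html
<IFRAME src=./frame1.html width=0 height=0></IFRAME>
This code builds a form with script and submits it; it runs automatically when the IFRAME __loads the page__. In this example the IFRAME's width and height are set to zero to make the attack implicit. According to the article, JavaScript has an unwritten rule only for windows, whose width and height may not be less than 50 pixels, but there is **no such requirement for an iframe**, which gives implicit cross-domain POST attacks a good path. That the example is written as script does not mean a site without script injection is safe: the implicit POST attack can equally be delivered through a third party. Attack paths 4, 5 and 6 in Figure 1 all suit this example.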
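===== Web 2.0 Attack Methods =====
Because Web 2.0 techniques greatly improve the user experience, they are now very widely used. They also check cross-site request submission strictly, so CSRF attacks issued by __xmlhttp__ from a third-party site are generally not a concern. If the site itself has a script injection vulnerability, however, Web 2.0 techniques give rise to serious CSRF problems. Another attack path is the mail system: a script file containing an xmlhttp request is sent to the victim. Whether harm results depends on whether the user runs the file, so the risk is clearly lower than for the attack types described earlier.
Attacks launched from e-mailed files, or from files uploaded to a website, are made possible by a characteristic of IE: because IE allows requests from the local domain to any domain, a file containing Web 2.0 code lets IE complete an attack in an offline state. IE's policies can be changed to reach a strict security configuration that blocks such access.
The following is attack code commonly used against Web 2.0 style cross-site vulnerabilities.
Listing 3. Attack code commonly used against Web 2.0 style cross-site vulnerabilities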
<script>
alert('start delete');
// Single quotes delimit the JavaScript string so the XML's double quotes stay intact.
var payload = '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" ' +
  'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ' +
  'xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Header>' +
  '<serviceVersion>8.0.0</serviceVersion></soap:Header><soap:Body><deleteDocument ' +
  'xmlns="http://webservices.clb.content.ibm.com">' +
  '<path>/@Pcsrftestplace/@RMain.nsf/@F/@DE44FD4FF0956D07648257570002C42DA</path>' +
  '</deleteDocument></soap:Body></soap:Envelope>';
alert(payload);
var client = new XMLHttpRequest();
client.open("POST",
  "http://usercite.com/files/form/api/collections/" +
  "2d0f6188-8872-4722-8922-3a3c842aa443/entry?format=xml");
client.setRequestHeader("Content-Type", "text/plain;charset=UTF-8");
client.setRequestHeader("x-method-override", "DELETE");
client.setRequestHeader("x-requested-with", "XMLHttpRequest");
// (you can customize the headers if you need)
client.send("");
</script>
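===== Login CSRF =====
Login CSRF is a relatively new attack. The user is misled into believing they have logged in with their own account and password when in fact they are logged in to the hacker's account. The most notable feature of this attack is that the hacker can observe what the user actually does: by querying the account history, the hacker learns which operations the user performed. On a commercial site a credit card number may be left in the history; in a system holding personal information the user's private operations are left behind.
===== Detecting CSRF with Rational AppScan =====
AppScan is a powerful Web security testing tool that IBM obtained when it acquired Watchfire. It now belongs to the Rational product line and focuses on testing and protecting Web applications. It comes in static and dynamic variants, covering testing needs at both the code end and the product end.
AppScan added CSRF checks from version 7.7 onward. The basic principle is to send two requests in sequence to the same URL or service under test, logging out between the two. A site that already protects against CSRF will return two different responses. An example follows.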
Request 1
GET/POST http://myproduct.com/services?action=remove&id=10002
Headers ….. …..
Content: ……
Response 1
Response 200
Headers …. …..
Content:…..
Request 2
GET/POST http://myproduct.com/services?action=remove&id=10002
Headers ….. …..
Content: ……
Response 2
Response 200
Headers …. …..
Content:…..
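If response 1 and response 2 are exactly identical, the site can be considered vulnerable; otherwise it can be considered protected. The principle looks simple, but in practice there is a tedious logical problem. If request 1 is a delete, for instance, how do you construct a request 2 that yields the same result? The solution is to perform operation 1 first, then re-create the same resource, and then perform operation 2.
Even this simple example shows that detecting CSRF effectively is a fairly tedious process. AppScan's detection rests on the premise that an operation on the target resource should return different content in a different session.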
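The comparison step can be sketched in a few lines of Python 3. This is only an illustration of the idea described above, not AppScan's implementation: the function names are hypothetical, and the two response bodies are assumed to have been captured already (one before and one after logging out).

```python
import hashlib


def fingerprint(body):
    """Return a stable digest of a response body given as bytes."""
    return hashlib.sha256(body).hexdigest()


def looks_unprotected(body_in_session, body_after_logout):
    """Heuristic from the text: if the same request returns identical
    content in two different sessions, the resource is probably not
    protected against CSRF."""
    return fingerprint(body_in_session) == fingerprint(body_after_logout)


# Hypothetical captured bodies for the same "remove" request.
print(looks_unprotected(b"<ok>removed</ok>", b"<ok>removed</ok>"))
print(looks_unprotected(b"<ok>removed</ok>", b"<error>login required</error>"))
```

A real check would also have to ignore content that legitimately varies between responses, such as timestamps, which is one source of the false positives discussed in this section.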
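One thing to watch for is false positives. Most operations in a Web application are requests to fixed URLs, covering resource files and functional requests. Operations on resource files are in many cases static requests. In an application that does not use PUT/DELETE, there is no need to test such static GET requests for CSRF, since no CSRF vulnerability exists there. An application that uses an Ajax framework and has DELETE/PUT operations, on the other hand, may well have serious CSRF problems. Products that do not use Ajax concentrate on GET/POST requests; note that GET and POST are equally exploitable for CSRF and equally harmful to the product.
CSRF testing has two main directions: path coverage testing and targeted testing. The reason for the split is that a product has a large number of URLs, and testing them one by one takes a great deal of time and effort. Coverage testing is done by the tool to make sure every path in the product is covered. Some products already handle CSRF well; in that case most resources are well protected and have no CSRF problem, and a test of all paths is what is needed to confirm it.
Targeted testing is done by people: by analyzing the product's features, talking with the developers and reading the design documents. The reason for targeted testing is that performance is one of the main concerns of every Web application. Protecting every request against CSRF is like putting a manual toll booth on a motorway: it severely affects performance. A good Web application protects against CSRF selectively. For a product with no CSRF protection, the results of targeted testing make a good starting point for adding it. By testing fixed functions and understanding the design documents, one can essentially determine whether the product has CSRF protection.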
A normal flow for detecting CSRF with AppScan is shown in Figure 2.
Figure 2. A normal flow for detecting CSRF with AppScan
A breakdown of the AppScan execution process is shown in Figure 3.
Figure 3. A breakdown of the AppScan execution process
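The purpose of targeted testing is to determine whether CSRF protection exists. CSRF protection has a question of scope: not every request needs protection against CSRF. Static resources need testing only where DELETE/PUT operations are allowed. Key business logic does need it, for example bank transfers, confirming recipient details, placing bids, deleting a user or granting a user elevated privileges. Which operations fall in scope differs between commercial products and has to be analyzed case by case.
This example uses the common case of page deletion to describe a workable test method. The cases are roughly as follows:
Deleting a page with GET, with DELETE/PUT, or with POST are all client-server interactions. Analyzing a concrete case is far more complex than the classification suggests: one operation may involve many different requests, and the ultimate goal is to find the one that poses a threat. Sometimes, even when AppScan has pinpointed the request, the case still has to be worked out by hand and written up as a scenario of practical value, and this requires manual testing tools.
On manual tools: targeted testing involves frequent work with HTTP requests. To view the content of requests and observe how specific requests behave, Fiddler or WebScarab is recommended.
Before starting manual verification, the conditions under which CSRF occurs must be clear. The precondition in every case is that the user's usual browser holds an active session with the target server. The reason is that in the CSRF attack model user A is attacked by malicious user B: the attack is initiated by B but carried out by user A.
B typically injects code into a site A often visits, or sends A a link or a file with an attachment. For the page containing the malicious code or link to take effect, user A must already be in a session with the server. This is the precondition for CSRF and also the basis of manual testing.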
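==== Testing for GET CSRF Vulnerabilities ====
With GET, a request such as http://mysite/service?action=delete&pageid=100001 is the most direct to verify; no script is needed, nor Fiddler to observe the actual request format. The test is to enter the URL in the browser's address bar while a session with the server is maintained. If the page is actually deleted, the CSRF attack has succeeded. In a case this clear, the URL alone already shows that there is no CSRF protection.
A related attack scenario: in any place where an image can be displayed, write <img src=http://mysite/service?action=delete&pageid=100001 width=0 height=0/>. Then it only takes directing a user with delete permission to a page containing this image tag, usually by sending a simple link by e-mail or MSN, to have the page deleted.
==== Testing for POST CSRF Vulnerabilities ====
Using POST does not rule out CSRF. A browser can issue a POST request in two ways. One is to have script call the page's form element and submit it directly; this allows cross-domain submission by script and thus implicit attacks. The other is to send the request directly with an Ajax object; because such requests cannot be sent cross-domain, this is less practical, though still possible. Take the same page deletion, with the following structure.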
POST http://mysite/service
Headers….
Action=delete&pageid=100001
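Unlike GET, this cannot be tested simply by typing a link into the browser. A prepared HTTP server is needed, such as IBM HTTP Server, Domino or IIS. Copy the IFrame.html and frame1.html listings into a directory on the server, and modify csrf_hack() in frame1.html as follows.
Listing 4. Modified csrf_hack() in frame1.html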
function csrf_hack() {
  // Start from an empty string; otherwise "undefined" is prepended by +=.
  var fields = "";
  fields += "<input type='' name='action' value='" + "delete" + "'>";
  fields += "<input type='' name='pageid' value='" + "100001" + "'>";
  post('http://mysite/service', fields);
  alert("csrf_end");
}
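A related attack scenario: send the link http://hackerWebServer/iframe by e-mail or MSN to a user who can delete pages, and the operation is carried out; if the page is deleted, the attack has succeeded. Placing the iframe.html script on another site that allows script injection achieves the same effect.
The other kind is a POST request submitted through Ajax. These mostly carry a SOAP message, a similar XML body, or a JSON body. The structure is as follows.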
POST http://mysite/service
Headers….
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Header>
<serviceVersion>8.0.0</serviceVersion>
</soap:Header><soap:Body>
<deleteDocument
xmlns="http://webservices.clb.content.ibm.com">
<path>/@Pcsrftestplace/@RMain.nsf/@F/@DE44FD4FF0956D07648257570002C42DA
</path></deleteDocument></soap:Body></soap:Envelope>
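In this case the form's enctype attribute must be changed to multipart/form-data. The default is application/x-www-form-urlencoded, under which all characters are URL-encoded, so the submitted data would be invalid and the server could not recognize it. With multipart/form-data the data is not encoded, and many server-side receivers accept data as multipart/form-data. The article states that, for security reasons, JavaScript may not set the value of a file input in a form from script, so controlling the enctype from script is not allowed, which makes this different from the first kind of POST request. This does not make the scenario less plausible: forge a form on a vulnerable site, with the form pointing at the URL to be operated on. A complete form has to be constructed, and the attack takes effect when the user clicks a link delivered by any means.
==== Testing for DELETE/PUT CSRF Vulnerabilities ====
DELETE/PUT requests depend on Web 2.0 techniques, whose own restrictions make it impossible to send forged cross-site requests freely. Offline attacks or script injection on the site itself are used instead. Where script injection on the site exists, implicit attacks are possible for all four methods. For code, see the example in the Web 2.0 section.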
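===== Protecting Against CSRF =====
There are many CSRF protection mechanisms, and they keep evolving as CSRF attacks evolve. Common ones include checking the Referer header, using one-time tokens and using CAPTCHA images. For performance reasons, adding token validation to every request greatly increases the load on the server, so the strengths and weaknesses of each protection must be weighed carefully when deciding which is more appropriate.
1. Check the HTTP Referer header. This is the simplest protection against CSRF to implement. As defined in the HTTP RFC, the Referer is sent along in the headers of HTTP requests. After receiving a request, the server can check this header and accept only requests from its own domain, ignoring those from external domains, which avoids many risks. Because the check is so simple, it has weaknesses of its own:
a) First, checking the Referer cannot protect against attacks from the same domain. Business sites often host forums, mail and other Web applications on the same domain. CSRF attacks coming from these carry a Referer of the same domain, so this defense cannot stop them.
b) Likewise, some ways of sending HTTP requests directly (that is, not from a browser, for example from back-end code) can forge the Referer. Forging headers this way means sending the request directly, which makes it hard to send the cookie along. But with the growing range of client-side techniques, the heavy use of Flash and JavaScript, and especially the growing number of plug-ins installed in browsers, forging the Referer from the client has become possible.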
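2. Use a one-time token. This is widely used by Web application designers today. For GET requests a token is added to the URL; for POST requests a token is added in a hidden field. The token is generated by the server; the programmer makes sure the client's requests carry it, and the server validates it. Several flawed token designs are in use, however:
a) Generating the token independently of the session. The token's value has nothing to do with the session, so it is easily forged by other users. Other users here means other users of the same Web application and eavesdroppers on the devices along the network path; such a malicious user can substitute a token of their own to forge a request.
b) Using the session credentials themselves as the token. This does protect against CSRF but may cause other harm. If URLs or pages are copied and shared with others, they may contain the user's session information, and once a malicious user obtains it, great damage can result.
A sound token design is therefore to hash the session information and use the resulting hash as the CSRF token.
3. Use a CAPTCHA image. This method was introduced to prevent brute-force attacks by bots. For CSRF, some applications with high security requirements combine a CAPTCHA with a one-time token for double protection. Because the image is hard for a malicious program to recognize on the client, it gives stronger protection when the client's browser may already be in an insecure environment (for example, low client security settings or insecure browser plug-ins).
These are only some of the more general ways to protect against CSRF. Web developers can decide the required security level from their understanding of their own application's functions and choose protections accordingly. Combining several methods within one application is also recommended.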
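Two of the protections above can be sketched with the Python 3 standard library. This is a minimal illustration under stated assumptions, not a production design: SECRET_KEY, the function names and the sample session id are hypothetical, and a keyed HMAC is used in place of the plain hash the text mentions, so that the token cannot be recomputed by someone who learns only the hashing scheme.

```python
import hashlib
import hmac
from urllib.parse import urlsplit

SECRET_KEY = b"change-me"  # hypothetical server-side secret


def referer_is_same_host(referer, expected_host):
    """Referer check: accept only requests whose Referer names our host."""
    if not referer:
        return False
    return urlsplit(referer).hostname == expected_host


def make_csrf_token(session_id):
    """Derive the token from the session id instead of exposing the id."""
    return hmac.new(SECRET_KEY, session_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def is_valid_csrf_token(session_id, token):
    """Recompute the token for this session and compare in constant time."""
    return hmac.compare_digest(make_csrf_token(session_id), token)


token = make_csrf_token("session-abc")
print(is_valid_csrf_token("session-abc", token))    # token issued for this session
print(is_valid_csrf_token("session-other", token))  # token reused in another session
print(referer_is_same_host("http://www.mysite.com/forum/1", "www.mysite.com"))
```

Because the token depends on the session, a token copied from one user's page is rejected in another user's session, and the session id itself never appears in a URL or form.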
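===== Summary =====
CSRF is a long-standing attack and can be found on a large number of commercial sites. What the authors hope is that readers apply what this article covers to make a sound analysis and propose targeted improvements that raise security without harming application performance. For Web applications yet to be developed, understanding the harm thoroughly and considering CSRF protection at the design stage will undoubtedly give better results.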


@@ -0,0 +1,82 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-20T14:53:47+08:00
====== An Analysis of How CSRF Attacks Work ======
Created Thursday 20 October 2011
http://www.80sec.com/csrf-securit.html
|=——————————————————————=|
|=————–=[ An Analysis of How CSRF Attacks Work ]=——————=|
|=——————————————————————=|
|=——————-=[ By rayh4c ]=————————=|
|=————-=[ rayh4c@80sec.com ]=——————-=|
|=——————————————————————=|
Author: rayh4c [80sec]
EMail: rayh4c#80sec.com
Site: http://www.80sec.com
Date: 2008-9-21
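===== 0x00. Preface =====
In a Web application, ordinary users generally complete what they want to do through the Web interface alone, and the normal client requests a Web application receives generally come from users **clicking links and submitting forms**. A malicious attacker, however, can **rely on scripts and browser security flaws** to hijack client sessions and forge client requests.
===== 0x01. Types of CSRF Attack =====
CSRF is an attack that __forges client requests__. The full English name is Cross Site Request Forgery, literally __forging requests across sites__. The attack was proposed by security researchers abroad in 2000, but received attention in China only in early 2006. Early on, Jianxin of our team used a CSRF attack to achieve SQL injection in the DVBBS admin back end, and a CSRF vulnerability that added administrators in the Dongyi back end also appeared online. In 2008 CSRF began to be used in script worms on large community sites such as blogs and SNS.
CSRF is defined as forcing the victim's browser to send a request to a vulnerable Web application, ultimately carrying out the operation the attacker wants. CSRF attacks generally fall into two types, **on-site and off-site**:
On-site CSRF vulnerabilities are to some extent caused by programmers overusing $_REQUEST-style variables. Some sensitive operations are meant to have the user submit a form, passing parameters to the program in a **POST request**, but because $_REQUEST and similar variables are used, the program also accepts parameters from a **GET request**. This creates the conditions for a CSRF attack. The attacker usually only has to put the predicted request parameters into **the image link of a post or comment on the site**; a victim who views such a page is forced to send the request.
Off-site CSRF vulnerabilities are really the traditional problem of __data submitted from outside__. Programmers usually consider __adding a watermark to forms__ such as comment forms to prevent spam, but for the sake of user experience some operations may have no restriction at all. The attacker can predict the request parameters in advance and, in an off-site Web page, write JavaScript, forged file requests or auto-submitting forms to issue GET and POST requests. **When a user with an active session clicks a link and visits the off-site page, the client is forced to send the request**.
===== 0x02. Browser Security Flaws =====
Nearly all Web applications today use cookies to identify **the user and keep session state**, but when browsers first added cookie support they did not take security into account: every file request generated from a Web page carries the cookie. As shown below, the request generated by an ordinary image in a Web page also carries the cookie: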
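<img src="http://website.com/logo.jpg">
GET http://website.com/logo.jpg
Cookie: session_id
Client ——————————————————- Server
This browser flaw creates the most basic condition for CSRF attacks. Because any file request in a Web page carries the cookie, replacing a file address with a link means that visiting the page is equivalent to clicking that link automatically within the session. HTML tags that have a SRC attribute and make file requests, such as images, Flash and music, can all be used to forge GET requests. A Web application can thus find every GET parameter in the program unreliable merely because of the most basic HTML tags used to render pages.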
===== 0x03. Browser Session Security Features =====
Following the standard Set-Cookie format, the cookies browsers support today actually come in two forms:
Set-Cookie: <name>=<value>[; <name>=<value>] [; expires=<date>][; domain=<domain_name>] [; path=<some_path>][; secure][; HttpOnly]
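One is the in-memory cookie. When the cookie's expires parameter is not set, that is, no expiry time is given, the cookie lapses when the browser is closed and is not saved locally. The other is the locally stored cookie: when expires is set and an expiry time given, the cookie is saved locally, and if the site is visited again after the browser has been closed, every request within the cookie's lifetime carries it.
Internet Explorer has a privacy report feature, which is really a security feature: it blocks all third-party cookies. For example, if a Web page on domain A embeds a file from domain B, then when the browser visits the page on A, the cookie on the file request to B is blocked by IE. Beyond file requests, if the page on A includes a page from B in an IFRAME, then the cookies on all requests from within B's page, file requests included, are likewise blocked. This feature has two properties, however: it does not block in-memory cookies, and when the site sets a P3P header, cross-domain cookie access is allowed and the privacy report feature has no effect.
Given this IE feature, an off-site CSRF attack that forges a GET request with a file request can succeed only if the victim is using an in-memory cookie, that is, has not saved the login session. Firefox has not adopted such a feature, so off-site CSRF attacks there are completely unrestricted.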
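The difference between the two kinds of cookie is only the presence of the expires attribute, which can be seen by building both headers with Python 3's standard http.cookies module. The helper name, cookie name and date below are illustrative assumptions.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name, value, expires=None):
    """Return a Set-Cookie header line; omit `expires` for a session cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    cookie[name]["httponly"] = True
    if expires is not None:
        # A persistent cookie: the browser stores it until this date.
        cookie[name]["expires"] = expires
    return cookie.output(header="Set-Cookie:")


# In-memory (session) cookie: discarded when the browser is closed.
print(build_set_cookie("session_id", "abc123"))
# Locally stored cookie: sent with requests until the expiry date.
print(build_set_cookie("session_id", "abc123",
                       expires="Wed, 20 Oct 2021 06:53:47 GMT"))
```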
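===== 0x04. JavaScript Hijacking =====
In recent years Web applications have made frequent use of __Ajax__, and __JSON__ has begun to replace XML as the data format for Ajax. JSON is in effect a piece of JavaScript, mostly in array format. In 2007 three researchers at Fortify proposed JavaScript hijacking, an attack on dynamic JSON data that is really a variant of CSRF. From off-site, the attacker uses a script tag to include one of the site's dynamic JSON endpoints. Because the file request of a <script src=""> tag carries the cookie, a user who visits the page is in effect forced to send, from off-site, a GET request bearing the authentication cookie. The Web application promptly returns the user's JSON data, which the attacker can capture and exploit. The whole process amounts to an off-site CSRF attack.
JSON data in Web applications is mostly used in privacy-related features such as profiles and friend lists, and this is exactly the data a Web worm most needs in order to spread. Combining CSRF with JavaScript hijacking makes it possible to analyze this data and build a self-propagating Web worm. In some circumstances such a worm is more dangerous than one built on a cross-site scripting vulnerability and is barely constrained by the site's architecture, because the attacker exploits not a traditional Web vulnerability but the site's own normal functions. If a CSRF worm of this kind appears, the blow to the site will be disastrous.
===== 0x05. Security Advice =====
Large community sites must be alert to outbreaks of CSRF attacks and related Web worms, and prepare effective emergency measures against them. Programmers are also advised not to overuse $_REQUEST-style variables; to add a watermark to sensitive operations where necessary; to consider something like the formhash technique of the Discuz forum, which makes request parameters harder for an attacker to predict; and to pay attention to the security of JSON data endpoints. Finally, we hope everyone considers the security of client and server as a whole, and pays attention to the security flaws and features of client browsers such as Internet Explorer, so that client-side security problems do not affect the whole Web application.
References: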
http://blog.csdn.net/lake2/archive/2008/04/02/2245754.aspx
http://www.cgisecurity.com/articles/csrf-faq.shtml
http://www.playhack.net/view.php?id=31
http://www.fortify.com/servlet/downloads/user/JavaScript_Hijacking.pdf
http://www.w3.org/P3P/


@@ -0,0 +1,249 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-07T19:31:35+08:00
====== A Vertical Search Crawler Built with Django + Python + BeautifulSoup ======
Created Friday 07 October 2011
http://www.yiihsia.com/2010/11/djangopythonbeautifulsoup%E7%BB%84%E5%90%88%E7%9A%84%E5%9E%82%E7%9B%B4%E6%90%9C%E7%B4%A2%E7%88%AC%E8%99%AB/
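The crawler uses Python and BeautifulSoup to fetch specific data, and Django provides a management platform to coordinate the crawl.
I like the Django admin very much, so this time I use it to manage the crawled links, which lets the crawler cope with later requirements such as crawling at set times or periodically re-crawling addresses already fetched. The database is sqlite3, which ships with Python, so it is very convenient.
In my third year at university I built a movie recommendation system and needed some movie data. The example in this article fetches specific data from Douban Movies.
Step 1: create the Django model
This follows the approach of the Nutch crawler, simplified. Each crawl task starts by finding the links in the database that have not yet been saved (is_save = False) and putting them on the crawl list. You can also filter links according to your own needs.
Python code: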
from django.db import models

class Crawl_URL(models.Model):
    url = models.URLField('URL to crawl', max_length=100, unique=True)
    weight = models.SmallIntegerField('crawl depth', default=0)  # crawl depth, starting from 1
    is_save = models.BooleanField('saved', default=False)
    date = models.DateTimeField('saved at', auto_now_add=True, blank=True, null=True)

    def __unicode__(self):
        return self.url
Then create the corresponding table.
An admin back end is also needed:
from django.contrib import admin

class Crawl_URLAdmin(admin.ModelAdmin):
    list_display = ('url', 'weight', 'is_save', 'date',)
    ordering = ('-id',)
    list_filter = ('is_save', 'weight', 'date',)
    fields = ('url', 'weight', 'is_save',)

admin.site.register(Crawl_URL, Crawl_URLAdmin)
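Step 2: write the crawler code
The crawler is single-threaded and pauses after each fetch, because Douban blocks crawlers that fetch above a certain rate.
The crawl is controlled by depth. Each round takes the pending links, fetches them and parses out more links, then marks the fetched links is_save=True and stores the new links in the database. After each depth is finished, importing the links into the database takes quite a long time, because each link must be checked for whether it is already stored.
Data is parsed only from addresses that match the regular expression http://movie.douban.com/subject/(\d+)/, and links outside the movie section are ignored.
For the first crawl a link has to be added in the admin, for example http://movie.douban.com/chart, a chart page whose movies are fairly popular.
Python code: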
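The URL filter described here can be tried on its own. The following Python 3 sketch uses the same pattern as the crawler, with the dots escaped; the helper name movie_id and the sample URLs are illustrative assumptions.

```python
import re

# Same shape as the crawler's pattern; the dots are escaped so that they
# match only a literal "." in the host name.
MOVIE_PAGE = re.compile(r'^http://movie\.douban\.com/subject/(\d+)/$')


def movie_id(url):
    """Return the numeric subject id for a movie page, or None otherwise."""
    match = MOVIE_PAGE.match(url)
    return match.group(1) if match else None


print(movie_id('http://movie.douban.com/subject/1292052/'))
print(movie_id('http://movie.douban.com/chart'))
```

Pages for which the helper returns None are still crawled for links; they are simply not parsed for movie data.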
# coding=UTF-8
import re
import urllib2
from BeautifulSoup import *
from urlparse import urljoin
from pysqlite2 import dbapi2 as sqlite
from movie.models import *
from django.contrib.auth.models import User
from time import sleep

image_path = 'C:/Users/soul/djcodetest/picture/'
user = User.objects.get(id=1)

def crawl(depth=10):
    for i in range(1, depth):
        print 'start crawling, depth %d....' % i
        pages = Crawl_URL.objects.filter(is_save=False)
        newurls = {}
        for crawl_page in pages:
            page = crawl_page.url
            try:
                c = urllib2.urlopen(page)
            except:
                continue
            try:
                # parse the metadata and URLs
                soup = BeautifulSoup(c.read())
                # parse a movie page
                if re.search(r'^http://movie.douban.com/subject/(\d+)/$', page):
                    read_html(soup)
                # collect the valid links into newurls
                links = soup('a')
                for link in links:
                    if 'href' in dict(link.attrs):
                        url = urljoin(page, link['href'])
                        if url.find("'") != -1: continue
                        if len(url) > 60: continue
                        url = url.split('#')[0]  # remove the fragment portion
                        if re.search(r'^http://movie.douban.com', url):
                            newurls[url] = crawl_page.weight + 1  # valid link: store it in the dict
                            try:
                                print 'add url :'
                            except:
                                pass
            except Exception as e:
                try:
                    print "Could not parse : %s" % e
                except:
                    pass
            # mark this page as fetched; newurls is stored later with is_save=False
            crawl_page.is_save = True
            crawl_page.save()
            # sleep 2.5 seconds
            sleep(2.5)
        save_url(newurls)

# save the URLs to the database
def save_url(newurls):
    for (url, weight) in newurls.items():
        url = Crawl_URL(url=url, weight=weight)
        try:
            url.save()
        except:
            try:
                print 'duplicate url:'
            except:
                pass
    return True
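Step 3: parse the pages with BeautifulSoup
Extract the movie title, image, plot summary, lead actor, tags and region. For how to use BeautifulSoup, see the BeautifulSoup documentation.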
# extract the data
def read_html(soup):
    # parse the title
    html_title = soup.html.head.title.string
    title = html_title[:len(html_title)-5]
    # parse the movie introduction
    try:
        intro = soup.find('span', attrs={'class': 'all hidden'}).text
    except:
        try:
            node = soup.find('div', attrs={'class': 'blank20'}).previousSibling
            intro = node.contents[0] + node.contents[2]
        except:
            try:
                contents = soup.find('div', attrs={'class': 'blank20'}).previousSibling.previousSibling.text
                intro = contents[:len(contents)-22]
            except:
                intro = u'暂无'  # "not available"
    # fetch the image
    html_image = soup('a', href=re.compile('douban.com/lpic'))[0]['href']
    data = urllib2.urlopen(html_image).read()
    image = '201003/' + html_image[html_image.rfind('/')+1:]
    f = file(image_path + image, 'wb')
    f.write(data)
    f.close()
    # parse the region
    try:
        soup_obmo = soup.find('div', attrs={'class': 'obmo'}).findAll('span')
        html_area = soup_obmo[0].nextSibling.split('/')
        area = html_area[0].lstrip()
    except:
        area = ''
    #time = soup_obmo[1].nextSibling.split(' ')[1]
    #time = time.strptime(html_time,'%Y-%m-%d')
    # create the movie object
    new_movie = Movie(title=title, intro=intro, area=area, version='暂无', upload_user=user, image=image)
    new_movie.save()
    try:
        actors = soup.find('div', attrs={'id': 'info'}).findAll('span')[5].nextSibling.nextSibling.string.split(' ')[0]
        actors_list = Actor.objects.filter(name=actors)
        if len(actors_list) == 1:
            actor = actors_list[0]
            new_movie.actors.add(actor)
        else:
            actor = Actor(name=actors)
            actor.save()
            new_movie.actors.add(actor)
    except:
        pass
    # tags
    tags = soup.find('div', attrs={'class': 'blank20'}).findAll('a')
    for tag_html in tags:
        tag_str = tag_html.string
        if len(tag_str) > 4:
            continue
        tag_list = Tag.objects.filter(name=tag_str)
        if len(tag_list) == 1:
            tag = tag_list[0]
            new_movie.tags.add(tag)
        else:
            tag = Tag(name=tag_str)
            tag.save()
            new_movie.tags.add(tag)
    #try:
    #except Exception.args:
    # print "Could not download : %s" % args
    print r'download success'
Douban's movie pages are not very uniform, so the extracted results sometimes come out slightly off.


@@ -0,0 +1,170 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-17T21:23:41+08:00
====== How Django Processes a Request ======
Created Monday 17 October 2011
http://blog.huyo.org/?p=345
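This article is a translation of James Bennett's How Django processes a request; I hope it helps those learning Django. If any part of the translation is off, please leave a comment.
In a comment yesterday, Jonathan Snook posed a great challenge: describe how Django processes a request, from start to finish, with enough detail about the things it calls internally and with links to the appropriate documentation.
Simon Willison once wrote such a document, but it took a very high-level view and much has changed since, so I decided to write one myself and hope it is easy to follow.
Note: this is a first draft, not a finished product and not a complete list. It will change often as the work proceeds. Ideally I will get some help producing a somewhat visual document, but for now I am sticking to plain text.
Where official documentation exists I link to it; where it does not, I link to the relevant code in the Django repository. Those locations change often, especially since I always link to line numbers in the files, but I will do my best to keep them current. If you see a mistake, something I missed, or something that should be explained better, please leave a comment.
Let's begin.
===== A request comes in! First, some Django-related initialization happens (once only) =====
Namely:
* If Apache/mod_python is serving, the request is passed to Django by an instance of django.core.handlers.modpython.ModPythonHandler created by mod_python.
* With any other server, which must be WSGI-compliant, the server creates an instance of django.core.handlers.wsgi.WSGIHandler.
Both classes inherit from django.core.handlers.base.BaseHandler, which contains the common code needed for any type of request.
When one of the handlers above is instantiated, a series of things happens right away:
* The handler imports your Django settings.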
// 240 from django.conf import settings//
* The handler imports Django's custom exception classes.
* The handler calls its own load_middleware method, which loads every middleware class listed in MIDDLEWARE_CLASSES and introspects it.
250 self.load_middleware()
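The last item is a little involved, so let's look at it closely.
A middleware class can hook into four phases of processing: __request, view, response and exception__. To do so it only needs to define the appropriate methods: process_request, process_view, process_response and process_exception. A middleware may define any or all of these methods, depending on the functionality you want it to provide.
When the handler introspects the middleware, it looks for methods with those names and __builds four lists as instance variables of the handler__:
_request_middleware is a list of process_request methods (in each case these are the actual methods, so they can be called directly) from every middleware class that defines one.
_view_middleware is a list of process_view methods from every middleware class that defines one.
_response_middleware is a list of process_response methods from every middleware class that defines one.
_exception_middleware is a list of process_exception methods from every middleware class that defines one.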
//from base.py
33 self._view_middleware = []
34 self._template_response_middleware = []
35 self._response_middleware = []
36 self._exception_middleware = []
38 request_middleware = []
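The introspection step can be sketched in plain Python 3. This is a simplified illustration, not Django's load_middleware: the two middleware classes and the dictionary of hook lists are hypothetical, and details such as the order in which response and exception hooks run are not modeled.

```python
class TimingMiddleware:
    """Hypothetical middleware that hooks the request and response phases."""

    def process_request(self, request):
        return None

    def process_response(self, request, response):
        return response


class AuthMiddleware:
    """Hypothetical middleware that hooks the request and view phases."""

    def process_request(self, request):
        return None

    def process_view(self, request, view, args, kwargs):
        return None


def load_middleware(middleware_classes):
    """Instantiate each class and collect its bound hook methods by phase."""
    hooks = {"request": [], "view": [], "response": [], "exception": []}
    for cls in middleware_classes:
        instance = cls()
        for phase, methods in hooks.items():
            method = getattr(instance, "process_" + phase, None)
            if method is not None:
                methods.append(method)
    return hooks


hooks = load_middleware([TimingMiddleware, AuthMiddleware])
print({phase: len(methods) for phase, methods in hooks.items()})
```

Storing bound methods rather than class names means the handler can later call each hook directly, as the text describes.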
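===== Green light: go (the handler starts processing the request) =====
The handler is now ready to really start processing, so it sends the dispatcher the signal **request_started** (Django's internal dispatcher lets components announce what they are doing and lets code listen for particular events. There is no official documentation on this yet, but there are some notes on the wiki).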
// 259 signals.request_started.send(sender=self.__class__)//
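Next the handler instantiates a subclass of **django.http.HttpRequest**.
Depending on the handler, this is either an instance of **django.core.handlers.modpython.ModPythonRequest**
or an instance of **django.core.handlers.wsgi.WSGIRequest**.
Two different classes are needed because mod_python and the WSGI APIs pass in request information in different formats, and this information has to be parsed into **a single standard format** that Django can work with.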
// 237 request_class = WSGIRequest//
272 response = self.get_response(request)
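Once an HttpRequest of some sort exists, the handler calls its own __get_response__ method, passing the HttpRequest as the only argument.
__This method handles essentially all of Django's processing__
=== Round one: ===
The first thing get_response does is loop over the handler's **_request_middleware** instance variable and call each method in it, passing the HttpRequest instance as an argument.
These methods may choose to __short-circuit the rest of the processing and make get_response return immediately__. If one does so, its return value must be an instance of django.http.HttpResponse (discussed later); get_response does not wait to see what the remaining middleware classes want to do, control returns to the main handler code, and the handler moves into the response phase. More commonly, though, the middleware methods applied here **simply do some processing and decide whether to add, remove or supplement attributes of the request**.
Assuming no **request** middleware returned a response directly, the handler next tries to **resolve the requested URL**. It looks in the settings for a setting called **ROOT_URLCONF** and uses it, together with the root URL /, as arguments to create an instance of **django.core.urlresolvers.RegexURLResolver**, then calls its resolve method to resolve the requested URL path (resolving returns a callable view and its arguments).
The URL resolver follows a fairly simple pattern. For each entry in the **urlpatterns** list of the URL configuration named by ROOT_URLCONF, it checks whether the requested URL path matches that entry's regular expression. If it does, there are two possibilities:
* If the entry is a call to **include**, the resolver __chops off__ the matched part of the URL, moves to the URL configuration named by the include, and starts iterating over the entries of its urlpatterns list. Depending on the depth and modularity of your URLs, this may repeat several times.
* Otherwise the resolver returns three things: the view function specified by the matched entry; a list of **unnamed matched groups** from the URL (used as positional arguments to the view); and a dictionary of keyword arguments, combined from any **named matched groups** in the URL and any extra **keyword arguments** from the URLconf.
Note that this process stops at the **first** entry that specifies a view, so it is best to order your URL configuration __from complex regular expressions to simple ones__, which ensures the resolver does not match the simple one first and return the wrong view function.
If no matching entry is found, the resolver raises django.core.urlresolvers.**Resolver404**, a subclass of django.http.Http404. We will see later how it is handled.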
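The resolution rules above can be sketched with the standard re module. This is a simplified Python 3 model, not Django's RegexURLResolver: each entry is assumed to be a (regex, target, extra keyword arguments) tuple, a nested list stands in for include, and the view functions, pattern lists and local Resolver404 class are hypothetical.

```python
import re


class Resolver404(Exception):
    """Raised when no entry matches the path."""


def article(request, year):
    return "article %s" % year


def about(request, page, section=None):
    return "about %s %s" % (page, section)


def resolve(path, urlpatterns):
    """Return (view, args, kwargs) for the first matching entry."""
    for regex, target, extra in urlpatterns:
        match = re.search(regex, path)
        if match is None:
            continue
        if isinstance(target, list):
            # "include": chop off the matched prefix and descend.
            try:
                return resolve(path[match.end():], target)
            except Resolver404:
                continue
        kwargs = match.groupdict()
        # Unnamed groups become positional arguments only when the
        # pattern has no named groups.
        args = () if kwargs else match.groups()
        kwargs.update(extra)
        return target, args, kwargs
    raise Resolver404(path)


blog_patterns = [(r'^(?P<year>\d{4})/$', article, {})]
root_patterns = [
    (r'^blog/', blog_patterns, {}),
    (r'^about/(\d+)/$', about, {'section': 'site'}),
]

print(resolve('blog/2011/', root_patterns))
print(resolve('about/7/', root_patterns))
```

Because the loop returns at the first entry that yields a view, putting a broad pattern before a narrow one would shadow the narrow one, which is the ordering advice given above.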
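=== Round two: ===
Once the view function and its arguments are known, the handler looks at its __ _view_middleware__ list and calls the methods in it, passing the HttpRequest, the view function, the list of positional arguments for the view and the dictionary of keyword arguments. As in round one, if any view middleware returns a response that is not None, the handler returns immediately.
If processing is still going at this point, the handler calls the view function. Views in Django are fairly unrestricted, since they need only meet a few conditions:
* They must be callable.
* They must accept an instance of django.http.HttpRequest as the first positional argument.
* They must either raise an exception or return an instance of django.http.HttpResponse.
Meeting these conditions is all a view function has to do. Generally, though, views use Django's **database API** to create, retrieve, update and delete things in the database, and __load and render__ a template to present something to the end user.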
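===== Templates =====
Django's template system has two parts: one is HTML mixed with a small amount of other material, for designers; the other is pure Python, for programmers.
From an HTML author's point of view Django's template system is very simple; there are only three constructs to know:
* Variable references.
* Template filters. In the example above, a filter is applied with a vertical bar. Filters are typically used to __format output__ (for example running Textile, formatting dates and so on).
* Template tags. They look like this: {% baz %}. This is where the template's __logic__ is implemented: you can write {% if foo %}, {% for bar in foo %} and so on; if and for are both template tags.
1. Variable references work in a very simple way. If you just want to print a variable, the template system outputs it. The only complication is a dotted reference such as foo.bar, in which case the template system tries several things in order:
* First it tries a **dictionary**-style lookup, checking whether foo['bar'] exists. If it does, its value is output and the process ends.
* If the dictionary lookup fails, the template system tries an **attribute** lookup, checking whether foo.bar exists. It also checks whether the attribute is **callable**, and if so, calls it.
* If the attribute lookup fails, the template system tries a list-**index** lookup.
If all of these fail, the template system outputs the value of the **TEMPLATE_STRING_IF_INVALID** setting, which defaults to an empty string. No exception is raised.
2. Template filters are simply Python functions that take a value and an argument and return a new value. For example, the date filter takes a Python datetime object as its value and a format string as its argument (Django's format characters follow PHP's date() function rather than strftime), and returns the result of applying the format string to the datetime object.
3. Template tags are used where things get a little more complicated, and they are where you learn how Django's template system really works.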
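The lookup order for a dotted reference can be sketched as a single Python 3 function. This is an illustration of the order described above rather than Django's own code; the function name, its string_if_invalid parameter and the Foo class are assumptions.

```python
def resolve_lookup(obj, key, string_if_invalid=""):
    """Resolve obj.key the way the text describes: dict, attribute, index."""
    # 1. Dictionary-style lookup: obj['key']
    try:
        return obj[key]
    except (TypeError, KeyError, IndexError, AttributeError):
        pass
    # 2. Attribute lookup: obj.key, called if it is callable
    try:
        value = getattr(obj, key)
        return value() if callable(value) else value
    except AttributeError:
        pass
    # 3. List-index lookup: obj[int(key)]
    try:
        return obj[int(key)]
    except (ValueError, TypeError, KeyError, IndexError):
        pass
    # Nothing worked: fall back silently instead of raising.
    return string_if_invalid


class Foo:
    bar = "attribute"

    def baz(self):
        return "called"


print(resolve_lookup({"bar": "dict"}, "bar"))
print(resolve_lookup(Foo(), "bar"))
print(resolve_lookup(Foo(), "baz"))
print(resolve_lookup(["x", "y"], "1"))
print(repr(resolve_lookup({}, "missing")))
```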
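== Structure of a Django template ==
Internally a Django template is represented as **a collection of "nodes"**, all inheriting from the basic django.template.Node class. Nodes can do all kinds of processing but have one thing in common: every Node must have __a method called render__ whose second argument (the first is obviously the Node instance) is an instance of __django.template.Context__, a **dictionary-like** object containing **all the variables available to the template**. A Node's render method must return **a string**, but if the Node's job is not output (for example, it modifies the template context by adding, removing or changing variables in the Context instance passed in), it may return **an empty string**.
Django includes many Node subclasses that provide useful functionality. For example, **each built-in template tag is handled by a Node subclass** (IfNode implements the if tag, ForNode the for tag, and so on). All the built-in tags can be found in **django.template.defaulttags**. In fact, every template construct described above is a Node of some kind, plain text included. Variable lookups are handled by VariableNode; naturally, filters are applied on the VariableNode too; tags are Nodes of various types; and plain text is a TextNode.
In general a view renders a template through the following steps, in order:
* Load the template to be rendered. This is done by **django.template.loader.get_template**, which can use any of several methods to locate the template file. The **get_template** function returns a django.template.Template instance containing __the parsed template and the methods for using it__.
* Instantiate a Context to **render the template** with. If the Context subclass **django.template.RequestContext** is used, the attached __context processors__ automatically add variables not defined in the view. The Context constructor takes a dictionary of key/value pairs (which become name/value variables for the template) as its only argument; RequestContext takes an HttpRequest instance and a dictionary.
* Call the Template instance's render method with the Context object as the first positional argument.
The return value of Template's render method is a string, **formed by concatenating the values returned by the render methods of all the Nodes** in the Template, called in the order in which they appear in the Template.
On the response: once a template has been rendered, or some other suitable output produced, the view is responsible for producing a **django.http.HttpResponse** instance, whose constructor accepts two optional arguments:
* A **string** to serve as the body of the response (it should be the first positional argument or the keyword argument content). Most of the time this will be the output of rendering a template, but it need not be; you can pass **any valid Python string** here.
* The value of the response's __Content-Type header__ (it should be the second positional argument or the keyword argument mimetype). If this is not given, Django uses the values of the DEFAULT_MIME_TYPE and DEFAULT_CHARSET settings, which, unless you changed them in your Django settings, are "text/html" and "utf-8" respectively.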
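The node model can be sketched in a few lines of Python 3. The classes below are simplified stand-ins that borrow the names used in the text; they are not Django's classes. The context is assumed to be a plain dict, and SetNode is a hypothetical node whose only job is to change the context.

```python
class Node:
    def render(self, context):
        raise NotImplementedError


class TextNode(Node):
    """Plain text: rendered unchanged."""

    def __init__(self, text):
        self.text = text

    def render(self, context):
        return self.text


class VariableNode(Node):
    """Variable reference: looked up in the context at render time."""

    def __init__(self, name):
        self.name = name

    def render(self, context):
        return str(context.get(self.name, ""))


class SetNode(Node):
    """Hypothetical tag that only modifies the context and outputs nothing."""

    def __init__(self, name, value):
        self.name, self.value = name, value

    def render(self, context):
        context[self.name] = self.value
        return ""


class Template:
    def __init__(self, nodes):
        self.nodes = nodes

    def render(self, context):
        # Concatenate each node's output in template order.
        return "".join(node.render(context) for node in self.nodes)


template = Template([TextNode("Hello, "), SetNode("name", "world"),
                     VariableNode("name"), TextNode("!")])
print(template.render({}))
```

Since nodes render in order, a node that modifies the context affects only the nodes that come after it.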
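===== Round three: exceptions =====
If the view function, or something inside it, raises an exception, get_response (we have spent some time deep in views and templates, but once the view returns or raises, we are back in the middle of the handler's get_response method) loops over its **_exception_middleware** instance variable and calls each method there, passing the **HttpRequest** and the exception as arguments. With luck one of these methods instantiates an HttpResponse and returns it.
At this point there may still be no HttpResponse, for several possible reasons:
* The view may not have returned a value.
* The view may have raised an exception that no middleware could handle.
* A middleware method trying to handle an exception may itself have raised a new exception.
get_response then falls back to its own exception handling, which has several levels:
* If the exception is Http404 and DEBUG is True, get_response runs the view django.views.debug.technical_404_response, passing the HttpRequest and the exception. This view shows information about the patterns the URL resolver tried to match. If DEBUG is False and the exception is Http404, get_response calls **the URL resolver's resolve_404 method**, which looks in the URL configuration to determine which view has been designated to handle 404 errors. The default is django.views.defaults.page_not_found, but this can be changed by assigning to the handler404 variable in the URL configuration.
* For any other kind of exception, if DEBUG is True, get_response runs the view django.views.debug.technical_500_response, passing the HttpRequest and the exception. This view gives detailed information about the exception, including the traceback, the local variables at each level of the stack, a detailed description of the HttpRequest object and a list of settings (with sensitive values hidden). If DEBUG is False, get_response calls the URL resolver's resolve_500 method, which is very similar to resolve_404; the default view here is django.views.defaults.server_error, but this can be changed by assigning to the handler500 variable in the URL configuration.
In addition, for any exception other than django.http.Http404 or Python's built-in SystemExit, the handler sends the dispatcher the signal got_request_exception and, before returning, builds a description of the exception and sends it to everyone listed in the ADMINS setting of the Django settings.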
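===== The final round =====
Now, whatever level get_response failed at, it returns an HttpResponse instance, so we are back in the main part of the handler. Once it has an HttpResponse, the first thing it does is loop over its _response_middleware instance variable and apply the methods there, passing the HttpRequest and the HttpResponse as arguments.
Note that for any middleware that wants to change something, this is its __last chance__.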
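The four rounds can be put together in one short Python 3 sketch. It is a simplified model of the flow described in this article, not Django's handler: requests are plain dicts, responses are strings, the hooks dictionary and resolve_view callback are assumptions, and the DEBUG pages, signals and ADMINS e-mail are left out.

```python
def handle(request, hooks, resolve_view):
    """Run request, view, exception and response hooks around a view."""
    response = None
    # Round one: request middleware may short-circuit everything.
    for method in hooks["request"]:
        response = method(request)
        if response is not None:
            break
    if response is None:
        view, args, kwargs = resolve_view(request)
        # Round two: view middleware may also return a response early.
        for method in hooks["view"]:
            response = method(request, view, args, kwargs)
            if response is not None:
                break
        if response is None:
            try:
                response = view(request, *args, **kwargs)
            except Exception as exc:
                # Round three: give exception middleware a chance.
                for method in hooks["exception"]:
                    response = method(request, exc)
                    if response is not None:
                        break
                if response is None:
                    raise
    # Final round: every response passes through response middleware.
    for method in hooks["response"]:
        response = method(request, response)
    return response


def ok_view(request):
    return "hello"


def bad_view(request):
    raise ValueError("boom")


hooks = {
    "request": [],
    "view": [],
    "exception": [lambda request, exc: "error page"],
    "response": [lambda request, response: response.upper()],
}
print(handle({}, hooks, lambda request: (ok_view, (), {})))
print(handle({}, hooks, lambda request: (bad_view, (), {})))
```

Even a response produced by a short-circuiting request middleware still passes through the response hooks, which matches the "last chance" remark above.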
The check is in the mail
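It is time to wrap up. Once the response middleware has finished, the **handler** sends the dispatcher the signal **request_finished**; for anything that wants to run during the current request this is absolutely the last call. Handlers listening for this signal __clean up and release any resources in use__. For example, Django's request_finished listener closes all database connections.
After this the handler builds an appropriate return value to send back to whatever instantiated it (now an appropriate mod_python response or a WSGI-compliant response, depending on the handler) and returns.
And that's it: from start to finish, this is how Django processes a request.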


@@ -0,0 +1,147 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-23T22:00:44+08:00
====== My Summary ======
Created Sunday 23 October 2011
__#/usr/lib/python2.7/site-packages/django/contrib/admin/__init__.py __
**from django.contrib.admin.sites import AdminSite, site **
# Goes through the admin.py module of every installed app used by the project, so that each model class registered there is added to the admin interface.
def autodiscover():
"""
Auto-discover INSTALLED_APPS admin.py modules and fail silently when
not present. This forces an import on them to register any admin bits they
may want.
"""
**from django.conf import settings**
# The installed apps are stored in the INSTALLED_APPS variable of django.conf.settings. Each value in it comes from django.conf.global_settings
# and from the user's project settings file named by the DJANGO_SETTINGS_MODULE environment variable.
from django.utils.importlib import import_module
from django.utils.module_loading import module_has_submodule
**for app in settings.INSTALLED_APPS:**
Because the admin package's __init__.py imports site and defines the autodiscover() function, they can be used as follows:
**# Usage** (excerpt from the project's urls.py)
__/home/geekard/djcode/mysite/urls.py__
from django.contrib import admin
admin.**autodiscover()**
# And include this URLpattern...
urlpatterns = patterns('',
# ...
(r'^admin/', include(**admin.site.urls**)),
# ...
)
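What autodiscover() does can be sketched with the standard importlib module. This Python 3 version is an illustration under assumptions, not Django's implementation: the submodule parameter is hypothetical, and the usage line looks for a "decoder" submodule in standard-library packages purely so that the sketch runs without Django installed.

```python
import importlib
import importlib.util


def autodiscover(installed_apps, submodule="admin"):
    """Import `<app>.<submodule>` for every app that has one.

    Importing the module is what triggers its registration calls;
    apps without the submodule are skipped silently.
    """
    loaded = []
    for app in installed_apps:
        name = "%s.%s" % (app, submodule)
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            spec = None  # the app itself is missing or is not a package
        if spec is None:
            continue
        importlib.import_module(name)
        loaded.append(name)
    return loaded


# Stand-in for INSTALLED_APPS, using standard-library packages.
print(autodiscover(["json", "email", "no_such_app_xyz"], submodule="decoder"))
```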
# The settings module imported by the admin module
__#/usr/lib/python2.7/site-packages/django/conf/__init__.py__
"""
Settings and configuration for Django.
Values will be read from the module specified by the DJANGO_SETTINGS_MODULE environment
variable, and then from django.conf.global_settings; see the global settings file for a list of all possible variables.
"""
Note: a variable defined in the module named by DJANGO_SETTINGS_MODULE overrides the variable of the same name in the global settings.
from django.conf import **global_settings**
from django.utils.functional import LazyObject
from django.utils import importlib
ENVIRONMENT_VARIABLE = **"DJANGO_SETTINGS_MODULE"**
class LazySettings(LazyObject):
    """
    A lazy proxy for either global Django settings or a custom settings object.
    The user can manually configure settings prior to using them. Otherwise,
    Django uses the settings module pointed to by DJANGO_SETTINGS_MODULE.
    """
    def _setup(self):
        """
        Load the settings module pointed to by the environment variable. This
        is used the first time we need any settings at all, if the user has not
        previously configured the settings manually.
        """
**settings = LazySettings()**
__/usr/lib/python2.7/site-packages/django/contrib/admin/sites.py__
from django.conf import settings
class AdminSite(object):
    """
    An AdminSite object encapsulates an instance of the Django admin application, ready
    to be hooked in to your URLconf. Models are registered with the AdminSite using the
    register() method, and the get_urls() method can then be used to access Django view
    functions that present a full admin interface for the collection of registered
    models.
    """
    def **get_urls**(self):
        from django.conf.urls.defaults import patterns, url, include
        if settings.DEBUG:
            self.check_dependencies()
        def wrap(view, cacheable=False):
            def wrapper(*args, **kwargs):
                return self.admin_view(view, cacheable)(*args, **kwargs)
            return update_wrapper(wrapper, view)
        # Admin-site-wide views.
        urlpatterns = patterns('',
            url(r'^$',
                wrap(self.index),
                name='index'),
            url(r'^logout/$',
                wrap(self.logout),
                name='logout'),
            url(r'^password_change/$',
                wrap(self.password_change, cacheable=True),
                name='password_change'),
            url(r'^password_change/done/$',
                wrap(self.password_change_done, cacheable=True),
                name='password_change_done'),
            url(r'^jsi18n/$',
                wrap(self.i18n_javascript, cacheable=True),
                name='jsi18n'),
            url(r'^r/(?P<content_type_id>\d+)/(?P<object_id>.+)/$',
                wrap(contenttype_views.shortcut)),
            url(r'^(?P<app_label>\w+)/$',
                wrap(self.app_index),
                name='app_list')
        )
        # Add in each model's views.
        for model, model_admin in self._registry.iteritems():
            urlpatterns += patterns('',
                url(r'^%s/%s/' % (model._meta.app_label, model._meta.module_name),
                    include(model_admin.urls))
            )
        return urlpatterns
    **@property**
    **def urls(self):**
        return self.get_urls(), self.app_name, self.name
**site** = AdminSite()


@@ -0,0 +1,347 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-18T17:19:55+08:00
====== 10 things Django beginners should watch out for ======
Created Tuesday 18 October 2011
http://shinyzhu.iteye.com/blog/593427
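I started with Django last month, around the time I began learning Python, so I have not been at it for long, but I often read what others post in the community. This morning I came across a good blog post that would never load, and I finally found it in Bing's cache. Afraid of losing or forgetting it, I translated it and put it here to share with everyone, especially fellow Django beginners. I hope it helps.
The original article is at http://zeroandone.posterous.com/top-10-tips-to-a-new-django-developer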
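===== 1. Do not include the project name in import statements =====
Suppose you created a project named "project" that contains an application named "app". The following code is then a bad idea:
from project.app.models import Author
The drawback: the application and the project become **tightly coupled**, so the application cannot easily be made reusable. If you ever need to rename the project, you are in for a lot of work.
The recommended approach is:
from app.models import Author
Note that the project's path must be on **PYTHONPATH**.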
===== 2. Do not hard-code MEDIA_ROOT and TEMPLATE_DIRS =====
Do not write code like this in the project's settings.py:
TEMPLATE_DIRS = ( "/home/html/project/templates",)
MEDIA_ROOT = "/home/html/project/appmedia/"
It causes problems when you deploy to production or move to another server.
The following approach is recommended instead:
import os
SITE_ROOT = os.path.realpath(os.path.dirname(__file__))
MEDIA_ROOT = os.path.join(SITE_ROOT, 'appmedia')
TEMPLATE_DIRS = ( os.path.join(SITE_ROOT, 'templates'),)
abspath can be used as well; for how it differs from realpath, see http://rob.cogit8.org/blog/2009/May/05/django-and-relativity-updated/
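The idea behind this tip, deriving every path from the location of the settings file, can be tried outside Django. The sketch below is illustrative only: the function project_paths and the temporary directory are made up for the demonstration. It uses os.path.realpath, which resolves symbolic links, whereas os.path.abspath only normalizes the path.

```python
import os
import tempfile


def project_paths(settings_file):
    """Build media and template paths relative to the directory of settings_file."""
    site_root = os.path.realpath(os.path.dirname(settings_file))
    return {
        "SITE_ROOT": site_root,
        "MEDIA_ROOT": os.path.join(site_root, "appmedia"),
        "TEMPLATE_DIRS": (os.path.join(site_root, "templates"),),
    }


# A throwaway directory stands in for the project folder.
demo_dir = tempfile.mkdtemp()
paths = project_paths(os.path.join(demo_dir, "settings.py"))
print(paths["MEDIA_ROOT"])
```

Because nothing is hard-coded, the same settings file yields correct paths wherever the project directory is copied.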
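===== 3. Do not hard-code static file paths in templates =====
When linking CSS, JavaScript or images from a template, the following style is not recommended:
<link rel="stylesheet" type="text/css" href="/appmedia/amazing.css" />
<script type="text/javascript" src="/appmedia/jquery.min.js"></script>
When the project later needs its static files served by **another server**, usually under a different HTTP address, you will have to replace every /appmedia/ with the new address, and writing website code is tedious enough already.
The worry-free solution is to use {{ MEDIA_URL }} in place of the hard-coded path:
<link rel="stylesheet" type="text/css" href="{{ MEDIA_URL }}amazing.css" />
<script type="text/javascript" src="{{ MEDIA_URL }}jquery.min.js"></script>
How does the template get this context variable? Use **RequestContext**:
return render_to_response("app/template.html", {'var': 'foo'},
context_instance=RequestContext(request))
RequestContext also gives access to the current user and other information. For a more detailed introduction see http://www.b-list.org/weblog/2006/jun/14/django-tips-template-context-processors/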
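===== 4. Do not put business logic in views =====
Do not be misled: many books and examples put all the logic in views.py, but please do not do this. It makes unit testing and code reuse harder.
So where should business logic go? Put it in the __models__, or create a separate **helper module**.
Of course, code that fetches an Author or a list of Authors from the model can live in the view.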
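===== 5. Do not forget to set DEBUG to False when deploying =====
We often forget to disable DEBUG on deployment. There are several ways to handle this setting automatically:
import socket
if socket.gethostname() == 'productionserver.com':
    DEBUG = False
else:
    DEBUG = True
For this method see http://nicksergeant.com/blog/django/automatically-setting-debug-your-django-app-based-server-hostname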
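The hostname check can be wrapped in a small function so that it is easy to test and to extend to several machines. This is a sketch under assumptions: the function debug_for_host and the host names in PRODUCTION_HOSTS are hypothetical and would need to be replaced with real server names.

```python
import socket

# Hypothetical host names; replace with the names of real production machines.
PRODUCTION_HOSTS = frozenset(["productionserver.com", "www.productionserver.com"])


def debug_for_host(hostname, production_hosts=PRODUCTION_HOSTS):
    """Return False on known production hosts and True everywhere else."""
    return hostname not in production_hosts


DEBUG = debug_for_host(socket.gethostname())
```

Note the design choice: an unknown host gets DEBUG = True, just as in the original snippet. A more cautious variant would list the development machines instead, so that an unrecognized server defaults to DEBUG = False.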
Another approach is to use separate settings files:
# File name: settings_debug.py
# Contains the settings for debug mode
# Run the project with: python manage.py runserver --settings=settings_debug
from settings import *
DEBUG = True
# More debug-only variables can be configured here :)
For this method see http://blog.dpeepul.com/2009/07/02/from-now-you-will-never-forget-to-put-debug-true-in-django-production-environment/
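===== 6. Load custom template tags only once =====
To use custom or third-party template tags and filters, you normally have to put this in the template:
{% load template_tags %}
In practice this line is needed in every template that uses the custom tags and filters, which is not DRY.
from django import template
template.add_to_builtins('app.templatetags.custom_tag_module')
Put the code above in a module that is loaded when the project starts (settings.py, urls.py, models.py, etc.).
It loads the custom tags and filters at project startup, so they can be used anywhere in any template without {% load template_tags %}.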
===== 7. Configure and use URLs sensibly =====
Do not configure all URLs in a single urls.py file, for example:
urlpatterns = patterns('',
url(r'^askalumini/question/$','.....registerInstitution',name='iregister'),
url(r'^askalumin/answer/$','someview.....',name='newmemberurl'),
url(r'^institution/member/$','someview.....',name="dashboardurl"),
url(r'^institution/faculty/$','editInstitute',name="editinstituteurl"),
url(r'^memeber/editprofile/$','editProfile',name="editprofileurl"),
url(r'^member/changepassword/$','changePassword',name="changepasswordurl"),
url(r'^member/forgotpassword/$','forgotPassword',name="forgotpasswordurl"),
url(r'^member/changepicture/$','changePicture',name="changepictureurl"),
url(r'^member/logout/$','memeberlogout',name="logouturl"),
)
The recommended way is to configure each application's URLs in its own urls.py, which makes the application easier to reuse across projects:
urlpatterns = patterns('',
(r'^$', include('institution.urls')),
(r'^institution/', include('institution.urls')),
(r'^askalumini/', include('askalumini.urls')),
(r'^member/', include('member.urls')),
)
Here is the urls.py of the askalumini application:
urlpatterns = patterns('askalumini.views',
url(r'^$','askHome',name='askaluminiurl'),
url(r'^questions/(?P<questionno>\d+)/$','displayQuestion',name='askquestiondisplay'),
url(r'^askquestions/$','askQuestion',name='askquestionurl'),
url(r'^postcomment/$','postComment',name="askquestioncomment")
)
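As mentioned earlier, static file paths should not be hard-coded, and the same goes for URLs. Otherwise changing one address means edits in many places. Use the URL helper functions instead.
In /project/askalumini/urls.py every URL is given a name, which lets us handle URLs in views, templates and models without hard-coding them.
To keep names unique, follow the convention of naming URLs <appname>/<somelabel>.
For example, suppose views.py contains this code:
HttpResponseRedirect("/askalumini/questions/54")
Change it to: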
from django.core.urlresolvers import reverse
HttpResponseRedirect(reverse('askquestiondisplay',kwargs={'questionno':q.id}))
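What a name-based lookup buys you can be shown with a toy registry. This is not how Django implements reverse(); the dictionary URL_TEMPLATES and the function reverse_sketch are hypothetical, and the path templates are hand-written equivalents of the regex patterns in the askalumini urls.py above.

```python
# Hypothetical registry mapping URL names to path templates; Django derives
# the equivalent information from the regex patterns in urls.py.
URL_TEMPLATES = {
    "askaluminiurl": "/askalumini/",
    "askquestiondisplay": "/askalumini/questions/{questionno}/",
}


def reverse_sketch(name, **kwargs):
    """Look up a URL by name and fill in its parameters."""
    try:
        template = URL_TEMPLATES[name]
    except KeyError:
        raise LookupError("no URL named %r" % name)
    return template.format(**kwargs)


print(reverse_sketch("askquestiondisplay", questionno=54))
```

If the address of the question page changes, only the registry entry changes; every caller that asks for 'askquestiondisplay' keeps working.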
In models, use the models.permalink decorator to build the URL:
@models.permalink
def get_absolute_url(self):
    return ('profileurl2', (), {'userid': self.user.id})
In templates, use the url tag instead of hard-coding:
{% url askquestiondisplay 345 %}
<a href="{% url askquestiondisplay 345 %}"> Ask Question </a>
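===== 8. Debugging =====
Debugging usually relies on third-party tools to get more runtime information.
How many SQL statements did a request execute, and how long did they take?
Which template was used? What cookies and session data did the client set?
django-debug-toolbar shows all of this and more: http://github.com/robhudson/django-debug-toolbar
Another tool is the Werkzeug debugger, which can open a Python shell on the error page so that errors are easier to trace. See http://blog.dpeepul.com/2009/07/14/python-shell-right-on-the-django-error-page/ for more information.
There is also pdb, a powerful debugging tool: http://ericholscher.com/blog/2008/aug/31/using-pdb-python-debugger-django-debugging-series-/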
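===== 9. Get to know Pinax for later use =====
Django's greatest strength is code reuse (DRY). Pinax is a platform built on that idea and contains a lot of ready-to-use code, such as OpenID and email verification. See http://pinaxproject.com/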
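===== 10. Get to know some well-known third-party applications =====
1) Database migration tools
What is a database migration tool? You ran syncdb, and a year later you changed your models by adding and removing fields. Do you run syncdb again? Or write ALTER TABLE ... by hand?
django-evolution can do this for you, but it does not seem very robust: http://code.google.com/p/django-evolution/
South does it robustly, but takes some learning: http://south.aeracode.org/
2) Template systems
Django's built-in template system can be replaced, and each alternative has its own strengths and weaknesses.
template-utils enhances template features such as comparison tags and provides other utilities: http://django-template-utils.googlecode.com/svn/trunk/docs/
Jinja is a complete third-party template system that can replace the default one and offers many advanced features: http://jinja.pocoo.org/2/
3) Third-party applications
django command extensions provides many useful command-line features:
* shell_plus: loads all Django models
* runserver_plus: integrates the Werkzeug debugger
* Generates model diagrams that you can show to your boss
* ...
See http://ericholscher.com/blog/2008/sep/12/screencast-django-command-extensions/
Sorl generates thumbnails: http://code.google.com/p/sorl-thumbnail/
...
---END---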
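The comments on the original article also contain several useful finds:
* Use redirect from django.shortcuts instead of HttpResponseRedirect: http://docs.djangoproject.com/en/dev/topics/http/shortcuts/#redirect
* Deploy Django projects with VirtualEnv
* Django project conventions: http://ericholscher.com/projects/django-conventions/project/
* Of the 10 points above, items 2 and 4 are the mistakes beginners make most often.
* Point 6 is not well suited to team collaboration.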


@@ -0,0 +1,143 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T15:06:55+08:00
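====== Building a Blog with Django: an introductory example ======
Created Tuesday 05 July 2011
1. Create the project. On Windows, open cmd and go to the directory where Django projects are kept, for example drive D, and call the project mysite. From D:, enter: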
python django-admin.py startproject mysite
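A new Django project directory named mysite now exists on drive D. It contains:
__init__.py # in Python, an __init__.py file marks this project directory as a Python package
manage.py # Django management tool; run manage.py help to see how to use it
settings.py # the project's default configuration, holding a number of static variables
urls.py # configures the Django URL mappings
2. Start the server. In this directory, enter:
python ./manage.py runserver # start Django
Open 127.0.0.1:8000 and you should see the "It worked!" welcome page.
3. Create the web application. Enter: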
python ./manage.py startapp blog
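A new blog directory now exists under mysite. It contains:
__init__.py # treated as a Python package
models.py # Django data classes, used to create the database tables
views.py # view functions: fetch data from the database and pass it to templates for display
4. In models.py, enter:
from django.db import models
from django.contrib import admin
class BlogPost(models.Model):
    title=models.CharField(max_length=150)
    body=models.TextField()
    timestamp=models.DateTimeField()
5. Configure the database. SQLite3 is used as the example here. In settings.py set: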
DATABASE_ENGINE = 'sqlite3' # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
DATABASE_NAME = r'D:/djangoapp/db/blog.db' # Or path to database file if using sqlite3.
DATABASE_USER = '' # Not used with sqlite3.
DATABASE_PASSWORD = '' # Not used with sqlite3.
DATABASE_HOST = '' # Set to empty string for localhost. Not used with sqlite3.
DATABASE_PORT = '' # Set to empty string for default. Not used with sqlite3.
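6. Restart the Django server (python ./manage.py runserver), then run:
python ./manage.py syncdb # Django's database layer is an ORM (object-relational mapping): the classes defined in models.py are used to create the matching database tables
While the database is being created, you are prompted to set up the Django admin superuser: enter a user name, an email address and a password.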
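What syncdb does for the BlogPost model can be approximated with the standard sqlite3 module. This is a hand-written sketch, not the SQL Django actually emits: the field list BLOGPOST_FIELDS, its column types and the helper create_table_sql are assumptions made for illustration. The table name blog_blogpost follows Django's default of application label plus lower-cased model name.

```python
import sqlite3

# Hypothetical description of the BlogPost model: (column name, SQL type).
BLOGPOST_FIELDS = [
    ("id", "integer PRIMARY KEY"),
    ("title", "varchar(150) NOT NULL"),
    ("body", "text NOT NULL"),
    ("timestamp", "datetime NOT NULL"),
]


def create_table_sql(table, fields):
    """Render a CREATE TABLE statement from a list of (name, type) pairs."""
    columns = ", ".join('"%s" %s' % (name, sql_type) for name, sql_type in fields)
    return 'CREATE TABLE "%s" (%s)' % (table, columns)


conn = sqlite3.connect(":memory:")  # in-memory database, nothing touches the disk
conn.execute(create_table_sql("blog_blogpost", BLOGPOST_FIELDS))
conn.execute(
    "INSERT INTO blog_blogpost (title, body, timestamp) VALUES (?, ?, ?)",
    ("Hello", "First post", "2011-07-05 15:06:55"),
)
rows = conn.execute("SELECT id, title FROM blog_blogpost").fetchall()
```

Each model attribute becomes one column, and the ORM adds the id primary key on its own, which is why BlogPost never declares one.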
7. Enable the automatic admin application. In settings.py:
LANGUAGE_CODE = 'zh-CN'
INSTALLED_APPS = (
'django.contrib.auth',
'django.contrib.admin',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.sites',
'mysite.blog',
)
In the project directory, enter:
python ./manage.py syncdb
This creates the django_admin_log table.
Edit urls.py and uncomment:
# Uncomment the next line to enable the admin:
(r'^admin/', include(admin.site.urls)),
8. Edit models.py and register the model with the admin:
from django.db import models
from django.contrib import admin
class BlogPost(models.Model): # inherits from Django's models.Model
    title=models.CharField(max_length=150)
    body=models.TextField()
    timestamp=models.DateTimeField()
admin.site.register(BlogPost)
9. At 127.0.0.1:8000/admin/, log in with the admin user name and password you just set.
Add data under Blog posts.
10. Customize the admin: add BlogPostAdmin to models.py and register it with the admin:
from django.db import models
from django.contrib import admin
class BlogPost(models.Model):
    title=models.CharField(max_length=150)
    body=models.TextField()
    timestamp=models.DateTimeField()
class BlogPostAdmin(admin.ModelAdmin):
    list_display=('title','timestamp')
admin.site.register(BlogPost,BlogPostAdmin)
11. Create the template.
In mysite/themes/default/templates/archive.html, use Django's template block tags; the for tag renders its body once for each element of the sequence:
{% extends "base.html" %}
{% block content %}
{% for post in posts %}
<h2>{{post.title}}</h2>
<p>{{post.timestamp|date:"F j, Y"}}</p>
<p>{{post.body}}</p>
{% endfor %}
{% endblock %}
In base.html, add inside the html tag:
{% block content %}
{% endblock %}
12. Edit views.py and write the view function that renders the template:
# Create your views here.
from django.template import loader, Context
from django.http import HttpResponse
from mysite.blog.models import BlogPost
def archive(request):
    posts=BlogPost.objects.all()
    t=loader.get_template("archive.html")
    c=Context({'posts':posts})
    return HttpResponse(t.render(c))
13. Activate the blog URL mapping in mysite/urls.py:
urlpatterns = patterns('',
# Example:
# (r'^mysite/', include('mysite.foo.urls')),
# Uncomment the admin/doc line below and add 'django.contrib.admindocs'
# to INSTALLED_APPS to enable admin documentation:
# (r'^admin/doc/', include('django.contrib.admindocs.urls')),
# Uncomment the next line to enable the admin:
(r'^admin/', include(admin.site.urls)),
(r'^blog/', include('mysite.blog.urls')), # add the mapping
(r'^themes/(?P<path>.*)$', 'django.views.static.serve',
{'document_root': os.path.dirname(os.path.abspath(__file__)) + '/themes/'}),
)
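# (r'^themes/(?P<path>.*)$', 'django.views.static...: Django does not serve external CSS files by itself, so they must be reached through a URL mapping; this entry maps to the project's static files.
For example, base.html contains <link href="/themes/default/style.css" rel="stylesheet" type="text/css"/>
With this mapping, /themes/ matches the mysite/themes/ directory and the CSS file loads successfully.
Create mysite/blog/urls.py:
#coding=utf-8
from django.conf.urls.defaults import *
from mysite.blog.views import archive # import the view function
# map the URL
urlpatterns=patterns('',
(r'^$',archive),
)
Open http://127.0.0.1:8000/blog/
and the posts are displayed.
base.html uses a free CSS template. This article follows the example in the Django development guide. I am new to Django, so corrections are welcome and I hope we can learn together.
Source code download: mysite.rar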


@@ -0,0 +1,186 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-06T14:57:10+08:00
====== IPython ======
Created Friday 06 January 2012
http://ipython.org/ipython-doc/rel-0.12/overview.html
===== Introduction =====
==== Overview ====
One of Python's most useful features is __its interactive interpreter__. This system allows very fast __testing of ideas__ without the overhead of creating test files as is typical in most programming languages. However, the interpreter supplied with the standard Python distribution is **somewhat limited** for extended interactive use.
The goal of IPython is to __create a comprehensive environment for interactive and exploratory computing__. To support this goal, IPython has two main components:
* An enhanced interactive Python shell.
* An architecture for **interactive parallel computing**.
All of IPython is open source (released under the revised BSD license).
==== Enhanced interactive Python shell ====
IPython's interactive shell (ipython) has the following goals, amongst others:
* Provide an__ interactive shell __superior to Python's default. IPython has many features for **object introspection**, __system shell access__, and its own__ special command system__ for adding functionality when working interactively. It tries to be a very efficient environment both for Python code development and for exploration of problems using Python objects (in situations like data analysis).
* Serve as an __embeddable, ready to use interpreter __for your own programs. IPython can be started with a single call from inside another program, providing access to the __current namespace__. This can be very useful both for __debugging__ purposes and for situations where a blend of batch-processing and interactive exploration are needed. New in the 0.9 version of IPython is a reusable wxPython based IPython widget.
* Offer a flexible framework which can be used as the base environment for other systems with Python as the underlying language. Specifically scientific environments like Mathematica, IDL and Matlab inspired its design, but similar ideas can be useful in many fields.
* Allow interactive testing of threaded graphical toolkits. IPython has support for interactive, non-blocking control of GTK, Qt and WX applications via special threading flags. The normal Python shell can only do this for Tkinter applications.
==== Main features of the interactive shell ====
* Dynamic object introspection. One can access docstrings, function definition prototypes, source code, source files and other details of any object accessible to the interpreter with a single keystroke (__?, and using ??__ provides additional detail).
* Searching through modules and namespaces with * wildcards, both when using the ? system and via the__ %psearch__ command.
* Completion in the **local namespace**, by typing __TAB__ at the prompt. This works for keywords, modules, methods, variables and **files** in the current directory. This is supported via the__ readline library__, and full access to configuring readline's behavior is provided. Custom completers can be implemented easily for different purposes (system commands, magic arguments etc.)
* Numbered input/output prompts with __command history__ (persistent across sessions and tied to each profile), full searching in this history and caching of all input and output.
* User-extensible__ magic commands__. A set of commands prefixed with % is available for controlling IPython itself and provides directory control, namespace information and many __aliases __to common system shell commands.
* Alias facility for defining your own system aliases.
* Complete __system shell access__. Lines starting with ! are passed directly to the system shell, and using __!! or var = !cmd__ captures shell output into python variables for further use.
* Background execution of Python commands in a separate thread. IPython has an __internal job manager__ called jobs, and a convenience backgrounding magic function called__ %bg__.
* The ability to **expand python variables when calling the system shell**. In a shell command, any python variable prefixed with __$ __is expanded. A double $$ allows passing a literal $ to the shell (for access to** shell and environment variables** like PATH).
* Filesystem navigation, via a magic__ %cd__ command, along with a persistent bookmark system (using __%bookmark__) for fast access to frequently visited directories.
* A lightweight** persistence framework** via the__ %store__ command, which allows you to save arbitrary Python variables. These get restored automatically when your session restarts.
* Automatic indentation (optional) of code as you type (through the readline library).
* __Macro system__ for quickly re-executing multiple lines of previous input with a single name. Macros can be stored persistently via __%store__ and edited via __%edit__.
* Session logging (you can then later use these logs as code in your programs). Logs can optionally __timestamp all input,__ and also store session output (marked as comments, so the log remains valid Python source code).
* Session restoring: logs can be__ replayed__ to restore a previous session to the state where you left it.
* Verbose and colored **exception traceback** printouts. Easier to parse visually, and in verbose mode they produce a lot of useful debugging information (basically a terminal version of the cgitb module).
* Auto-parentheses: callable objects can be __executed without parentheses__: sin 3 is automatically converted to sin(3).
* Auto-quoting:__ using , or ; as the first character__ forces auto-quoting of the rest of the line: ,my_function a b becomes automatically my_function("a","b"), while ;my_function a b becomes my_function("a b").
* Extensible input syntax. You can define filters that pre-process user input to simplify input in special situations. This allows for example pasting multi-line code fragments which start with >>> or ... such as those from other python sessions or the standard Python documentation.
* Flexible configuration system. It uses a **configuration file** which allows permanent setting of all command-line options, module loading, code and file execution. The system allows__ recursive file inclusion__, so you can have a base file with defaults and layers which load other customizations for particular projects.
* Embeddable. You can call IPython as a python shell __inside your own python programs__. This can be used both for debugging code or for providing interactive abilities to your programs with knowledge about the** local namespaces** (very useful in debugging and data analysis situations).
* Easy debugger access. You can set IPython to call up an enhanced version of the Python debugger (pdb) every time there is an uncaught exception. This drops you inside the code which triggered the exception with all the data live and it is possible to navigate the stack to rapidly isolate the source of a bug. The __%run __magic command (with the -d option) can **run any script under pdb's control,** automatically setting initial breakpoints for you. This version of pdb has IPython-specific improvements, including tab-completion and traceback coloring support. For even easier debugger access, try __%debug__ after seeing an exception. winpdb is also supported, see ipy_winpdb extension.
* Profiler support. You can run single statements (similar to profile.run()) or complete programs under the profiler's control. While this is possible with standard cProfile or profile modules, IPython wraps this functionality with magic commands (see %prun and %run -p) convenient for rapid interactive work.
* Doctest support. The special__ %doctest_mode__ command toggles a mode that allows you to paste existing doctests (with leading >>> prompts and whitespace) and uses doctest-compatible prompts and output, so you can use IPython sessions as doctest code.
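The auto-quoting rule from the feature list above is simple enough to reproduce as a plain string transformation. The function below is only a sketch of the rewrite described there, not IPython's own input-filtering code; the name autoquote is hypothetical.

```python
def autoquote(line):
    """Rewrite a line starting with ',' or ';' into a quoted function call."""
    if not line or line[0] not in ",;":
        return line  # ordinary input is left untouched
    name, _, rest = line[1:].partition(" ")
    if line[0] == ",":
        # ',' quotes each whitespace-separated word as its own argument
        args = ",".join('"%s"' % word for word in rest.split())
    else:
        # ';' quotes the whole remainder as a single argument
        args = '"%s"' % rest
    return "%s(%s)" % (name, args)


print(autoquote(",my_function a b"))
print(autoquote(";my_function a b"))
```

The two prefixes differ only in how the rest of the line is split: per word for the comma, not at all for the semicolon.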
===== Interactive parallel computing =====
Increasingly, parallel computer hardware, such as multicore CPUs, clusters and supercomputers, is becoming ubiquitous. Over the last 3 years, we have developed an architecture within IPython that allows such hardware to be used quickly and easily from Python. Moreover, this architecture is designed to support interactive and collaborative parallel computing.
The main features of this system are:
* Quickly parallelize Python code from an interactive Python/IPython session.
* A flexible and dynamic process model that can be deployed on anything from multicore workstations to supercomputers.
* An architecture that supports many different styles of parallelism, from message passing to task farming. And all of these styles can be handled interactively.
* Both blocking and fully asynchronous interfaces.
* High level APIs that enable many things to be parallelized in a few lines of code.
* Write parallel code that will run unchanged on everything from multicore workstations to supercomputers.
* Full integration with Message Passing libraries (MPI).
* Capabilities based security model with full encryption of network connections.
* Share live parallel jobs with other users securely. We call this collaborative parallel computing.
* Dynamically load balanced task farming system.
* Robust error handling. Python exceptions raised in parallel execution are gathered and presented to the top-level code.
For more information, see our overview of using IPython for parallel computing.
===== Portability and Python requirements =====
As of the 0.11 release, IPython works with Python 2.6 and 2.7. Versions 0.9 and 0.10 worked with Python 2.4 and above. IPython now also supports Python 3, although for now the code for this is separate, and kept up to date with the main IPython repository. In the future, these will converge to a single codebase which can be automatically translated using 2to3.
IPython is known to work on the following operating systems:
* Linux
* Most other Unix-like OSs (AIX, Solaris, BSD, etc.)
* Mac OS X
* Windows (CygWin, XP, Vista, etc.)
See here for instructions on how to install IPython.
===== What's new in IPython =====
This section documents the changes that have been made in various versions of IPython. Users should consult these pages to learn about new features, bug fixes and backwards incompatibilities. Developers should summarize the development work they do here in a user friendly format.
==== Release 0.12 ====
IPython 0.12 contains several major new features, as well as a large amount of bug and regression fixes. The 0.11 release brought with it a lot of new functionality and major refactorings of the codebase; by and large this has proven to be a success as the number of contributions to the project has increased dramatically, proving that the code is now much more approachable. But in the refactoring inevitably some bugs were introduced, and we have also squashed many of those as well as recovered some functionality that had been temporarily disabled due to the API changes.
The following major new features appear in this version.
==== An interactive browser-based Notebook with rich media support ====
A powerful new interface puts IPython in your browser. You can start it with the command__ ipython notebook__:
The new IPython notebook showing text, mathematical expressions in__ LaTeX__, code, results and embedded figures created with __Matplotlib__.
This new interface maintains all the features of IPython you are used to, as it is **a new client that communicates with the same IPython kernels** used by the terminal and Qt console. But the web notebook provides for a different workflow where you can integrate, along with code execution, also text, mathematical expressions, graphics, video, and virtually any content that a modern browser is capable of displaying.
You can save your work sessions as documents that retain all these elements and which can be version controlled, emailed to colleagues or saved as HTML or PDF files for printing or publishing statically on the web. The internal storage format is a__ JSON__ file that can be easily manipulated for manual exporting to other formats.
This Notebook is a major milestone for IPython, as for years we have tried to build this kind of system. We were inspired originally by the excellent implementation in Mathematica, we made a number of attempts using older technologies in earlier Summer of Code projects in 2005 (both students and Robert Kern developed early prototypes), and in recent years we have seen the excellent implementation offered by the Sage <http://sagemath.org> system. But we continued to work on something that would be consistent with the rest of IPython's design, and it is clear now that the effort was worth it: based on the ZeroMQ communications architecture introduced in version 0.11, the notebook can now__ retain 100% of the features of the real IPython__. But it can also provide the rich media support and high quality Javascript libraries that were not available in browsers even one or two years ago (such as high-quality mathematical rendering or built-in video).
The notebook has too many useful and important features to describe in these release notes; our documentation now contains a directory called examples/notebooks with several notebooks that illustrate various aspects of the system. You should start by reading those named 00_notebook_tour.ipynb and 01_notebook_introduction.ipynb first, and then can proceed to read the others in any order you want.
To start the notebook server, go to a directory containing the notebooks you want to open (or where you want to create new ones) and type:
**ipython notebook**
You can see all the relevant options with:
ipython notebook --help
ipython notebook --help-all # even more
and just like the Qt console, you can start the notebook server with pylab support by using:
**ipython notebook --pylab**
for floating matplotlib windows or:
**ipython notebook --pylab inline**
for plotting support with automatically inlined figures. Note that it is now possible also to activate pylab support at runtime via %pylab, so you do not need to make this decision when starting the server.
See the Notebook docs for technical details.
===== Two-process terminal console =====
Based on the same architecture as the notebook and the Qt console, we also have now a terminal-based console that can connect to an external IPython kernel (the same kernels used by the Qt console or the notebook, in fact). While this client behaves almost identically to the usual IPython terminal application, this capability can be very useful to __attach an interactive console to an existing kernel__ that was started externally. It lets you use the interactive %debug facilities in a notebook, for example (the web browser can't interact directly with the debugger) or debug a third-party code where you may have embedded an IPython kernel.
This is also something that we have wanted for a long time, and which is a culmination (as a team effort) of the work started last year during the 2010 Google Summer of Code project.
===== Tabbed QtConsole =====
The QtConsole now supports starting multiple kernels in tabs, and has a menubar, so it looks and behaves more like a real application. Keyboard enthusiasts can disable the menubar with ctrl-shift-M (PR #887).
The improved Qt console for IPython, now with tabs to control multiple kernels and full menu support.
===== Full Python 3 compatibility =====
IPython can now be installed from a single codebase on Python 2 and Python 3. The installation process for Python 3 automatically runs 2to3. The same default profile is now used for Python 2 and 3 (the previous version had a separate python3 profile).
===== Standalone Kernel =====
The ipython kernel subcommand has been added, to allow starting a standalone kernel, that can be used with various frontends. You can then later connect a Qt console or a terminal console to this kernel by typing e.g.:
ipython qtconsole --existing
if it's the only one running, or by passing explicitly the connection parameters (printed by the kernel at startup).
===== PyPy support =====
The terminal interface to IPython now runs under __PyPy__. We will continue to monitor PyPy's progress, and hopefully before long at least we'll be able to also run the notebook. The Qt console may take longer, as Qt is a very complex set of bindings to a huge C++ library, and that is currently the area where PyPy still lags most behind. But for everyday interactive use at the terminal, with this release and PyPy 1.7, things seem to work quite well from our admittedly limited testing.
===== Other important new features =====
* SSH Tunnels: In 0.11, the IPython.parallel Client could tunnel its connections to the Controller via ssh. Now, the QtConsole supports ssh tunneling, as do parallel engines.
* relaxed command-line parsing: 0.11 was released with overly-strict command-line parsing, preventing the ability to specify arguments with spaces, e.g. ipython --pylab qt or ipython -c "print 'hi'". This has been fixed, by using argparse. The new parsing is a strict superset of 0.11, so any commands in 0.11 should still work in 0.12.
* HistoryAccessor: The HistoryManager class for interacting with your IPython SQLite history database has been split, adding a parent HistoryAccessor class, so that users can write code to access and search their IPython history without being in an IPython session (PR #824).
* kernel %gui and %pylab: The %gui and %pylab magics have been restored to the IPython kernel (e.g. in the qtconsole or notebook). This allows activation of pylab-mode, or eventloop integration after starting the kernel, which was unavailable in 0.11. Unlike in the terminal, this can be set only once, and cannot be changed.
* %config: A new %config magic has been added, giving easy access to the IPython configuration system at runtime (PR #923).
* Multiline History: Multiline readline history has been restored to the Terminal frontend by default (PR #838).
* %store: The %store magic from earlier versions has been updated and re-enabled (storemagic; PR #1029). To autorestore stored variables on startup, specify c.StoreMagic.autorestore = True in ipython_config.py.


@@ -0,0 +1,222 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-06T20:30:56+08:00
====== IPython as a system shell ======
Created Friday 06 January 2012
Warning
As of the 0.11 version of IPython, most of the APIs used by the shell profile have been changed, so the profile currently does very little beyond changing the IPython prompt. To help restore the shell profile to past functionality described here, the old code is found in IPython/deathrow, which needs to be updated to use the APIs in 0.11.
===== Overview =====
The sh profile __optimizes IPython for system shell __usage. Apart from certain** job control** functionality that is present in unix (ctrl+z does “suspend”), the sh profile should provide you with__ most of the functionality__ you use daily in system shell, and more. Invoke IPython in sh profile by doing **ipython -p sh**, or (in win32) by launching the “pysh” shortcut in start menu.
If you want to use the features of sh profile as your defaults (which might be a good idea if you use other profiles a lot of the time but still want the convenience of sh profile), add import ipy_profile_sh to your $IPYTHON_DIR/ipy_user_conf.py.
The sh profile is different from the default profile in that:
Prompt shows the current directory
Spacing between prompts and input is more compact (no padding with empty lines). The startup banner is more compact as well.
System commands are directly available (in alias table) without requesting %rehashx - however, if you install new programs along your PATH, you might want to run %rehashx to update the persistent alias table
Macros are stored in raw format by default. That is, instead of _ip.system(“cat foo”), the macro will contain the text cat foo.
Autocall is in full mode
Calling “up” does “cd ..”
The sh profile is different from the now-obsolete (and unavailable) pysh profile in that the $$var = command and $var = command syntax is not supported anymore. Use var = !command instead (which is available in all IPython profiles).
Aliases
All of your $PATH has been loaded as IPython aliases, so you should be able to type any normal system command and have it executed. See %alias? and %unalias? for details on the alias facilities. See also %rehashx? for details on the mechanism used to load $PATH.
Directory management
Since each command passed by ipython to the underlying system is executed in a subshell which exits immediately, you can NOT use !cd to navigate the filesystem.
IPython provides its own builtin %cd magic command to move in the filesystem (the % is not required with automagic on). It also maintains a list of visited directories (use %dhist to see it) and allows direct switching to any of them. Type cd? for more details.
%pushd, %popd and %dirs are provided for directory stack handling.
Enabled extensions
Some extensions, listed below, are enabled as default in this profile.
envpersist
%env can be used to “remember” environment variable manipulations. Examples:
%env - Show all environment variables
%env VISUAL=jed - set VISUAL to jed
%env PATH+=;/foo - append ;/foo to PATH
%env PATH+=;/bar - also append ;/bar to PATH
%env PATH-=/wbin; - prepend /wbin; to PATH
%env -d VISUAL - forget VISUAL persistent val
%env -p - print all persistent env modifications
ipy_which
%which magic command. Like which in unix, but knows about ipython aliases.
Example:
[C:/ipython]|14> %which st
st -> start .
[C:/ipython]|15> %which d
d -> dir /w /og /on
[C:/ipython]|16> %which cp
cp -> cp
== c:\bin\cp.exe
c:\bin\cp.exe
ipy_app_completers
Custom tab completers for some apps like svn, hg, bzr, apt-get. Try apt-get install <TAB> in debian/ubuntu.
ipy_rehashdir
Allows you to add system command aliases for commands that are not along your path. Let's say that you just installed Putty and want to be able to invoke it without adding it to path; you can create the alias for it with rehashdir:
[~]|22> cd c:/opt/PuTTY/
[c:opt/PuTTY]|23> rehashdir .
<23> ['pageant', 'plink', 'pscp', 'psftp', 'putty', 'puttygen', 'unins000']
Now, you can execute any of those commands directly:
[c:opt/PuTTY]|24> cd
[~]|25> putty
(the putty window opens).
If you want to store the alias so that it will always be available, do %store putty. If you want to %store all these aliases persistently, just do it in a for loop:
[~]|27> for a in _23:
|..> %store $a
|..>
|..>
Alias stored: pageant (0, 'c:\\opt\\PuTTY\\pageant.exe')
Alias stored: plink (0, 'c:\\opt\\PuTTY\\plink.exe')
Alias stored: pscp (0, 'c:\\opt\\PuTTY\\pscp.exe')
Alias stored: psftp (0, 'c:\\opt\\PuTTY\\psftp.exe')
...
mglob
Provide the magic function %mglob, which makes it easier (than the find command) to collect (possibly recursive) file lists. Examples:
[c:/ipython]|9> mglob *.py
[c:/ipython]|10> mglob *.py rec:*.txt
[c:/ipython]|19> workfiles = %mglob !.svn/ !.hg/ !*_Data/ !*.bak rec:.
Note that the first 2 calls will put the file list in result history (_, _9, _10), and the last one will assign it to workfiles.
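For readers without the mglob extension, the idea behind a call such as mglob *.py rec:*.txt can be approximated with the standard library. This is only a rough sketch: the helper name collect_files and its two parameters are hypothetical, and it is not how mglob itself is implemented.

```python
import fnmatch
import os


def collect_files(root, patterns=(), rec_patterns=()):
    """Return sorted paths under root that match the given glob patterns.

    patterns     -- matched only against files directly inside root
    rec_patterns -- matched against files in root and in all subdirectories
    """
    root = os.path.abspath(root)
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        top_level = os.path.abspath(dirpath) == root
        for name in filenames:
            flat_hit = top_level and any(fnmatch.fnmatch(name, p) for p in patterns)
            rec_hit = any(fnmatch.fnmatch(name, p) for p in rec_patterns)
            if flat_hit or rec_hit:
                found.add(os.path.join(dirpath, name))
    return sorted(found)
```

Calling collect_files(".", patterns=["*.py"], rec_patterns=["*.txt"]) would collect the Python files in the current directory plus every text file below it, which is roughly what the second mglob call above asks for.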
Prompt customization
The sh profile uses the following prompt configurations:
c.PromptManager.in_template = r'{color.LightGreen}\u@\h{color.LightBlue}[{color.LightCyan}\Y1{color.LightBlue}]{color.Green}|\#> '
c.PromptManager.in2_template = r'{color.Green}|{color.LightGreen}\D{color.Green}> '
c.PromptManager.out_template = r'<\#> '
You can change the prompt configuration to your liking by editing ipython_config.py.
String lists
String lists (IPython.utils.text.SList) are a handy way to process output from system commands. They are produced by the var = !cmd syntax.
First, we acquire the output of ls -l:
[Q:doc/examples]|2> lines = !ls -l
==
['total 23',
'-rw-rw-rw- 1 ville None 1163 Sep 30 2006 example-demo.py',
'-rw-rw-rw- 1 ville None 1927 Sep 30 2006 example-embed-short.py',
'-rwxrwxrwx 1 ville None 4606 Sep 1 17:15 example-embed.py',
'-rwxrwxrwx 1 ville None 1017 Sep 30 2006 example-gnuplot.py',
'-rwxrwxrwx 1 ville None 339 Jun 11 18:01 extension.py',
'-rwxrwxrwx 1 ville None 113 Dec 20 2006 seteditor.py',
'-rwxrwxrwx 1 ville None 245 Dec 12 2006 seteditor.pyc']
Now, let's take a look at the contents of lines (the first number is the list element number):
[Q:doc/examples]|3> lines
<3> SList (.p, .n, .l, .s, .grep(), .fields() available). Value:
0: total 23
1: -rw-rw-rw- 1 ville None 1163 Sep 30 2006 example-demo.py
2: -rw-rw-rw- 1 ville None 1927 Sep 30 2006 example-embed-short.py
3: -rwxrwxrwx 1 ville None 4606 Sep 1 17:15 example-embed.py
4: -rwxrwxrwx 1 ville None 1017 Sep 30 2006 example-gnuplot.py
5: -rwxrwxrwx 1 ville None 339 Jun 11 18:01 extension.py
6: -rwxrwxrwx 1 ville None 113 Dec 20 2006 seteditor.py
7: -rwxrwxrwx 1 ville None 245 Dec 12 2006 seteditor.pyc
Now, let's filter out the embed lines:
[Q:doc/examples]|4> l2 = lines.grep('embed',prune=1)
[Q:doc/examples]|5> l2
<5> SList (.p, .n, .l, .s, .grep(), .fields() available). Value:
0: total 23
1: -rw-rw-rw- 1 ville None 1163 Sep 30 2006 example-demo.py
2: -rwxrwxrwx 1 ville None 1017 Sep 30 2006 example-gnuplot.py
3: -rwxrwxrwx 1 ville None 339 Jun 11 18:01 extension.py
4: -rwxrwxrwx 1 ville None 113 Dec 20 2006 seteditor.py
5: -rwxrwxrwx 1 ville None 245 Dec 12 2006 seteditor.pyc
Now, we want strings having just file names and permissions:
[Q:doc/examples]|6> l2.fields(8,0)
<6> SList (.p, .n, .l, .s, .grep(), .fields() available). Value:
0: total
1: example-demo.py -rw-rw-rw-
2: example-gnuplot.py -rwxrwxrwx
3: extension.py -rwxrwxrwx
4: seteditor.py -rwxrwxrwx
5: seteditor.pyc -rwxrwxrwx
Note how the line with total does not raise IndexError.
If you want to split these (yielding lists), call fields() without arguments:
[Q:doc/examples]|7> _.fields()
<7>
[['total'],
['example-demo.py', '-rw-rw-rw-'],
['example-gnuplot.py', '-rwxrwxrwx'],
['extension.py', '-rwxrwxrwx'],
['seteditor.py', '-rwxrwxrwx'],
['seteditor.pyc', '-rwxrwxrwx']]
If you want to pass these separated with spaces to a command (typical for lists of files), use the .s property:
[Q:doc/examples]|13> files = l2.fields(8).s
[Q:doc/examples]|14> files
<14> 'example-demo.py example-gnuplot.py extension.py seteditor.py seteditor.pyc'
[Q:doc/examples]|15> ls $files
example-demo.py example-gnuplot.py extension.py seteditor.py seteditor.pyc
SLists are inherited from normal python lists, so every list method is available:
[Q:doc/examples]|21> lines.append('hey')
Real world example: remove all files outside version control
First, capture output of “hg status”:
[Q:/ipython]|28> out = !hg status
==
['M IPython\\extensions\\ipy_kitcfg.py',
'M IPython\\extensions\\ipy_rehashdir.py',
...
'? build\\lib\\IPython\\Debugger.py',
'? build\\lib\\IPython\\extensions\\InterpreterExec.py',
'? build\\lib\\IPython\\extensions\\InterpreterPasteInput.py',
...
(lines starting with ? are not under version control).
[Q:/ipython]|35> junk = out.grep(r'^\?').fields(1)
[Q:/ipython]|36> junk
<36> SList (.p, .n, .l, .s, .grep(), .fields() availab
...
10: build\bdist.win32\winexe\temp\_ctypes.py
11: build\bdist.win32\winexe\temp\_hashlib.py
12: build\bdist.win32\winexe\temp\_socket.py
Now we can just remove these files by doing rm $junk.s.
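The same selection can be written in plain Python when the status lines are already available as a list of strings. This is a minimal sketch; the function name untracked_paths is hypothetical, and the sample lines are taken from the output shown above.

```python
def untracked_paths(status_lines):
    """Return the paths from `hg status` lines whose status flag is '?'."""
    paths = []
    for line in status_lines:
        # Each line looks like "<flag> <path>"; split only on the first space
        # so that paths containing spaces stay intact.
        flag, _, path = line.partition(" ")
        if flag == "?" and path:
            paths.append(path)
    return paths


sample = [
    "M IPython\\extensions\\ipy_kitcfg.py",
    "? build\\lib\\IPython\\Debugger.py",
]
junk = untracked_paths(sample)
```

Each entry of junk could then be passed to os.remove. Splitting on the first space only is a deliberate choice here, since a whitespace-based fields(1) would truncate a path that contains spaces.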
The .s, .n, .p properties
The .s property returns one string where lines are separated by a single space (for convenient passing to system commands). The .n property returns one string where the lines are separated by a newline (i.e. the original output of the function). If the items in the string list are file names, .p can be used to get a list of “path” objects for convenient file manipulation.
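To make the behaviour of grep(), fields() and .s concrete, here is a small pure-Python approximation that works on ordinary lists of strings. The function names mirror the SList methods, but this is an illustrative sketch and not the SList implementation.

```python
import re


def grep(lines, pattern, prune=False):
    """Keep lines matching pattern; with prune=True keep the non-matching ones."""
    rx = re.compile(pattern)
    return [ln for ln in lines if bool(rx.search(ln)) != prune]


def fields(lines, *indices):
    """Split each line on whitespace and join the selected columns with a space.

    Columns missing from a short line are skipped instead of raising IndexError.
    """
    out = []
    for ln in lines:
        cols = ln.split()
        out.append(" ".join(cols[i] for i in indices if i < len(cols)))
    return out


lines = [
    "total 23",
    "-rw-rw-rw- 1 ville None 1163 Sep 30 2006 example-demo.py",
    "-rwxrwxrwx 1 ville None 4606 Sep 1 17:15 example-embed.py",
]
kept = grep(lines, "embed", prune=True)       # drop the "embed" line
names = fields(kept, 8, 0)                    # file name, then permissions
as_one_line = " ".join(fields(kept[1:], 8))   # comparable to the .s property
```

With the sample data, names holds "total" followed by "example-demo.py -rw-rw-rw-", matching the shape of the l2.fields(8,0) output shown earlier.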

View File

@@ -0,0 +1,709 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-06T16:12:44+08:00
====== IPython reference ======
Created Friday 06 January 2012
http://ipython.org/ipython-doc/rel-0.12/interactive/reference.html
===== Command-line usage =====
You start IPython with the command:
**$ ipython [options] files**
Note
For IPython on Python 3, use ipython3 in place of ipython.
If invoked with no options, it executes all the files listed in sequence and __drops you into the interpreter __while still acknowledging any options you may have set in your __ipython_config.py__. This behavior is different from standard Python, which when called as python -i will only execute one file and ignore your configuration setup.
Please note that some of the configuration options are not available at the command line, simply because they are not practical here. Look into your configuration files for details on those. There are **separate** configuration files for each profile, and the files look like __“ipython_config.py” or “ipython_config_<frontendname>.py”__. Profile directories look like “profile_profilename” and are typically installed in the IPYTHON_DIR directory. For Linux users, this will be $HOME/.config/ipython, and for other users it will be $HOME/.ipython. For Windows users, $HOME resolves to C:\Documents and Settings\YourUserName in most instances.
===== Eventloop integration =====
Previously IPython had command line options for controlling GUI event loop integration (-gthread, -qthread, -q4thread, -wthread, -pylab). As of IPython version 0.11, these have been removed. Please see the new %gui magic command for details on the new interface, or specify the gui at the commandline:
$ ipython --gui=qt
===== Command-line Options =====
To see the options IPython accepts, use ipython --help (and you probably should run the output through a pager such as ipython --help | less for more convenient reading). This shows all the options that have a __single-word alias__ to control them, but IPython lets you configure all of its objects from the command-line by passing the __full class name and a corresponding value__; type ipython --help-all to see this full list. For example:
ipython --pylab qt # single-word form
is equivalent to:
ipython --TerminalIPythonApp.pylab='qt' # full class name and corresponding value form
Note that in the second form, you must use the __equal sign__, as the expression is evaluated as __an actual Python assignment.__ While in the above example the short form is more convenient, only the most common options have a short form, while any configurable variable in IPython can be set at the command-line by using the long form. This long form is the **same syntax used in the configuration files**, if you want to set these options permanently.
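The structure of a long-form option can be illustrated with a toy parser that splits it into a class name, a trait name and a Python value. The function parse_long_option is hypothetical and is not IPython's command-line loader; it only shows why the equal sign and a valid Python value are needed.

```python
import ast


def parse_long_option(arg):
    """Split '--Class.trait=value' into (class_name, trait_name, value)."""
    if not arg.startswith("--") or "=" not in arg:
        raise ValueError("expected --Class.trait=value, got %r" % arg)
    target, raw = arg[2:].split("=", 1)
    class_name, trait = target.split(".", 1)
    try:
        # Interpret the right-hand side as a Python literal where possible.
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        value = raw  # fall back to the raw string
    return class_name, trait, value
```

For example, parse_long_option("--TerminalIPythonApp.pylab='qt'") yields the class name TerminalIPythonApp, the trait pylab and the string value qt.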
===== Interactive use =====
IPython is meant to work as a __drop-in replacement__ for the standard interactive interpreter. As such, any code which is valid python should execute normally under IPython (cases where this is not true should be reported as bugs). It does, however, offer many features which are not available at a standard python prompt. What follows is a list of these.
* **Magic command system**
IPython will treat any line whose** first character** is a % as a special call to a__ magic function__. These allow you to control the behavior of IPython itself, plus a lot of system-type features. They are all prefixed with a % character, but **parameters are given without parentheses or quotes**.
Example: typing %cd mydir changes your working directory to mydir, if it exists.
If you have __automagic __enabled (as it is by default), you __don't need to__ type in the % explicitly. IPython will scan its **internal list of magic functions** and call one if it exists. With automagic on you can then just type **cd mydir** to go to directory mydir. The automagic system has the __lowest possible precedence__ in name searches, so defining an identifier with the same name as an existing magic function will __shadow__ it for automagic use. You can still access the shadowed magic function by explicitly using the % character at the beginning of the line.
An example (with automagic on) should clarify all this:
In [1]: cd ipython # %cd is called by automagic
/home/fperez/ipython
In [2]: cd=1 # now cd is just a variable
In [3]: cd .. # and __doesn't work as a function anymore__
File "<ipython-input-3-9fedb3aff56c>", line 1
cd ..
^
SyntaxError: invalid syntax
In [4]: %cd .. # __but %cd always works__
/home/fperez
In [5]: **del cd ** # if you remove the cd variable, automagic works again
In [6]: cd ipython
/home/fperez/ipython
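The precedence rule behind this session can be modelled in a few lines. The function resolve below is a hypothetical toy, not IPython's input transformer; it only captures the lookup order described above.

```python
def resolve(first_word, user_ns, magics):
    """Decide how the first word of an input line would be interpreted.

    Names in the user namespace win and magics are consulted last, which is
    why a variable called 'cd' shadows the %cd magic. An explicit '%' prefix
    bypasses the user namespace entirely.
    """
    if first_word.startswith("%"):
        name = first_word[1:]
        return ("magic", name) if name in magics else ("error", name)
    if first_word in user_ns:
        return ("python", first_word)
    if first_word in magics:
        return ("magic", first_word)
    return ("python", first_word)


magics = {"cd"}
before = resolve("cd", {}, magics)            # automagic finds %cd
shadowed = resolve("cd", {"cd": 1}, magics)   # the variable wins
explicit = resolve("%cd", {"cd": 1}, magics)  # % always reaches the magic
```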
You can __define your own magic functions__ to extend the system. The following example defines a new magic command, %impall:
ip = get_ipython()
def doimp(self, arg):
    ip = self.api
    ip.ex("import %s; reload(%s); from %s import *" % (arg,arg,arg) )
ip.define_magic('impall', doimp)
Type __%magic__ for more information, including a list of all available magic functions at any time and their docstrings. You can also type %magic_function_name? (see below for information on the ? system) to get information about any particular magic function you are interested in.
The API documentation for the** IPython.core.magic** module contains the full docstrings of all currently available magic commands.
* **Access to the standard Python help**
Simply type __help()__ to access Python's standard help system. You can also type __help(object)__ for information about **a given object**, or __help('keyword')__ for information on a keyword. You may need to configure your PYTHONDOCS environment variable for this feature to work correctly.
* **Dynamic object information**
Typing __?word__ or __word?__ prints detailed information about an object. If certain strings in the object are too long (e.g. function signatures) they get snipped in the center for brevity. This system gives access to variable** types and values, docstrings, function prototypes** and other useful information.
If the information will not fit in the terminal, it is displayed in a__ pager__ (less if available, otherwise a basic internal pager).
Typing __??word or word??__ gives access to the full information, including the source code where possible. Long strings are not snipped.
The following magic functions are particularly useful for** gathering information** about your working environment. You can get more details by typing **%magic** or querying them individually (__%function_name?__); this is just a summary:
* %pdoc <object>: Print (or run through a pager if too long) the__ docstring__ for an object. If the given object is a class, it will print both the class and the constructor docstrings.
* %pdef <object>: Print the definition header for any** callable object**. If the object is a class, print the constructor information.
* %psource <object>: Print (or run through a pager if too long) the source code for an object.
* %pfile <object>: Show the **entire source file** where an object was defined via a pager, opening it at the line where the object definition begins.
* __%who/%whos:__ These functions give information about** identifiers** you have defined interactively (not things you loaded or defined in your configuration files). %who just prints a list of identifiers and %whos prints a table with some basic details about each identifier.
Note that the dynamic object information functions (__?/??, %pdoc, %pfile, %pdef, %psource__) work on object attributes, as well as directly __on variables__. For example, after doing import os, you can use __os.path.abspath??__.
* **Readline-based features**
These features require the **GNU readline library,** so they won't work if your Python installation lacks readline support. We will first describe the default behavior IPython uses, and then how to change it to suit your preferences.
**Command line completion**
__At any time__, hitting TAB will complete any available python commands or variable names, and show you a list of the possible completions if there's no unambiguous one. It will also complete** filenames** in the current directory if no python names match what you've typed so far.
**Search command history**
IPython provides two ways for searching through previous input and thus reduce the need for repetitive typing:
* Start typing, and then use **Ctrl-p (previous,up) and Ctrl-n** (next,down) to search through only the history items that match what you've typed so far. If you use Ctrl-p/Ctrl-n at a blank prompt, they just behave like normal arrow keys.
* Hit **Ctrl-r**: opens __a search prompt__. Begin typing and the system searches your history for lines that contain what you've typed so far, completing as much as it can.
**Persistent command history across sessions**
IPython will save your__ input history__ when it leaves and reload it next time you restart it. By default, the history file is named $IPYTHON_DIR/profile_<name>/**history.sqlite**. This allows you to keep separate histories related to various tasks: commands related to numerical work will not be clobbered by a system shell history, for example.
**Autoindent**
IPython can recognize lines __ending in : __and indent the next line, while also un-indenting automatically after raise or return.
This feature uses the readline library, so it will honor your ~/.inputrc configuration (or whatever file your INPUTRC variable points to). Adding the following lines to your .inputrc file can make indenting/unindenting more convenient (M-i indents, M-u unindents):
$if Python
"\M-i": "    "
"\M-u": "\d\d\d\d"
$endif
Note that there are __4 spaces__ between the quote marks after “M-i” above.
Warning
__Autoindent is ON__ by default, but it can cause problems with the pasting of **multi-line indented code** (the pasted code gets re-indented on each line). A magic function %autoindent allows you to toggle it on/off at runtime. You can also disable it permanently in your ipython_config.py file (set TerminalInteractiveShell.autoindent=False).
If you want to paste multiple lines in the terminal, it is recommended that you use __%paste__.
**Customizing readline behavior**
All these features are based on the GNU readline library, which has an extremely customizable interface. Normally, readline is configured via a file which defines the behavior of the library; the details of the syntax for this can be found in the **readline documentation **available with your system or on the Internet. IPython **doesn't read this file** (if it exists) directly, but it does support passing to readline valid options via a simple interface. In brief, you can customize readline by setting the following options in your configuration file (note that these options cannot be specified at the command line):
* readline_parse_and_bind: this holds a list of strings to be executed via a readline.parse_and_bind() command. The syntax for valid commands of this kind can be found by reading the documentation for the GNU readline library, as these commands are of the kind which readline accepts in its configuration file.
* readline_remove_delims: a string of characters to be removed from the default word-delimiters list used by readline, so that completions may be performed on strings which contain them. Do not change the default value unless you know what you're doing.
* **Session logging and restoring**
You can__ log all input__ from a session either by starting IPython with the command line switch **--logfile=foo.py** or by activating the logging at any moment with the magic function __%logstart__.
Log files **can later be reloaded** by __running them as scripts__ and IPython will attempt to replay the log by executing all the lines in it, thus restoring the state of a previous session. This feature is not quite perfect, but can still be useful in many cases.
The log files can also be used as a way to have **a permanent record** of any code you wrote while experimenting. Log files are regular text files which you can later open in your favorite text editor to extract code or to clean them up before using them to replay a session.
The __%logstart__ function for activating logging in mid-session is used as follows:
//%logstart [log_name [log_mode]]//
If no name is given, it defaults to a file named **ipython_log.py** in your current working directory, in** rotate** mode (see below).
%logstart name saves to file name in__ backup__ mode. It saves your history up to that point and then continues logging.
%logstart takes a second optional parameter: logging mode. This can be one of (note that the modes are given unquoted):
* [over:] overwrite existing log_name.
* [backup:] rename (if exists) to log_name~ and start log_name.
* [append:] well, that says it.
* [rotate:] create rotating logs log_name.1~, log_name.2~, etc.
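Three of these modes map directly onto ordinary file operations, which the following sketch illustrates. The helper open_log is hypothetical and is not IPython's logger; the rotate mode is left out because its exact numbering scheme is not spelled out here.

```python
import os


def open_log(path, mode="backup"):
    """Open a log file for writing according to a %logstart-style mode."""
    if mode == "over":
        return open(path, "w")      # overwrite any existing log
    if mode == "append":
        return open(path, "a")      # keep existing content, add to the end
    if mode == "backup":
        if os.path.exists(path):
            os.replace(path, path + "~")   # keep the old log as log_name~
        return open(path, "w")
    raise ValueError("unsupported mode: %r" % mode)
```

In backup mode an existing log is preserved under the name with a trailing tilde before a fresh file is started, so no earlier session is lost.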
The __%logoff and %logon __functions allow you to temporarily stop and resume logging to a file which had previously been started with %logstart. They will fail (with an explanation) if you try to use them before logging has been started.
* **System shell access**
Any input line __beginning with a ! character__ is passed** verbatim **(minus the !, of course) to the underlying operating system. For example, typing !ls will run ls in the current directory.
* **Manual capture of command output**
You can __assign the result of a system command to a Python variable __with the syntax **myfiles = !ls**. This gets machine readable output from stdout (e.g. without colours), and **splits on newlines**.
To explicitly get this sort of output **without assigning to a variable**, use two exclamation marks__ (!!ls) or the %sx __magic command.
The output of the system command is **stored line by line in a list variable**.
The captured list has some convenience features.__ myfiles.n or myfiles.s __returns a string delimited by **newlines or spaces, respectively**. __myfiles.p__ produces path objects from the list items. See String lists for details.
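Outside IPython, a comparable result can be obtained with the standard subprocess module. This is a minimal sketch: capture_lines is a hypothetical helper, and the child process is the Python interpreter itself so that the example does not depend on ls being installed.

```python
import subprocess
import sys


def capture_lines(argv):
    """Run a command and return its stdout as a list of lines."""
    out = subprocess.check_output(argv, universal_newlines=True)
    return out.splitlines()


myfiles = capture_lines([sys.executable, "-c", "print('a.txt'); print('b.txt')"])
joined_by_newline = "\n".join(myfiles)   # comparable to myfiles.n
joined_by_space = " ".join(myfiles)      # comparable to myfiles.s
```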
IPython also allows you to__ expand the value of python variables__ when making system calls. Wrap __variables or expressions in {braces}__:
In [1]: pyvar = 'Hello world'
In [2]: !echo "A python variable: {pyvar}"
A python variable: Hello world
In [3]: import math
In [4]: x = 8
In [5]: !echo {math.factorial(x)}
40320
For simple cases, you can alternatively prepend__ $ to a variable name__:
In [6]: !echo $sys.argv
[/home/fperez/usr/bin/ipython]
In [7]: !echo "A system variable: $$HOME" # __Use $$ for literal $__
A system variable: /home/fperez
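The substitution rules can be imitated with a regular expression and eval. The function expand is a hypothetical toy and not IPython's own expansion code; unlike the real feature it only accepts plain names after $ (so $sys.argv is not handled), and eval must never be used this way on untrusted input.

```python
import math
import re

# Order matters: the escaped "$$" has to be tried before "$name".
_TOKEN = re.compile(r"\$\$|\$([A-Za-z_]\w*)|\{([^{}]+)\}")


def expand(command, namespace):
    """Substitute $name and {expression} using values from namespace."""
    def replace(match):
        if match.group(0) == "$$":
            return "$"                          # literal dollar sign
        expr = match.group(1) or match.group(2)
        return str(eval(expr, {}, namespace))   # toy only: eval is unsafe
    return _TOKEN.sub(replace, command)


ns = {"pyvar": "Hello world", "math": math, "x": 8}
greeting = expand('echo "A python variable: {pyvar}"', ns)
factorial = expand("echo {math.factorial(x)}", ns)
literal = expand('echo "A system variable: $$HOME"', ns)
```

After expansion the command string is what gets handed to the shell, which is why $$HOME arrives there as $HOME and is then resolved by the shell itself.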
===== System command aliases =====
The__ %alias __magic function allows you to define magic functions which are in fact** system shell commands**. These aliases can have** parameters**.
__%alias alias_name cmd #defines alias_name as an alias for cmd__
Then, typing **alias_name **params will execute the system command cmd params (from your underlying operating system).
Typing alias_name directly executes the system shell command.
You can also__ define aliases with parameters using %s specifiers__ (one per parameter). The following example defines the parts function as an alias to the command echo first %s second %s where each %s will be replaced **by a positional parameter **to the call to %parts:
In [1]: %alias parts echo first %s second %s
In [2]: parts A B
first A second B
In [3]: parts A
ERROR: Alias <parts> requires 2 arguments, 1 given.
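The argument check in this session can be reproduced with ordinary string formatting. The helper expand_alias is a hypothetical sketch, not IPython's alias manager; appending surplus arguments to the command is an assumption based on the statement that alias_name params runs cmd params.

```python
def expand_alias(name, template, args):
    """Fill the %s slots of an alias template with positional arguments.

    Arguments beyond the number of %s slots are appended to the command.
    """
    nslots = template.count("%s")
    if len(args) < nslots:
        raise ValueError(
            "Alias <%s> requires %d arguments, %d given." % (name, nslots, len(args))
        )
    command = template % tuple(args[:nslots])
    rest = " ".join(args[nslots:])
    return (command + " " + rest).strip()
```

With the parts alias from above, expand_alias("parts", "echo first %s second %s", ["A", "B"]) produces the command echo first A second B, and passing only ["A"] raises an error with the same wording as the session.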
If called with no parameters, __%alias __prints the table of currently defined aliases.
The __%rehashx __magic allows you to load your entire $PATH as ipython aliases. See its docstring for further details.
In other words, an alias is created for every executable command found on $PATH, so that they can be called directly from within IPython.
===== Recursive reload =====
The** IPython.lib.deepreload** module allows you to__ recursively reload a module__: changes made to any of its dependencies will be reloaded without having to exit. To start using it, do:
**from IPython.lib.deepreload import reload as dreload**
# then use
dreload(module-name)
to recursively reload module-name and the modules it depends on.
===== Verbose and colored exception traceback printouts =====
IPython provides the option to see** very detailed exception tracebacks**, which can be especially useful when debugging large programs. You can run any Python file with the __%run__ function to benefit from these detailed tracebacks. Furthermore, both normal and verbose tracebacks can be** colored** (if your terminal supports it) which makes them much easier to parse visually.
See the magic __xmode__ and __colors__ functions for details (just type** %magic**).
These features are basically a terminal version of Ka-Ping Yee's cgitb module, now part of the standard Python library.
===== Input caching system =====
IPython offers __numbered prompts __(In/Out) with input and output caching (also referred to as** input history**). All input is saved and can be__ retrieved as variables__ (besides the usual arrow key recall), in addition to the __%rep __magic command that brings a history entry up for editing on the next command line.
The following** GLOBAL** variables always exist (so don't overwrite them!):
* ___i, _ii, _iii__: store previous, next previous and next-next previous inputs.
* __In, _ih__ : a list of all inputs; _ih[n] is the input from line n. If you overwrite In with a variable of your own, you can remake the assignment to the internal list with a simple In=_ih.
Additionally, global variables named ___i<n> __are dynamically created (<n> being the prompt counter), so _i<n> == _ih[<n>] == In[<n>].
For example, what you typed at prompt 14 is available as ___i14, _ih[14] and In[14]__.
This allows you to easily** cut and paste multi-line interactive prompts by printing them out**: they print like a **clean string**, without prompt characters. You can also manipulate them like** regular variables** (they are strings), modify or exec them (typing __exec _i9 __will re-execute the contents of input prompt 9).
You can also **re-execute** multiple lines of input easily by using the magic __%rerun or %macro__ functions. The macro system also allows you to re-execute previous lines which include magic function calls (which require special processing). Type %macro? for more details on the macro system.
A history function__ %hist __allows you to see any part of your input history by printing a range of the _i variables.
You can also search (grep) through your history by typing __%hist -g__ somestring. This is handy for searching for URLs, IP addresses, etc. You can bring history entries listed by %hist -g up for editing with the__ %recall __command, or run them immediately with __%rerun__.
===== Output caching system =====
For output that is returned from actions, a system similar to the input cache exists but using _____ instead of _i. __Only actions that produce a result (NOT assignments, for example) are cached__. If you are familiar with Mathematica, IPython's _ variables behave exactly like Mathematica's % variables.
The following GLOBAL variables always exist (so don't overwrite them!):
* [_] (a single underscore) : stores previous output, like Python's default interpreter.
* [__] (two underscores): next previous.
* [___] (three underscores): next-next previous.
Additionally, global variables named ___<n>__ are dynamically created (<n> being the prompt counter), such that the result of **output <n>** is always available as _<n> (don't use the angle brackets, just the number, e.g.__ _21__).
These variables are also stored in a global dictionary (not a list, since it only has entries for lines which returned a result) available under the names__ _oh__ and __Out__ (similar to _ih and In). So the output from line 12 can be obtained as__ _12, Out[12] or _oh[12]__. If you accidentally overwrite the Out variable you can recover it by typing Out=_oh at the prompt.
This system can put **heavy memory demands** on your system, since it prevents Python's garbage collector from removing any previously computed results. You can control how many results are kept in memory with the option (at the command line or in your configuration file) __cache_size__. If you set it to 0, the whole system is completely disabled and the prompts revert to the classic **>>>** of normal Python.
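The bookkeeping described in this section can be pictured with a small model. The class OutputCache is a hypothetical toy, not IPython's display hook; None stands in for a prompt that produced no result, such as an assignment.

```python
class OutputCache:
    """Toy model of the _, __, ___ and Out[n] bookkeeping."""

    def __init__(self):
        self.out = {}               # prompt number -> result, like Out / _oh
        self.last = ["", "", ""]    # stand-ins for _, __ and ___
        self.count = 0              # prompt counter

    def run(self, result):
        """Record the result of one prompt and return the prompt number."""
        self.count += 1
        if result is not None:
            self.out[self.count] = result
            self.last = [result] + self.last[:2]   # shift _ -> __ -> ___
        return self.count


cache = OutputCache()
cache.run(10)     # prompt 1 returns a value
cache.run(None)   # prompt 2 is an assignment: nothing is cached
cache.run(30)     # prompt 3 returns a value
```

Because prompt 2 left no entry, a dictionary keyed by prompt number is the natural container, and every stored result stays referenced until the cache lets go of it, which is the source of the memory cost mentioned above.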
===== Directory history =====
Your history of visited directories is kept in the global list ___dh__, and the magic** %cd** command can be used to go to any entry in that list. The **%dhist** command allows you to view this history. Do cd -<TAB> to conveniently view the directory history.
===== Automatic parentheses and quotes =====
These features were adapted from Nathan Gray's LazyPython. They are meant to allow less typing for common situations.
=== Automatic parentheses ===
**Callable objects** (i.e. functions, methods, etc) can be invoked like this (notice the **commas** between the arguments):
In [1]: __callable_ob arg1, arg2, arg3__
------> callable_ob(arg1, arg2, arg3)
You can force automatic parentheses by using / as the first character of a line. For example:
In [2]: /globals # becomes 'globals()'
Note that the / MUST be the first character on the line! This won't work:
In [3]: print /globals # syntax error
In most cases the automatic algorithm should work, so you should rarely need to explicitly invoke /. One notable exception is if you are trying to call a function **with a list of tuples as arguments** (the parenthesis will confuse IPython):
In [4]: zip (1,2,3),(4,5,6) # won't work
but this will work:
In [5]:__ /zip __(1,2,3),(4,5,6)
------> zip ((1,2,3),(4,5,6))
Out[5]: [(1, 4), (2, 5), (3, 6)]
IPython tells you that it has altered your command line by displaying the new command line preceded by ->. e.g.:
In [6]: callable list
------> callable(list)
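The forced form of this rewriting is simple enough to sketch as a string transformation. The function auto_paren is hypothetical and covers only the explicit / prefix, not the automatic detection of callables.

```python
def auto_paren(line):
    """Rewrite '/name args' into 'name(args)'; other lines pass through unchanged."""
    if not line.startswith("/"):
        return line   # the / must be the very first character
    name, _, args = line[1:].partition(" ")
    return "%s(%s)" % (name, args.strip())


simple = auto_paren("/globals")
tuples = auto_paren("/zip (1,2,3),(4,5,6)")
ignored = auto_paren("print /globals")
```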
=== Automatic quoting ===
You can force automatic quoting of a function's arguments by using__ , or ; __as the** first character of a line**. For example:
In [1]: ,my_function /home/me # becomes** my_function("/home/me")**
If you use ; __the whole argument is quoted as a single string, while , splits on whitespace__:
In [2]: ,my_function a b c # becomes my_function("a","b","c")
In [3]: ;my_function a b c # becomes my_function("a b c")
Note that the , or ; MUST be the first character on the line! This won't work:
In [4]: x = ,my_function /home/me # syntax error
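The difference between the two prefixes can be shown as a plain string transformation. The function auto_quote is a hypothetical sketch of the rewriting rule, not IPython's input preprocessor.

```python
def auto_quote(line):
    """Rewrite ',f a b' to f("a", "b") and ';f a b' to f("a b")."""
    if line.startswith(","):
        name, _, rest = line[1:].partition(" ")
        # A comma prefix quotes each whitespace-separated word on its own.
        args = ", ".join('"%s"' % word for word in rest.split())
    elif line.startswith(";"):
        name, _, rest = line[1:].partition(" ")
        # A semicolon prefix quotes the whole remainder as one string.
        args = '"%s"' % rest.strip()
    else:
        return line   # the prefix must be the very first character
    return "%s(%s)" % (name, args)


split_args = auto_quote(",my_function a b c")
single_arg = auto_quote(";my_function a b c")
untouched = auto_quote("x = ,my_function /home/me")
```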
===== IPython as your default Python environment =====
Python honors the environment variable **PYTHONSTARTUP** and will execute at startup the file referenced by this variable. If you put the following code at the end of that file, then IPython will be your __working environment__ anytime you start Python:
from IPython.frontend.terminal.ipapp import launch_new_instance
launch_new_instance()
raise SystemExit
The raise SystemExit is needed to exit Python when it finishes, otherwise you'll be back at the normal Python >>> prompt.
This is probably useful to developers who manage multiple Python versions and don't want to have correspondingly multiple IPython versions. Note that in this mode,** there is no way to pass IPython any command-line options**, as those are trapped first by Python itself.
===== Embedding IPython =====
It is possible to start an IPython instance **inside your own Python programs**. This allows you to evaluate dynamically the state of your code, operate with your variables, analyze them, etc. Note however that __any changes you make to values while in the shell do not propagate back to the running code__, so it is safe to modify your values because you won't break your code in bizarre ways by doing so.
Note
At present, trying to embed IPython from inside IPython causes problems. Run the code samples below **outside** IPython.
This feature allows you to easily have a fully functional python environment for doing __object introspection__ anywhere in your code with a simple function call. In some cases a simple print statement is enough, but if you need to do more detailed analysis of a code fragment this feature can be very valuable.
It can also be useful in scientific computing situations where it is common to need to do some automatic, computationally intensive part and then stop to look at data, plots, etc. Opening an IPython instance will give you **full access to your data and functions**, and you can resume program execution once you are done with the interactive part (perhaps to stop again later, as many times as needed).
The following code snippet is the bare minimum you need to include in your Python programs for this to work (detailed examples follow later):
**from IPython import embed**
**embed() # this call anywhere in your program will start IPython**
You can run embedded instances even in code which is itself being run at the IPython interactive prompt with __%run <filename>__. Since it's easy to get lost as to where you are (in your top-level IPython or in your embedded one), it's a good idea in such cases to __set the in/out prompts to something different__ for the embedded instances. The code examples below illustrate this.
You can also have multiple IPython instances in your program and open them separately, for example with different options for data presentation. If you close and open the same instance multiple times, its prompt counters simply continue from each execution to the next.
Please look at the docstrings in the __embed__ module for more details on the use of this system.
The following sample file illustrating how to use the embedding functionality is provided in the examples directory as **example-embed.py**. It should be fairly self-explanatory:
#!/usr/bin/env python
"""An example of how to embed an IPython shell into a running program.
Please see the documentation in the __IPython.Shell__ module for more details.
The accompanying file **example-embed-short.py** has quick code fragments for
embedding which you can cut and paste in your code once you understand how
things work.
The code in this file is deliberately extra-verbose, meant for learning."""
# The basics to get you going:
# IPython sets the __IPYTHON__ variable so you can know if you have nested
# copies running.
# Try running this code both at the command line and from inside IPython (with
# %run example-embed.py)
from IPython.config.loader import Config
try:
    **get_ipython**
except NameError:
    nested = 0
    cfg = Config()
    prompt_config = cfg.PromptManager
    prompt_config.in_template = 'In <\\#>: '
    prompt_config.in2_template = ' .\\D.: '
    prompt_config.out_template = 'Out<\\#>: '
else:
    print "Running nested copies of IPython."
    print "The prompts for the nested copy have been modified"
    cfg = Config()
    nested = 1
# First import the embeddable shell class
from IPython.frontend.terminal.embed import InteractiveShellEmbed
# Now create an instance of the embeddable shell. The first argument is a
# string with options exactly as you would type them if you were starting
# IPython at the system command line. Any parameters you want to define for
# configuration can thus be specified here.
ipshell = InteractiveShellEmbed(config=cfg,
                                banner1 = 'Dropping into IPython',
                                exit_msg = 'Leaving Interpreter, back to program.')
# Make a second instance, you can have as many as you want.
cfg2 = cfg.copy()
prompt_config = cfg2.PromptManager
prompt_config.in_template = 'In2<\\#>: '
if not nested:
    prompt_config.in_template = 'In2<\\#>: '
    prompt_config.in2_template = ' .\\D.: '
    prompt_config.out_template = 'Out<\\#>: '
ipshell2 = InteractiveShellEmbed(config=cfg2,
                                 banner1 = 'Second IPython instance.')
print '\nHello. This is printed from the main controller program.\n'
# You can then call ipshell() anywhere you need it (with an optional
# message):
ipshell('***Called from top level. '
        'Hit Ctrl-D to exit interpreter and continue program.\n'
        'Note that if you use %kill_embedded, you can fully deactivate\n'
        'This embedded instance so it will never turn on again')
print '\nBack in caller program, moving along...\n'
#---------------------------------------------------------------------------
# More details:
# InteractiveShellEmbed instances don't print the standard system banner and
# messages. The IPython banner (which actually may contain initialization
# messages) is available as get_ipython().banner in case you want it.
# InteractiveShellEmbed instances print the following information every time they
# start:
# - A global startup banner.
# - A call-specific header string, which you can use to indicate where in the
# execution flow the shell is starting.
# They also print an exit message every time they exit.
# Both the startup banner and the exit message default to None, and can be set
# either at the instance constructor or at any other time by setting the
# banner and exit_msg attributes.
# The shell instance can be also put in 'dummy' mode globally or on a per-call
# basis. This gives you fine control for debugging without having to change
# code all over the place.
# The code below illustrates all this.
# This is how the global banner and exit_msg can be reset at any point
ipshell.banner = 'Entering interpreter - New Banner'
ipshell.exit_msg = 'Leaving interpreter - New exit_msg'
def foo(m):
    s = 'spam'
    ipshell('***In foo(). Try %whos, or print s or m:')
    print 'foo says m = ',m
def bar(n):
    s = 'eggs'
    ipshell('***In bar(). Try %whos, or print s or n:')
    print 'bar says n = ',n
# Some calls to the above functions which will trigger IPython:
print 'Main program calling foo("eggs")\n'
foo('eggs')
# The shell can be put in 'dummy' mode where calls to it silently return. This
# allows you, for example, to globally turn off debugging for a program with a
# single call.
ipshell.dummy_mode = True
print '\nTrying to call IPython which is now "dummy":'
ipshell()
print 'Nothing happened...'
# The global 'dummy' mode can still be overridden for a single call
print '\nOverriding dummy mode manually:'
ipshell(dummy=False)
# Reactivate the IPython shell
ipshell.dummy_mode = False
print 'You can even have multiple embedded instances:'
ipshell2()
print '\nMain program calling bar("spam")\n'
bar('spam')
print 'Main program finished. Bye!'
#********************** End of file <example-embed.py> ***********************
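The dummy-mode behaviour shown in the example (a global switch that a single call can override) can be mimicked in plain Python. The following is only a sketch of the pattern, not IPython's implementation; the class name ToggleableHook and its action argument are made up for illustration.

```python
class ToggleableHook(object):
    """Wrap a callable so it can be switched off globally or per call."""

    def __init__(self, action):
        self.action = action      # what to run when the hook is active
        self.dummy_mode = False   # global switch, like ipshell.dummy_mode

    def __call__(self, *args, **kwargs):
        # A per-call 'dummy' keyword overrides the global switch.
        dummy = kwargs.pop('dummy', None)
        skip = self.dummy_mode if dummy is None else dummy
        if skip:
            return None
        return self.action(*args, **kwargs)


calls = []
hook = ToggleableHook(calls.append)
hook('first')               # active: recorded
hook.dummy_mode = True
hook('second')              # globally disabled: silently skipped
hook('third', dummy=False)  # per-call override: recorded
```

Leaving the hook calls in place and flipping one attribute is what makes this convenient for debugging: nothing else in the program has to change.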
Once you understand how the system functions, you can use the following code fragments in your programs which are ready for cut and paste:
"""Quick code snippets for embedding IPython into other programs.
See example-embed.py for full details, this file has the bare minimum code for
cut and paste use once you understand how to use the system."""
#---------------------------------------------------------------------------
# This code loads IPython but modifies a few things if it detects it's running
# embedded in another IPython session (helps avoid confusion)
try:
    get_ipython
except NameError:
    banner=exit_msg=''
else:
    banner = '*** Nested interpreter ***'
    exit_msg = '*** Back in main IPython ***'
# First import the embed function
from IPython.frontend.terminal.embed import InteractiveShellEmbed
# Now create the IPython shell instance. Put ipshell() anywhere in your code
# where you want it to open.
ipshell = InteractiveShellEmbed(banner1=banner, exit_msg=exit_msg)
#---------------------------------------------------------------------------
# This code will load an embeddable IPython shell always with no changes for
# nested embeddings.
from IPython import embed
# Now embed() will open IPython anywhere in the code.
#---------------------------------------------------------------------------
# This code loads an embeddable shell only if NOT running inside
# IPython. Inside IPython, the embeddable shell variable ipshell is just a
# dummy function.
try:
    get_ipython
except NameError:
    from IPython.frontend.terminal.embed import InteractiveShellEmbed
    ipshell = InteractiveShellEmbed()
    # Now ipshell() will open IPython anywhere in the code
else:
    # Define a dummy ipshell() so the same code doesn't crash inside an
    # interactive IPython
    def ipshell(): pass
#******************* End of file <example-embed-short.py> ********************
===== Using the Python debugger (pdb) =====
==== Running entire programs via pdb ====
pdb, the Python debugger, is a powerful __interactive debugger__ which allows you to step through code, set breakpoints, watch variables, etc. IPython makes it very easy to start any script under the control of pdb, regardless of whether you have wrapped it into a main() function or not. For this, simply type __%run -d myscript__ at an IPython prompt. See the %run command's documentation (via %run? or in the section on magic commands) for more details, including how to control where pdb will stop execution first.
For more information on the use of the pdb debugger, read the included __pdb.doc__ file (part of the standard Python distribution). On a stock Linux system it is located at /usr/lib/python2.3/pdb.doc, but the easiest way to read it is by using the help() function of the pdb module as follows (in an IPython prompt):
In [1]: import pdb
In [2]: **pdb.help()**
This will load the pdb.doc document in a file viewer for you automatically.
==== Automatic invocation of pdb on exceptions ====
IPython, if __started with the --pdb option__ (or if the option is set in your config file), can **call the Python pdb debugger every time your code triggers an uncaught exception.** This feature can also be toggled at any time with the __%pdb__ magic command. This can be extremely useful in order to find the origin of subtle bugs, because pdb opens up at the point in your code which triggered the exception, and while your program is at this point dead, all **the data is still available** and you can walk up and down the stack frame and understand the origin of the problem.
Furthermore, you can use these debugging facilities both __with the embedded IPython mode and without IPython at all__. For an embedded shell (see the Embedding IPython section), simply call the constructor with **--pdb** in the argument string and pdb will automatically be called if an uncaught exception is triggered by your code.
For stand-alone use of the feature in your programs which **do not use IPython at all**, put the following lines toward the top of your main routine:
**import sys**
**from IPython.core import ultratb**
**sys.excepthook = ultratb.FormattedTB(mode='Verbose', color_scheme='Linux', call_pdb=1)**
The __mode__ keyword can be either Verbose or Plain, giving either very detailed or normal tracebacks respectively. The color_scheme keyword can be one of NoColor, Linux (default) or LightBG. These are the same options which can be set in IPython with --colors and --xmode.
This will give any of your programs detailed, colored tracebacks with automatic invocation of pdb.
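The same idea, an exception hook that prints the traceback and then opens pdb at the failure point, can be sketched with the standard library alone. This is not how ultratb.FormattedTB is implemented and it produces no colored or verbose tracebacks; make_excepthook and its stream parameter are names invented for this illustration.

```python
import pdb
import sys
import traceback


def make_excepthook(call_pdb=True, stream=None):
    """Return an excepthook that prints the traceback, then optionally starts pdb."""
    def hook(exc_type, exc_value, exc_tb):
        out = stream if stream is not None else sys.stderr
        traceback.print_exception(exc_type, exc_value, exc_tb, file=out)
        if call_pdb:
            # Post-mortem debugging: inspect the frames of the dead program.
            pdb.post_mortem(exc_tb)
    return hook


# Install near the top of the main routine:
# sys.excepthook = make_excepthook(call_pdb=True)
```

With call_pdb=False the hook only reports the error, which is handy for non-interactive runs of the same program.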
===== Extensions for syntax processing =====
This isn't for the faint of heart, because the potential for breaking things is quite high. But it can be a very powerful and useful feature. In a nutshell, you can **redefine the way IPython processes the user input line to accept new, special extensions to the syntax without needing to change any of IPython's own code.**
In the IPython/extensions directory you will find some examples supplied, which we will briefly describe now. These can be used __as is__ (and both provide very useful functionality), or you can use them as a starting point for writing your own extensions.
=== Pasting of code starting with Python or IPython prompts ===
IPython is smart enough to **filter out input prompts**, be they plain Python ones (>>> and ...) or IPython ones (In [N]: and ...:). You can therefore copy and **paste from existing interactive sessions without worry**.
The following is a screenshot of how things work, copying an example from the standard Python tutorial:
In [1]: >>> # Fibonacci series:
In [2]: ... # the sum of two elements defines the next
In [3]: ... a, b = 0, 1
In [4]: >>> while b < 10:
...: ... print b
...: ... a, b = b, a+b
...:
1
1
2
3
5
8
And pasting from IPython sessions works equally well:
In [1]: In [5]: def f(x):
...: ...: "A simple function"
...: ...: return x**2
...: ...:
In [2]: f(3)
Out[2]: 9
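To make the prompt filtering concrete, here is a rough sketch of stripping one leading prompt per line with a regular expression. It is an illustration only, not IPython's actual input filter: strip_prompts is a hypothetical helper, it handles a single prompt per line (not the nested case shown above), and it would also strip a line of real code that begins with a literal "...".

```python
import re

# One leading prompt: '>>>', '...:', '...' or 'In [N]:', plus one optional space.
_PROMPT = re.compile(r'^\s*(?:>>>|\.\.\.:|\.\.\.|In \[\d+\]:) ?')


def strip_prompts(text):
    """Remove a leading Python/IPython input prompt from each pasted line."""
    return '\n'.join(_PROMPT.sub('', line) for line in text.splitlines())


pasted = ">>> while b < 10:\n...     print(b)\n...     a, b = b, a+b"
cleaned = strip_prompts(pasted)
```

Only one space after the prompt is consumed, so the indentation of the pasted code survives.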
===== GUI event loop support =====
New in version 0.11: The %gui magic and IPython.lib.inputhook.
IPython has excellent support for working interactively with Graphical User Interface (GUI) toolkits, such as wxPython, PyQt4/PySide, PyGTK and Tk. This is implemented using Python's built-in PyOS_InputHook hook. This implementation is extremely robust compared to our previous thread-based version. The advantages of this are:
* GUIs can be enabled and disabled dynamically at runtime.
* The active GUI can be switched dynamically at runtime.
* In some cases, multiple GUIs can run simultaneously with no problems.
There is a developer API in IPython.lib.inputhook for customizing all of these things.
For users, enabling GUI event loop integration is simple. You simply use the %gui magic as follows:
%gui [GUINAME]
With no arguments, %gui removes all GUI support. Valid GUINAME arguments are wx, qt, gtk and tk.
Thus, to use wxPython interactively and create a running wx.App object, do:
%gui wx
For information on IPython's Matplotlib integration (and the pylab mode) see the section on plotting with matplotlib.
For developers who want to use IPython's GUI event loop integration as a library, these capabilities are exposed in the IPython.lib.inputhook and IPython.lib.guisupport modules. Interested developers should see the module docstrings for more information, but there are a few points that should be mentioned here.
First, the PyOS_InputHook approach only works in command line settings where readline is activated. The integration with various eventloops is handled somewhat differently (and more simply) when using the standalone kernel, as in the qtconsole and notebook.
Second, when using the PyOS_InputHook approach, a GUI application should not start its event loop. Instead all of this is handled by the PyOS_InputHook. This means that applications that are meant to be used both in IPython and as standalone apps need to have special code to detect how the application is being run. We highly recommend using IPython's support for this. Since the details vary slightly between toolkits, we point you to the various examples in our source directory docs/examples/lib that demonstrate these capabilities.
Warning
The WX version of this is currently broken. While --pylab=wx works fine, standalone WX apps do not. See https://github.com/ipython/ipython/issues/645 for details of our progress on this issue.
Third, unlike previous versions of IPython, we no longer “hijack” (replace them with no-ops) the event loops. This is done to allow applications that actually need to run the real event loops to do so. This is often needed to process pending events at critical points.
Finally, we also have a number of examples in our source directory docs/examples/lib that demonstrate these capabilities.
==== PyQt and PySide ====
When you use --gui=qt or --pylab=qt, IPython can work with either PyQt4 or PySide. There are three options for configuration here, because PyQt4 has two APIs for QString and QVariant: v1, which is the default on Python 2, and the more natural v2, which is the only API supported by PySide. v2 is also the default for PyQt4 on Python 3. IPython's code for the QtConsole uses v2, but you can still use any interface in your code, since the Qt frontend is in a different process.
The default will be to import PyQt4 without configuration of the APIs, thus matching what most applications would expect. It will fall back on PySide if PyQt4 is unavailable.
If specified, IPython will respect the environment variable QT_API used by ETS. ETS 4.0 also works with both PyQt4 and PySide, but it requires PyQt4 to use its v2 API. So if QT_API=pyside PySide will be used, and if QT_API=pyqt then PyQt4 will be used with the v2 API for QString and QVariant, so ETS codes like MayaVi will also work with IPython.
If you launch IPython in pylab mode with ipython --pylab=qt, then IPython will ask matplotlib which Qt library to use (only if QT_API is not set), via the backend.qt4 rcParam. If matplotlib is version 1.0.1 or older, then IPython will always use PyQt4 without setting the v2 APIs, since neither v2 PyQt nor PySide work.
Warning
Note that this means for ETS 4 to work with PyQt4, QT_API must be set to work with IPython's qt integration, because otherwise PyQt4 will be loaded in an incompatible mode.
It also means that you must not have QT_API set if you want to use --gui=qt with code that requires PyQt4 API v1.
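The selection rules described in this section can be summarised as a small decision function. This is a sketch of the documented behaviour only, not IPython's code: choose_qt_binding and its arguments are hypothetical, and the pylab/matplotlib rcParam case is left out.

```python
def choose_qt_binding(qt_api, have_pyqt4, have_pyside):
    """Return (binding, sip_api_version) following the rules described above.

    sip_api_version is 2 when the v2 QString/QVariant API should be selected,
    and None when the binding's default should be left untouched.
    """
    if qt_api == 'pyside':
        return ('PySide', None)
    if qt_api == 'pyqt':
        return ('PyQt4', 2)
    # QT_API unset: prefer PyQt4 with its default API, else fall back on PySide.
    if have_pyqt4:
        return ('PyQt4', None)
    if have_pyside:
        return ('PySide', None)
    raise ImportError('neither PyQt4 nor PySide is available')
```

In a real program the first argument would typically come from os.environ.get('QT_API').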
===== Plotting with matplotlib =====
Matplotlib provides high quality 2D and 3D plotting for Python. Matplotlib can produce plots on screen using a variety of GUI toolkits, including Tk, PyGTK, PyQt4 and wxPython. It also provides a number of commands useful for scientific computing, all with a syntax compatible with that of the popular Matlab program.
To start IPython with matplotlib support, use the --pylab switch. If no arguments are given, IPython will automatically detect your choice of matplotlib backend. You can also request a specific backend with --pylab=backend, where backend must be one of: tk, qt, wx, gtk, osx.
===== Interactive demos with IPython =====
IPython ships with a basic system for running scripts interactively in sections, __useful when presenting code to audiences__. A few tags embedded in comments (so that the script remains valid Python code) divide a file into **separate blocks**, and the demo can be run one block at a time, with IPython printing (with syntax highlighting) the block before executing it, and returning to the interactive prompt after each block. The interactive namespace is updated after each block is run with the contents of the demo's namespace.
This __allows you to show a piece of code__, run it and then interactively execute commands based on the variables just created. Once you want to continue, you simply execute the next block of the demo. The following listing shows the __markup__ necessary for dividing a script into sections for execution as a demo:
"""A simple interactive demo to illustrate the use of IPython's Demo class.
Any python script can be run as a demo, but that does little more than showing
it on-screen, syntax-highlighted in one shot. If you add a little simple
markup, you can stop at specified intervals and return to the ipython prompt,
resuming execution later.
"""
print 'Hello, welcome to an interactive IPython demo.'
print 'Executing this block should require confirmation before proceeding,'
print 'unless auto_all has been set to true in the demo object'
# The mark below defines a block boundary, which is a point where IPython will
# stop execution and return to the interactive prompt.
# Note that in actual interactive execution,
# <demo> --- stop ---
x = 1
y = 2
# <demo> --- stop ---
# the mark below marks this block as silent
# <demo> silent
print 'This is a silent block, which gets executed but not printed.'
# <demo> --- stop ---
# <demo> auto
print 'This is an automatic block.'
print 'It is executed without asking for confirmation, but printed.'
z = x+y
print 'z=',z
# <demo> --- stop ---
# This is just another normal block.
print 'z is now:', z
print 'bye!'
In order to run a file as a demo, you must first make a Demo object out of it. If the file is named myscript.py, the following code will make a demo:
**from IPython.lib.demo import Demo**
**mydemo = Demo('myscript.py')**
This creates the mydemo object, whose blocks you run one at a time by simply calling the object with no arguments. If you have autocall active in IPython (the default), all you need to do is type:
**mydemo**
and IPython will call it, executing each block. Demo objects can be restarted, you can move forward or back skipping blocks, re-execute the last block, etc. Simply use the Tab key on a demo object to see its methods, and call ? on them to see their docstrings for more usage details. In addition, the demo module itself contains a comprehensive docstring, which you can access via:
**from IPython.lib import demo**
**demo?**
Limitations: It is important to note that these demos are limited to fairly simple uses. In particular, you cannot break up sections within indented code (loops, if statements, function definitions, etc.). Supporting something like this would basically require tracking the internal execution state of the Python interpreter, so only top-level divisions are allowed. If you want to be able to open an IPython instance at an arbitrary point in a program, you can use IPython's embedding facilities; see IPython.embed() for details.
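The block mechanism is easy to picture as "split the source at the stop marks, then run the pieces in one shared namespace". The sketch below does just that and nothing more; it is not the Demo class, it ignores the silent and auto tags, and split_demo and run_demo are names chosen for this illustration.

```python
import re

# A stop mark is a comment line of the form '# <demo> --- stop ---'.
_STOP = re.compile(r'^#\s*<demo>\s*-+\s*stop\s*-+\s*$', re.MULTILINE)


def split_demo(source):
    """Split script text into top-level blocks at each stop mark."""
    blocks = [block.strip('\n') for block in _STOP.split(source)]
    return [block for block in blocks if block.strip()]


def run_demo(source, namespace=None):
    """Run every block in order in one shared namespace and return it."""
    namespace = {} if namespace is None else namespace
    for block in split_demo(source):
        exec(block, namespace)
    return namespace
```

Because each block is compiled on its own, a stop mark inside a loop or function body would leave two syntactically incomplete pieces, which is the limitation noted above.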

View File

@@ -0,0 +1,123 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-06T15:30:30+08:00
====== Using IPython for interactive work ======
Created Friday 06 January 2012
http://ipython.org/ipython-doc/rel-0.12/interactive/index.html
Introducing IPython
Tab completion
Exploring your objects
Magic functions
History
System shell commands
Configuration
IPython Tips & Tricks
Embed IPython in your programs
Run doctests
Use IPython to present interactive demos
Suppress output
Lightweight version control
IPython reference
Command-line usage
Interactive use
IPython as your default Python environment
Embedding IPython
Using the Python debugger (pdb)
Extensions for syntax processing
GUI event loop support
Plotting with matplotlib
Interactive demos with IPython
IPython as a system shell
Overview
Aliases
Directory management
Enabled extensions
Prompt customization
String lists
A Qt Console for IPython
%loadpy
Pylab
Saving and Printing
Colors and Highlighting
Fonts
Process Management
Qt and the QtConsole
Regressions
An HTML Notebook IPython
Basic Usage
Security
Quick Howto: running a public notebook server
The notebook format
Known Issues
===== Introducing IPython =====
You don't need to know anything beyond Python to start using IPython: just type commands as you would at the standard Python prompt. But IPython can do __much more__ than the standard prompt. Some key features are described here. For more information, check the tips page, or look at examples in the IPython cookbook.
==== • Tab completion ====
Tab completion, especially for attributes, is a convenient way to explore the structure of any object you're dealing with. Simply type __object_name.<TAB>__ to view the object's attributes (see the readline section for more). Besides Python objects and keywords, tab completion also works on file and directory names.
===== • Exploring your objects =====
Typing __object_name?__ will print all sorts of details about any object, including docstrings, function definition lines (for call arguments) and constructor details for classes. To get specific information on an object, you can use the magic commands **%pdoc, %pdef, %psource and %pfile**.
===== • Magic functions =====
IPython has a set of predefined magic functions that you can call with a command line style syntax. These include:
* Functions that work with code: %run, %edit, %save, %macro, %recall, etc.
* Functions which affect the shell: %colors, %xmode, %autoindent, etc.
* Other functions such as %reset, %timeit or %paste.
You can always call these using the __% prefix__, and if you're typing one **on a line by itself**, you can omit even that:
**run thescript.py**
For more details on any magic function, call __%somemagic?__ to read its docstring. To see all the available magic functions, call __%lsmagic__.
===== • Running and Editing =====
The %run magic command allows you to run **any Python script** and __load all of its data directly into the interactive namespace__. Since the file is re-read from disk each time, changes you make to it are reflected immediately (unlike imported modules, which have to be specifically reloaded). IPython also includes dreload, a recursive reload function.
%run has special flags for __timing__ the execution of your scripts (-t), or for running them under the control of either Python's __pdb__ debugger (-d) or __profiler__ (-p).
The __%edit__ command gives a reasonable approximation of multiline editing, by invoking your favorite editor on the spot. IPython will execute the code you type in there as if it were typed interactively.
===== • Debugging =====
After an exception occurs, you can call __%debug__ to jump into the Python debugger (__pdb__) and examine the problem. Alternatively, if you call__ %pdb__, IPython will automatically start the debugger **on any uncaught exception**. You can print variables, see code, execute statements and even walk up and down the call stack to track down the true source of the problem. This can be an efficient way to develop and debug code, in many cases eliminating the need for print statements or external debugging tools.
You can also step through a program from the beginning by calling __%run -d theprogram.py__.
===== • History =====
IPython stores __both the commands you enter, and the results it produces__. You can easily go through previous commands with the up- and down-arrow keys, or access your history in more sophisticated ways.
Input and output history are kept in variables called __In and Out__, keyed by the prompt numbers, e.g. In[4]. The **last three objects** in __output history__ are also kept in variables named _ (one underscore), __ (two underscores) and ___ (three underscores).
You can use the **%history** magic function to examine past input and output. Input history from previous sessions is saved in a __database__, and IPython can be configured to save output history.
Several other magic functions can use your input history, including __%edit, %rerun, %recall, %macro, %save and %pastebin.__ You can use a standard format to refer to lines:
%pastebin 3 18-20 ~1/1-5
This will take line 3 and lines 18 to 20 from the **current session**, and lines 1-5 from the **previous session**.
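As an illustration of how such a range specification breaks down, the sketch below parses only the three forms used in the example above (a single line, a START-STOP range, and a ~N/ prefix for N sessions back). It is not IPython's parser, which accepts more forms; parse_history_ranges is a hypothetical name, and the range ends are treated as inclusive because the text says "lines 18 to 20".

```python
import re

# [~N/]START[-STOP], e.g. '3', '18-20' or '~1/1-5'
_RANGE = re.compile(r'^(?:~(\d+)/)?(\d+)(?:-(\d+))?$')


def parse_history_ranges(spec):
    """Return (sessions_back, first_line, last_line) tuples, ends inclusive."""
    ranges = []
    for token in spec.split():
        match = _RANGE.match(token)
        if match is None:
            raise ValueError('unrecognised range: %r' % token)
        back, start, stop = match.groups()
        start = int(start)
        stop = int(stop) if stop is not None else start
        ranges.append((int(back or 0), start, stop))
    return ranges


ranges = parse_history_ranges('3 18-20 ~1/1-5')
```

Here a sessions_back of 0 stands for the current session and 1 for the previous one.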
===== • System shell commands =====
To run any command at the system shell, simply prefix it with !, e.g.:
!ping www.bbc.co.uk
You can **capture the output** into a __Python list__, e.g.: __files = !ls__. To pass the values of **Python variables** or expressions to system commands, prefix them with $: !grep -rF $pattern ipython/*. See our shell section for more details.
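Outside IPython, roughly the same two things (capturing output as a list of lines, and passing a Python value to the command) can be done with the subprocess module. This is a plain-Python sketch, not what the ! syntax expands to; capture_lines is a made-up helper, and the current interpreter stands in for a shell command so the example runs anywhere.

```python
import subprocess
import sys


def capture_lines(argv):
    """Run a command and return its standard output as a list of lines."""
    output = subprocess.check_output(argv, universal_newlines=True)
    return output.splitlines()


# Use the current interpreter as a portable stand-in for a shell command,
# and pass a Python variable to it as an argument (the role of $pattern).
pattern = 'spam'
lines = capture_lines(
    [sys.executable, '-c', 'import sys; print(sys.argv[1])', pattern])
```

Passing the arguments as a list avoids shell quoting issues, at the cost of not getting shell features such as globbing.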
===== • Define your own system aliases =====
It's convenient to have __aliases to the system commands__ you use most often. This allows you to work seamlessly from inside IPython with the same commands you are used to in your system shell. IPython comes with some **pre-defined aliases** and **a complete system** for changing directories, both via a stack (see __%pushd, %popd and %dhist__) and via direct __%cd__. The latter keeps a history of visited directories and allows you to go to any previously visited one.
===== • Configuration =====
Much of IPython can be tweaked through configuration. To get started, use the command __ipython profile create__ to produce the default config files. These will be placed in ~/.ipython/profile_default or ~/.config/ipython/**profile_default**, and contain comments explaining what the various options do.
Profiles allow you to use IPython **for different tasks**, keeping separate config files and history for each one. More details in the profiles section.
===== • Startup Files =====
If you want some code to be run at the beginning of __every IPython session__, the easiest way is to add **Python (.py) or IPython (.ipy)** scripts to your **profile_default/startup/** directory. Files here will be executed as soon as the IPython shell is constructed, before any other code or scripts you have specified. The files will be run in order of their names, so you can control the ordering with prefixes, like 10-myimports.py.
Note
Automatic startup files are new in IPython 0.12. Use InteractiveShellApp.exec_files in ipython_config.py for similar behavior in 0.11.
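To see how numeric prefixes control the ordering, the following sketch lists the scripts of a directory in the order they would be picked up. It assumes that "in order of their names" means a plain lexicographic sort; startup_files is a hypothetical helper and does not execute anything.

```python
import os


def startup_files(startup_dir):
    """List the .py and .ipy files of a directory in the order they would run."""
    names = [name for name in os.listdir(startup_dir)
             if name.endswith(('.py', '.ipy'))]
    return sorted(names)
```

Note that a lexicographic sort puts 100-late.py before 20-early.py, so prefixes of equal width (010-, 020-, 100-) are the safer convention.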

View File

@@ -0,0 +1,269 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-23T12:38:57+08:00
====== PEP 0263 -- Defining Python Source Code Encodings ======
Created Sunday 23 October 2011
http://www.python.org/dev/peps/pep-0263/
PEP: 0263
Title: Defining Python Source Code Encodings
Version: 982904d11574
Last-Modified: 2009-06-04 19:44:37 +0000 (Thu, 04 Jun 2009)
Author: Marc-André Lemburg <mal at lemburg.com>, Martin von Löwis <martin at v.loewis.de>
Status: Final
Type: Standards Track
Created: 06-Jun-2001
Python-Version: 2.3
Post-History:
===== Abstract =====
This PEP proposes to introduce **a syntax to declare the encoding of**
** a Python source file**. The //encoding information// is then used by the
Python parser to interpret the file using the given encoding. Most
notably this enhances the interpretation of **Unicode literals** in
the source code and makes it possible to write Unicode literals
using e.g. UTF-8 directly in a Unicode-aware editor.
===== Problem =====
In Python 2.1, Unicode literals can only be written using the
Latin-1 based encoding "unicode-escape". This makes the
programming environment rather unfriendly to Python users who live
and work in non-Latin-1 locales such as many of the Asian
countries. Programmers can write their 8-bit strings using their
favorite encoding, but are bound to the "unicode-escape" encoding
for Unicode literals.
===== Proposed Solution =====
I propose to make the Python source code encoding both visible and
changeable on a per-source file basis by using **a special comment**
at the top of the file to declare the encoding.
To make Python aware of this encoding declaration a number of
concept changes are necessary with respect to the handling of
Python source code data.
===== Defining the Encoding =====
Python will **default to ASCII** as standard encoding if no other
encoding hints are given.
To define a source code encoding, a **magic comment** must
be placed into the source files either as **first or second**
line in the file, such as:
// # coding=<encoding name>//
or (using formats recognized by popular editors)
// #!/usr/bin/python//
// # -*- coding: <encoding name> -*- //
or
// #!/usr/bin/python//
// # vim: set fileencoding=<encoding name> ://
More precisely, the first or second line must match the regular
expression "**coding[:=]\s*([-\w.]+)**". The first group of this
expression is then interpreted as **encoding name**. If the encoding
is unknown to Python, an error is raised during compilation. There
must not be any Python statement on the line that contains the
encoding declaration.
To aid with platforms such as Windows, which add **Unicode BOM** marks
to the beginning of Unicode files, the UTF-8 signature
**'\xef\xbb\xbf'** will be interpreted as **'utf-8'** encoding as well
(even if no magic encoding comment is given).
If a source file uses both the UTF-8 BOM mark signature and a
magic encoding comment, the only allowed encoding for the comment
is 'utf-8'. Any other encoding will cause an error.
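The detection rules above (regular expression on the first two lines, UTF-8 BOM implying 'utf-8', and a conflict being an error) can be sketched as follows. This is an illustration, not the interpreter's tokenizer: declared_encoding is a hypothetical function, it uses the PEP's regular expression as quoted, and it does not check that the matching line is a comment.

```python
import codecs
import re

_CODING = re.compile(br'coding[:=]\s*([-\w.]+)')


def declared_encoding(source_bytes):
    """Return the source encoding implied by a BOM or a magic comment.

    Returns None when neither is present (the PEP's default then applies).
    """
    has_bom = source_bytes.startswith(codecs.BOM_UTF8)
    if has_bom:
        source_bytes = source_bytes[len(codecs.BOM_UTF8):]
    name = None
    for line in source_bytes.splitlines()[:2]:   # only lines 1 and 2 count
        match = _CODING.search(line)
        if match:
            name = match.group(1).decode('ascii')
            break
    if has_bom:
        # With a BOM, the only declaration allowed is utf-8 (or an alias).
        if name is not None and codecs.lookup(name).name != 'utf-8':
            raise SyntaxError('encoding problem: %s with UTF-8 BOM' % name)
        return 'utf-8'
    return name
```

On Python 3 the standard library already offers tokenize.detect_encoding() for real use; the sketch only mirrors the rules as this PEP states them.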
===== Examples =====
These are some examples to clarify the different styles for
defining the source code encoding at the top of a Python source
file:
1. With interpreter binary and using **Emacs style** file encoding
comment:
#!/usr/bin/python
# -*- coding: latin-1 -*-
import os, sys
...
#!/usr/bin/python
# -*- coding: iso-8859-15 -*-
import os, sys
...
#!/usr/bin/python
# -*- coding: ascii -*-
import os, sys
...
2. Without interpreter line, using plain text:
# This Python file uses the following encoding: utf-8
import os, sys
...
3. Text editors might have different ways of defining the file's
encoding, e.g.
#!/usr/local/bin/python
# coding: latin-1
import os, sys
...
4. Without encoding comment, Python's parser will **assume ASCII**
text:
#!/usr/local/bin/python
import os, sys
...
5. Encoding comments which don't work:
Missing "coding:" prefix:
#!/usr/local/bin/python
# latin-1
import os, sys
...
Encoding comment not on line 1 or 2:
#!/usr/local/bin/python
#
# -*- coding: latin-1 -*-
import os, sys
...
Unsupported encoding:
#!/usr/local/bin/python
# -*- coding: utf-42 -*-
import os, sys
...
===== Concepts =====
The PEP is based on the following concepts which would have to be
implemented to enable usage of such a magic comment:
1. The complete Python source file should **use a single encoding**.
Embedding of differently encoded data is not allowed and will
result in a decoding error during compilation of the Python
source code.
Any encoding which allows processing the first two lines in the
way indicated above is allowed as source code encoding, this
includes ASCII compatible encodings as well as certain
multi-byte encodings such as Shift_JIS. It does not include
encodings which use two or more bytes for all characters like
e.g. UTF-16. The reason for this is to keep the encoding
detection algorithm in the tokenizer simple.
2. Handling of escape sequences should continue to work as it does
now, but with all possible source code encodings, that is
standard string literals (both 8-bit and Unicode) are subject to
escape sequence expansion while raw string literals only expand
a very small subset of escape sequences.
3. Python's tokenizer/compiler combo will need to be updated to
work as follows:
1. read the file
2. decode it into Unicode assuming a fixed per-file encoding
3. convert it into a UTF-8 byte string
4. tokenize the UTF-8 content
5. compile it, creating Unicode objects from the given Unicode data
and creating string objects from the Unicode literal data
by first reencoding the UTF-8 data into 8-bit string data
using the given file encoding
Note that Python identifiers are restricted to the ASCII
subset of the encoding, and thus need no further conversion
after step 4.
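Steps 2, 3 and 5 can be traced by hand. The following is only an illustration of the data flow using Python 3 `bytes` and `str` objects, not the tokenizer's actual code; all variable names are invented for the example.

```python
# A Latin-1 encoded source line containing one non-ASCII byte (0xE9, "é").
file_encoding = "latin-1"
raw = b"s = 'caf\xe9'\n"

# Step 2: decode the file into Unicode using the per-file encoding.
text = raw.decode(file_encoding)

# Step 3: convert the Unicode text into a UTF-8 byte string for the tokenizer.
utf8 = text.encode("utf-8")

# Step 5: an 8-bit string literal is re-encoded from UTF-8 back into the
# file encoding, so it holds the same bytes that were in the file.
literal_utf8 = utf8[utf8.index(b"'") + 1:utf8.rindex(b"'")]
literal_8bit = literal_utf8.decode("utf-8").encode(file_encoding)

print(utf8)          # the "é" now occupies two bytes
print(literal_8bit)  # back to the single original byte
```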
===== Implementation =====
For backwards-compatibility with existing code which currently
uses non-ASCII in string literals without declaring an encoding,
the implementation will be introduced in two phases:
1. Allow non-ASCII in string literals and comments, by internally
treating a missing encoding declaration as a declaration of
"iso-8859-1". This will cause arbitrary byte strings to
correctly round-trip between step 2 and step 5 of the
processing, and provide compatibility with Python 2.2 for
Unicode literals that contain non-ASCII bytes.
A warning will be issued if non-ASCII bytes are found in the
input, once per improperly encoded input file.
2. Remove the warning, and change the default encoding to "ascii".
The builtin compile() API will be enhanced to accept Unicode as
input. 8-bit string input is subject to the standard procedure for
encoding detection as described above.
If a Unicode string with a coding declaration is passed to compile(),
a SyntaxError will be raised.
SUZUKI Hisao is working on a patch; see [2] for details. A patch
implementing only phase 1 is available at [1].
===== Phases =====
Implementation of steps 1 and 2 above was completed in 2.3,
except for changing the default encoding to "ascii".
The default encoding was set to "ascii" in version 2.5.
===== Scope =====
This PEP intends to provide an upgrade path from the current
(more-or-less) undefined source code encoding situation to a more
robust and portable definition.
===== References =====
[1] Phase 1 implementation:
http://python.org/sf/526840
[2] Phase 2 implementation:
http://python.org/sf/534304
===== History =====
1.10 and above: see CVS history
1.8: Added '.' to the coding RE.
1.7: Added warnings to phase 1 implementation. Replaced the
Latin-1 default encoding with the interpreter's default
encoding. Added tweaks to compile().
1.4 - 1.6: Minor tweaks
1.3: Worked in comments by Martin v. Loewis:
UTF-8 BOM mark detection, Emacs style magic comment,
two phase approach to the implementation
===== Copyright =====
This document has been placed in the public domain.


@@ -0,0 +1,57 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-23T12:25:42+08:00
====== PEP 20 -- The Zen of Python ======
Created Sunday 23 October 2011
PEP: 20
Title: The Zen of Python
Version: 6b1b63cb3d74
Last-Modified: 2004-08-23 03:41:21 +0000 (Mon, 23 Aug 2004)
Author: Tim Peters <tim at zope.com>
Status: Active
Type: Informational
Content-Type: text/plain
Created: 19-Aug-2004
Post-History: 22-Aug-2004
===== Abstract =====
Long time Pythoneer Tim Peters succinctly channels the BDFL's
guiding principles for Python's design into 20 aphorisms, only 19
of which have been written down.
**The Zen of Python**
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
===== Easter Egg =====
>>> import this
===== Copyright =====
This document has been placed in the public domain.


@@ -0,0 +1,194 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-23T14:08:32+08:00
====== PEP 257 -- Docstring Conventions ======
Created Sunday 23 October 2011
http://www.python.org/dev/peps/pep-0257/
PEP: 257
Title: Docstring Conventions
Version: e2b5d1a8a663
Last-Modified: 2009-01-18 09:50:42 +0000 (Sun, 18 Jan 2009)
Author: David Goodger <goodger at python.org>, Guido van Rossum <guido at python.org>
Discussions-To: doc-sig at python.org
Status: Active
Type: Informational
Content-Type: text/x-rst
Created: 29-May-2001
Post-History: 13-Jun-2001
===== Contents =====
Abstract
Rationale
Specification
What is a Docstring?
One-line Docstrings
Multi-line Docstrings
Handling Docstring Indentation
References and Footnotes
Copyright
Acknowledgements
===== Abstract =====
This PEP documents the** semantics and conventions** associated with Python docstrings.
===== Rationale =====
The aim of this PEP is to **standardize the high-level structure of docstrings**: what they should contain, and how to say it (without touching on any markup syntax within docstrings). The PEP contains conventions, not laws or syntax.
"A universal convention supplies all of maintainability, clarity, consistency, and a foundation for good programming habits too. What it doesn't do is insist that you follow it against your will. That's Python!"
—Tim Peters on comp.lang.python, 2001-06-16
If you violate these conventions, the worst you'll get is some dirty looks. But some software (such as the Docutils [4] docstring processing system [1] [2]) will be aware of the conventions, so following them will get you the best results.
===== Specification =====
==== What is a Docstring? ====
A docstring is **a string literal that occurs as the first statement** in a module, function, class, or method definition. Such a docstring becomes the **__doc__ **special attribute of that object.
All modules should normally have docstrings, and all functions and classes **exported** by a module should also have docstrings. **Public** methods (including the __init__ constructor) should also have docstrings. A package may be documented in the module docstring of the **__init__.py** file in the package directory.
String literals occurring elsewhere in Python code may also act as documentation. They are **not recognized **by the Python bytecode compiler and are **not accessible** as runtime object attributes (i.e. not assigned to __doc__), but two types of extra docstrings may be extracted by software tools:
* String literals occurring **immediately after** a simple assignment at the top level of a module, class, or __init__ method are called "**attribute docstrings**".
* String literals occurring immediately after another docstring are called "**additional docstrings**".
Please see PEP 258, "Docutils Design Specification" [2], for a detailed description of attribute and additional docstrings.
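The difference between a real docstring and the two "extra" kinds can be seen at runtime. A minimal sketch, with `PI` and `area` as made-up names: only the first statement of the function ends up in `__doc__`, while the other string literals are visible solely to tools that read the source.

```python
PI = 3.14159
"""Approximation of pi (an attribute docstring)."""


def area(radius):
    """Return the area of a circle with the given radius."""
    "An additional docstring; only source-reading tools can pick it up."
    return PI * radius ** 2


print(area.__doc__)
print("additional" in area.__doc__)  # the second literal is not part of __doc__
```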
XXX Mention docstrings of 2.2 properties.
For consistency, always use__ """triple double quotes""" __around docstrings. Use__ r"""raw triple double quotes"""__ if you use any backslashes in your docstrings. For Unicode docstrings, use __u"""Unicode triple-quoted strings"""__.
There are two forms of docstrings: one-liners and multi-line docstrings.
**One-line Docstrings**
One-liners are for really obvious cases. They should really fit on one line. For example:
def kos_root():
    """Return the pathname of the KOS root directory."""
    global _kos_root
    if _kos_root: return _kos_root
    ...
Notes:
* Triple quotes are used even though the string fits on one line. This makes it easy to **later expand** it.
* The closing quotes are on the same line as the opening quotes. This looks better for one-liners.
* There's **no blank line **either before or after the docstring.
* The docstring is a phrase ending in a **period.** It prescribes the function or method's effect __as a command__ ("Do this", "Return that"), not as a __description__; e.g. don't write "Returns the pathname ...".
* The one-line docstring **should NOT be a "signature"** reiterating the function/method parameters (which can be obtained by introspection). Don't do:
def function(a, b):
    """function(a, b) -> list"""
This type of docstring is only appropriate for C functions (such as built-ins), where **introspection** is not possible. However, the nature of the return value cannot be determined by introspection, so it should be mentioned. The preferred form for such a docstring would be something like:
def function(a, b):
    """Do X and return a list."""
(Of course "Do X" should be replaced by a useful description!)
**Multi-line Docstrings**
Multi-line docstrings consist of **a summary line **just like a one-line docstring, followed by **a blank line**, followed by a more elaborate description. The summary line may be used by **automatic indexing tools**; it is important that it fits on one line and is separated from the rest of the docstring by a blank line. The summary line may be on **the same line **as the opening quotes **or on the next line**. The entire docstring is indented the** same as the quotes** at its first line (see example below).
Insert **a blank line before and after all docstrings** (one-line or multi-line) that document __a class __-- generally speaking, the class's methods are separated from each other by **a single blank line**, and the docstring needs to be offset from the** first method** by a blank line; for symmetry, put **a blank** line between the class header and the docstring. Docstrings documenting functions or methods generally **don't **have this requirement, unless the function or method's body is written as a number of blank-line separated sections -- in this case, treat the docstring as another section, and precede it with a blank line.
The docstring of __a script__ (a stand-alone program) should be usable as its **"usage"** message, printed when the script is invoked with incorrect or missing arguments (or perhaps with a "-h" option, for "help"). Such a docstring should document the **script's function and command line syntax, environment variables, and files**. Usage messages can be fairly elaborate (several screens full) and should be sufficient for a new user to use the command properly, as well as a complete quick reference to all options and arguments for the sophisticated user.
The docstring for__ a module__ should generally **list the classes, exceptions and functions** (and any other objects) that are exported by the module, with a **one-line summary of each**. (These summaries generally give less detail than the summary line in the object's docstring.) The docstring for a package (i.e., the docstring of the package's **__init__.py **module) should also list the modules and subpackages exported by the package.
The docstring for __a function or method__ should summarize its behavior and document its arguments, return value(s), side effects, exceptions raised, and restrictions on when it can be called (all if applicable). Optional arguments should be indicated. It should be documented whether keyword arguments are part of the interface.
The docstring for__ a class__ should summarize its behavior and** list **the public methods and instance variables. If the class is intended to be subclassed, and has an additional interface for subclasses, this interface should be listed separately (in the docstring). The class constructor should be documented in the docstring for its __init__ method. Individual methods should be documented by their own docstring.
If a class__ subclasses __another class and its behavior is mostly inherited from that class, its docstring should mention this and summarize the differences. Use the verb "**override**" to indicate that a subclass method replaces a superclass method and does not call the superclass method; use the verb "**extend**" to indicate that a subclass method calls the superclass method (in addition to its own behavior).
Do not use the Emacs convention of mentioning the arguments of functions or methods in upper case in running text. Python is case sensitive and the argument names can be used for keyword arguments, so the docstring should document the correct argument names. It is best to **list each argument on a separate line**. For example:
def complex(real=0.0, imag=0.0):
    """Form a complex number.

    Keyword arguments:
    real -- the real part (default 0.0)
    imag -- the imaginary part (default 0.0)

    """
    if imag == 0.0 and real == 0.0: return complex_zero
    ...
The BDFL [3] recommends **inserting a blank line between the last paragraph in a multi-line docstring and its closing quotes, placing the closing quotes on a line by themselves.** This way, Emacs' fill-paragraph command can be used on it.
===== Handling Docstring Indentation =====
Docstring processing tools will strip a uniform amount of indentation from the **second and further lines of the docstring, equal to the minimum indentation of all non-blank lines after the first line.** Any indentation in the first line of the docstring (i.e., up to the first newline) is insignificant and **removed**. Relative indentation of later lines in the docstring is retained. Blank lines should be removed from the beginning and end of the docstring.
Since code is much more precise than words, here is an implementation of the algorithm:
import sys

def trim(docstring):
    if not docstring:
        return ''
    # Convert tabs to spaces (following the normal Python rules)
    # and split into a list of lines:
    lines = docstring.expandtabs().splitlines()
    # Determine **minimum indentation** (first line doesn't count):
    indent = sys.maxsize
    for line in lines[1:]:
        stripped = line.lstrip()
        if stripped:
            indent = min(indent, len(line) - len(stripped))
    # Remove indentation (first line is **special**):
    trimmed = [lines[0].strip()]
    if indent < sys.maxsize:
        for line in lines[1:]:
            trimmed.append(line[indent:].rstrip())
    # Strip off trailing and leading blank lines:
    while trimmed and not trimmed[-1]:
        trimmed.pop()
    while trimmed and not trimmed[0]:
        trimmed.pop(0)
    # Return a single string:
    return '\n'.join(trimmed)
The docstring in this example contains two newline characters and is therefore** 3 lines long**. The first and last lines are blank:
def foo():
    """
    This is the second line of the docstring.
    """
To illustrate:
>>> print repr(foo.__doc__)
'\n    This is the second line of the docstring.\n    '
>>> foo.__doc__.splitlines()
['', '    This is the second line of the docstring.', '    ']
>>> trim(foo.__doc__)
'This is the second line of the docstring.'
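The standard library offers `inspect.cleandoc`, which applies essentially the same trimming rules. The sketch below feeds it the raw docstring from the illustration, written out as an explicit string so the result does not depend on how a particular interpreter version stores `__doc__`. The variable names are chosen for the example.

```python
import inspect

# The untrimmed docstring: a blank first line, one indented line of text,
# and a last line that holds only indentation.
raw = '\n    This is the second line of the docstring.\n    '

cleaned = inspect.cleandoc(raw)
print(repr(cleaned))
```

Using the library function avoids maintaining a private copy of the algorithm when the goal is simply to display a docstring.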
Once trimmed, these docstrings are equivalent:
def foo():
    """A multi-line
    docstring.
    """
def bar():
    """
    A multi-line
    docstring.
    """
===== References and Footnotes =====
[1] PEP 256, Docstring Processing System Framework, Goodger (http://www.python.org/dev/peps/pep-0256/)
[2] (1, 2) PEP 258, Docutils Design Specification, Goodger (http://www.python.org/dev/peps/pep-0258/)
[3] Guido van Rossum, Python's creator and Benevolent Dictator For Life.
[4] http://docutils.sourceforge.net/
[5] http://www.python.org/doc/essays/styleguide.html
[6] http://www.python.org/sigs/doc-sig/


@@ -0,0 +1,879 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-23T10:19:27+08:00
====== PEP 8 -- Style Guide for Python Code ======
Created Sunday 23 October 2011
http://www.python.org/dev/peps/pep-0008/
PEP: 8
Title: Style Guide for Python Code
Version: 00f8e3bb1197
Last-Modified: 2011-06-13 12:48:33 -0400 (Mon, 13 Jun 2011)
Author: Guido van Rossum <guido at python.org>, Barry Warsaw <barry at python.org>
Status: Active
Type: Process
Created: 05-Jul-2001
Post-History: 05-Jul-2001
===== Introduction =====
This document gives coding conventions for the Python code comprising the
standard library in the main Python distribution. Please see the
companion informational PEP describing style guidelines for the C code in
the C implementation of Python[1].
This document was adapted from Guido's original Python Style Guide
essay[2], with some additions from Barry's style guide[5]. Where there's
conflict, Guido's style rules for the purposes of this PEP. This PEP may
still be incomplete (in fact, it may never be finished <wink>).
===== A Foolish Consistency is the Hobgoblin of Little Minds =====
One of Guido's key insights is that **code is read much more often than it**
** is written**. The guidelines provided here are intended to improve the
**readability of code** and make it consistent across the wide spectrum of
Python code. As PEP 20 [6] says, "Readability counts".
A style guide is about **consistency**. Consistency with this style guide is
important. Consistency within a project is more important. Consistency
within one module or function is most important.
But most importantly: **know when to be inconsistent** -- sometimes the style
guide just doesn't apply. When in doubt, use your best judgment. Look
at other examples and decide what looks best. And don't hesitate to ask!
Two good reasons to break a particular rule:
(1) When applying the rule would make the code less readable, even for
someone who is used to reading code that follows the rules.
(2) To be consistent with surrounding code that also breaks it (maybe for
historic reasons) -- although this is also an opportunity to clean up
someone else's mess (in true XP style).
====== Code lay-out ======
** Indentation**
Use 4 spaces per indentation level.
For really old code that you don't want to mess up, you can continue to
use 8-space tabs.
Continuation lines should align wrapped elements either** vertically** using
Python's implicit line joining inside parentheses, brackets and braces, or
using a **hanging indent**. When using a hanging indent the following
considerations should be applied; there should be **no arguments on the**
** first line** and further indentation should be used to clearly distinguish
itself as a **continuation line**.
Yes:
# Aligned with opening delimiter
foo = long_function_name(var_one, var_two,
                         var_three, var_four)
# More indentation (of the parameters, that is) included to distinguish this from the rest.
def long_function_name(
        var_one, var_two, var_three,
        var_four):
    print(var_one)
No:
# Arguments on first line forbidden when not using vertical alignment
foo = long_function_name(var_one, var_two,
    var_three, var_four)
# Further indentation required as indentation is not distinguishable
def long_function_name(
    var_one, var_two, var_three,
    var_four):
    print(var_one)
Optional:
# Extra indentation is not necessary.
# (Note: for a function call, extra indentation of the arguments is optional. For a function definition it is required, otherwise the parameters cannot be told apart from the body.)
foo = long_function_name(
  var_one, var_two,
  var_three, var_four)
** Tabs or Spaces?**
Never mix tabs and spaces.
The most popular way of indenting Python is with **spaces only**. The
second-most popular way is with tabs only. Code indented with a mixture
of tabs and spaces should be converted to using spaces exclusively. When
invoking the Python command line interpreter with the -t option, it issues
warnings about code that illegally mixes tabs and spaces. When using -tt
these warnings become errors. These options are highly recommended!
For new projects, spaces-only are strongly recommended over tabs. Most
editors have features that make this easy to do.
** Maximum Line Length**
Limit all lines to a maximum of **79 **characters.
There are still many devices around that are limited to 80 character
lines; plus, limiting windows to 80 characters makes it possible to have
several windows **side-by-side**. The default wrapping on such devices
disrupts the visual structure of the code, making it more difficult to
understand. Therefore, please limit all lines to a maximum of 79
characters. For flowing long blocks of text (docstrings or comments),
limiting the length to 72 characters is recommended.
The preferred way of wrapping long lines is by using Python's **implied line**
** continuation inside parentheses, brackets and braces.** Long lines can be
broken over multiple lines by wrapping expressions in parentheses. These
should be used in **preference to using a backslash** for line continuation.
Make sure to **indent the continued line appropriately**. The preferred place
to break around a binary operator is *after* the operator, not before it.
Some examples:
class Rectangle(Blob):
    def __init__(self, width, height,
                 color='black', emphasis=None, highlight=0):
        if (width == 0 and height == 0 and
            color == 'red' and emphasis == 'strong' or
            highlight > 100):
            raise ValueError("sorry, you lose")
        if width == 0 and height == 0 and (color == 'red' or
                                           emphasis is None):
            raise ValueError("I don't think so -- values are %s, %s" %
                             (width, height))
        Blob.__init__(self, width, height,
                      color, emphasis, highlight)
** Blank Lines**
Separate top-level function and class definitions with** two** blank lines.
Method definitions inside a class are separated by **a single** blank line.
Extra blank lines may be used (sparingly) to separate **groups of related**
** functions.** Blank lines may be omitted between a bunch of related
one-liners (e.g. a set of dummy implementations).
Use blank lines in functions, sparingly, to indicate** logical sections**.
Python accepts the control-L (i.e. ^L) form feed character as whitespace;
many tools treat these characters as page separators, so you may use them
to separate pages of related sections of your file. Note, some editors
and web-based code viewers may not recognize control-L as a form feed
and will show another glyph in its place.
**Encodings (PEP 263)**
Code in the core Python distribution should always use the ASCII or
Latin-1 encoding (a.k.a. ISO-8859-1). For Python 3.0 and beyond,
**UTF-8** is preferred over Latin-1, see PEP 3120.
Files using ASCII should not have a **coding cookie**. (Note: source files
encoded in ASCII should not carry an encoding declaration.) Latin-1 (or
UTF-8) should only be used when a
comment or docstring needs to mention an author name that requires
Latin-1; otherwise, using \x, \u or \U escapes is the preferred way to
include non-ASCII data in string literals. (Note: all of this applies to
ASCII-encoded files.)
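As a small illustration of the escape forms mentioned above, the sketch below keeps the source file pure ASCII while the strings themselves contain non-ASCII characters. The variable names and sample values are invented for the example; the `u` prefix is accepted by Python 2 and by Python 3.3 and later.

```python
# The source stays pure ASCII; the non-ASCII characters are written
# as escapes inside the string literals.
author = u"Fran\xe7ois"     # \x escape: U+00E7 (c with cedilla)
city = u"\u6771\u4eac"      # \u escapes: two CJK characters
symbol = u"\U0001F40D"      # \U escape: a code point beyond U+FFFF

print(len(author), len(city))
```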
For Python 3.0 and beyond, the following policy is prescribed for
the **standard library** (see PEP 3131; these rules apply to the standard
library): All identifiers
in the Python standard library __MUST use ASCII-only identifiers__, and SHOULD use
English words wherever feasible (in many cases, abbreviations and
technical terms are used which aren't English). In addition,
__string literals and comments must also be in ASCII__. The only
exceptions are (a) test cases testing the non-ASCII features, and
(b) names of authors. Authors whose names are not based on the
latin alphabet MUST provide a latin transliteration of their
names. (Note: these rules apply to code written for the standard library.
Because it has to suit users of every language, English is the most
appropriate choice. For personal code, identifiers must still be in
English, but string literals may contain non-English characters.)
Open source projects with a global audience are encouraged to
adopt a similar policy.
====== Imports ======
- Imports should usually be **on separate lines**, e.g.:
Yes: import os
     import sys
No:  import sys, os
It's okay to say this though:
from subprocess import Popen, PIPE
- Imports are always put at the top of the file, just **after** any module
comments and docstrings, and **before** module globals and constants.
Imports should be **grouped in the following order**:
1. standard library imports
2. related third party imports
3. local application/library specific imports
You should put a** blank line** between each group of imports.
Put any relevant **__all__ **specification after the imports.
- Relative imports for intra-package imports are highly **discouraged**.
**Always use the absolute package path for all imports**.
Even now that PEP 328 [7] is fully implemented in Python 2.5,
its style of explicit relative imports is actively discouraged;
absolute imports are more portable and usually more readable.
- When importing a class from a class-containing module, it's usually okay
to spell this
from myclass import MyClass
from foo.bar.yourclass import YourClass
If this spelling causes **local name clashes**, then spell them
import myclass
import foo.bar.yourclass
and use "myclass.MyClass" and "foo.bar.yourclass.YourClass"
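(Note: as long as it causes no name clashes, prefer importing the final function, class, constant, or variable names directly into the current namespace.)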
====== Whitespace in Expressions and Statements ======
Pet Peeves
Avoid extraneous whitespace in the following situations:
- **Immediately inside** parentheses, brackets or braces.
(No space between a bracket of any kind and the content immediately inside it.)
Yes: spam(ham[1], {eggs: 2})
No: spam( ham[ 1 ], { eggs: 2 } )
- **Immediately before** a comma, semicolon, or colon:
(No space before these punctuation marks.)
Yes: if x == 4: print x, y; x, y = y, x
No: if x == 4 : print x , y ; x , y = y , x
- Immediately **before** the open parenthesis that starts the argument
list of a function call:
(No space between the opening parenthesis and what precedes it.)
Yes: spam(1)
No: spam (1)
- Immediately **before **the open parenthesis that starts an indexing or
slicing:
Yes: dict['key'] = list[index]
No: dict ['key'] = list [index]
- More than one space around an assignment (or other) operator to
**align it with another**.
(Do not use spaces to align expressions across multiple lines.)
Yes:
x = 1
y = 2
long_variable = 3
No:
x             = 1
y             = 2
long_variable = 3
** Other Recommendations**
- Always surround these** binary operators** with a single space on
either side: assignment (=), augmented assignment (+=, -= etc.),
comparisons (==, <, >, !=, <>, <=, >=, in, not in, is, is not),
Booleans (and, or, not).
(Put a space on both sides of these operators to separate them from their operands.)
- Use spaces around arithmetic operators:
Yes:
i = i + 1
submitted += 1
x = x * 2 - 1
hypot2 = x * x + y * y
c = (a + b) * (a - b)
No:
i=i+1
submitted +=1
x = x*2 - 1
hypot2 = x*x + y*y
c = (a+b) * (a-b)
- Don't use spaces around the '=' sign when used to indicate a
__keyword argument or a default parameter value__.
(No spaces around the equals sign of keyword arguments and default parameter values.)
Yes:
def complex(real, imag=0.0):
    return magic(r=real, i=imag)
No:
def complex(real, imag = 0.0):
    return magic(r = real, i = imag)
- Compound statements (multiple statements on the same line) are
generally discouraged.
Yes:
if foo == 'blah':
    do_blah_thing()
do_one()
do_two()
do_three()
Rather not:
if foo == 'blah': do_blah_thing()
do_one(); do_two(); do_three()
- While sometimes it's okay to put an if/for/while with** a small**
** body on the same line**, never do this for multi-clause
statements. Also avoid folding such long lines!
Rather not:
if foo == 'blah': do_blah_thing()
for x in lst: total += x
while t < 10: t = delay()
**Definitely not**:
if foo == 'blah': do_blah_thing()
else: do_non_blah_thing()
try: something()
finally: cleanup()
do_one(); do_two(); do_three(long, argument,
                             list, like, this)
if foo == 'blah': one(); two(); three()
====== Comments ======
Comments that contradict the code are worse than no comments. Always make
a priority of keeping the comments **up-to-date** when the code changes!
Comments should be **complete sentences**. If a comment is a phrase or
sentence, its first word should be **capitalized**, unless it is an identifier
that begins with a lower case letter (never alter the case of
identifiers!).
If a comment is short, the period at the end can be omitted. Block
comments generally consist of one or more paragraphs built out of complete
sentences, and each sentence should **end in a period**.
You should use **two spaces** after a sentence-ending period.
When writing English, Strunk and White apply.
Python coders from non-English speaking countries: please **write**
** your comments in English**, unless you are 120% sure that the code
will never be read by people who don't speak your language.
** Block Comments**
Block comments generally apply to some (or all) code that **follows them**,
and are **indented to the same level **as that code. Each line of a block
comment **starts with a # and a single space** (unless it is indented text
inside the comment).
Paragraphs inside a block comment are separated by a line containing a
** single #**.
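A block comment following these rules could look like the sketch below. The function `clamp` is a made-up example; the point is the comment layout: same indentation as the code it describes, a `#` and one space at the start of each line, and a lone `#` between paragraphs.

```python
def clamp(value, low, high):
    # Keep the value inside the closed interval [low, high].  The
    # bounds are assumed to be ordered, so no swap is attempted here.
    #
    # Paragraphs of a block comment are separated by a line holding
    # only a single "#", as above.
    return max(low, min(value, high))


print(clamp(15, 0, 10))
```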
** Inline Comments**
Use inline comments sparingly (that is, frugally and with restraint).
An inline comment is a comment **on the same line as a statement**. Inline
comments should be separated by **at least two spaces** from the statement.
They should start with** a # and a single space**.
Inline comments are unnecessary and in fact distracting if they state
the obvious. Don't do this:
x = x + 1 # Increment x
But sometimes, this is useful:
x = x + 1 # Compensate for border
====== Documentation Strings ======
Conventions for writing good documentation strings (a.k.a. "docstrings")
are immortalized in PEP 257 [3].
- Write docstrings** for all** public modules, functions, classes, and
methods. Docstrings are not necessary for **non-public** methods, but you
should have **a comment** that describes what the method does. This comment
should appear **after** the "def" line.
- PEP 257 describes good docstring conventions. Note that most
importantly, the """ that ends a multiline docstring should be **on a line**
** by itself**, and preferably preceded by **a blank line**, e.g.:
"""Return a foobang

Optional plotz says to frobnicate the bizbaz first.

"""
- For one liner docstrings, it's okay to keep the closing """ on the same
line.
====== Version Bookkeeping ======
If you have to have Subversion, CVS, or RCS crud in your source file, do
it as follows.
__version__ = "$Revision: 00f8e3bb1197 $"
# $Source$
These lines should be included **after the module's docstring**, before any
other code, separated by a blank line above and below.
====== Naming Conventions ======
The naming conventions of Python's library are a bit of a mess, so we'll
never get this completely consistent -- nevertheless, here are the
currently// recommended naming standards//. New modules and packages
(including third party frameworks) should be written to these standards,
but where an existing library has a different style, internal consistency
is preferred.
// Descriptive: Naming Styles//
There are a lot of different naming styles. It helps to be able to
recognize what naming style is being used, independently from what they
are used for.
The following naming styles are commonly distinguished:
- b (single lowercase letter)
- B (single uppercase letter)
- lowercase
- lower_case_with_underscores
- UPPERCASE
- UPPER_CASE_WITH_UNDERSCORES
- CapitalizedWords (or CapWords, or CamelCase -- so named because
of the bumpy look of its letters[4]). This is also sometimes known as
StudlyCaps.
Note: When using abbreviations in CapWords, **capitalize all the letters**
** of the abbreviation**. Thus HTTPServerError is better than
HttpServerError.
- mixedCase (differs from CapitalizedWords by **initial lowercase**
character!)
- Capitalized_Words_With_Underscores (**ugly!**)
There's also the style of using a short **unique prefix **to group related
names together. This is not used much in Python, but it is mentioned for
completeness. For example, the os.stat() function returns a tuple whose
items traditionally have names like st_mode, st_size, st_mtime and so on.
(This is done to emphasize the correspondence with the fields of the
POSIX system call struct, which helps programmers familiar with that.)
The X11 library uses a leading** X **for all its public functions. In Python,
this style is generally deemed unnecessary because attribute and method
names are prefixed with an object, and function names are prefixed with a
module name.
In addition, the following special forms using leading or trailing
underscores are recognized (these can generally be combined with any case
convention):
- _single_leading_underscore: **weak "internal use" indicator**. E.g. "from M
import *" does not import objects whose name starts with an underscore.
- single_trailing_underscore_: used by convention to **avoid conflicts with**
** Python keyword**, e.g.
Tkinter.Toplevel(master, class_='ClassName')
- __double_leading_underscore: when naming a class attribute, **invokes name**
** mangling** (inside class FooBar, __boo becomes _FooBar__boo; see below).
- __double_leading_and_trailing_underscore__: "magic" objects or
attributes that live in user-controlled namespaces. E.g. __init__,
__import__ or __file__. Never invent such names; only use them
as documented.
** Prescriptive: Naming Conventions**
=== Names to Avoid ===
Never use the characters `l' (lowercase letter el), `O' (uppercase
letter oh), or `I' (uppercase letter eye) as single character variable
names.
In some fonts, these characters are **indistinguishable** from the numerals
one and zero. When tempted to use `l', use `L' instead.
=== Package and Module Names ===
Modules should have **short, all-lowercase names**. Underscores can be used
in the module name if it improves readability. Python packages should
also have short, all-lowercase names, although the use of underscores is
discouraged.
Since** module names are mapped to file names**, and some file systems are
case insensitive and truncate long names, it is important that module
names be chosen to be fairly short -- this won't be a problem on Unix,
but it may be a problem when the code is transported to older Mac or
Windows versions, or DOS.
When an extension module written in C or C++ has an accompanying Python
module that provides a higher level (e.g. more object oriented)
interface, the C/C++ module has a leading underscore (e.g. _socket).
=== Class Names ===
Almost without exception, class names use the** CapWords **convention.
Classes for internal use have a leading underscore in addition.
===== Exception Names =====
Because exceptions should be classes, the class naming convention
applies here. However, you should use the** suffix "Error" **on your
exception names (if the exception actually is an error).
=== Global Variable Names ===
(Let's hope that these variables are meant for use inside **one **module
only.) The conventions are about the same as those for functions.
Modules that are designed for use via "from M import *" should use the
**__all__ **mechanism to prevent exporting globals, or use the older
convention of prefixing such globals with **an underscore **(which you might
want to do to indicate these globals are "module non-public").
===== Function Names =====
Function names should be** lowercase**, with words separated by **underscores**
as necessary to improve readability.
mixedCase is allowed only in contexts where that's already the
prevailing style (e.g. threading.py), to retain backwards compatibility.
===== Function and method arguments =====
Always use **'self' **for the first argument to instance methods.
Always use** 'cls' **for the first argument to class methods.
If a function argument's name clashes with a reserved keyword, it is
generally better to **append a single trailing underscore **rather than use
an abbreviation or spelling corruption. Thus "print_" is better than
"prnt". (Perhaps better is to avoid such clashes by using a synonym.)
===== Method Names and Instance Variables =====
Use the function naming rules: lowercase with words separated by
underscores as necessary to improve readability.
Use one leading underscore only for non-public methods and instance
variables.
To avoid name clashes with subclasses, use **two leading underscores** to
invoke Python's name mangling rules.
Python mangles these names with the class name: if class Foo has an
attribute named __a, it cannot be accessed by Foo.__a. (An insistent
user could still gain access by calling Foo._Foo__a.) Generally, double
leading underscores should be used only to avoid name conflicts with
attributes in classes designed to be subclassed.
Note: there is some controversy about the use of __names (see below).
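The mangling rule described above can be seen in a small sketch. The class names Base and Child and the attribute name __token are hypothetical, chosen only for illustration; the code is written to run on a current Python 3 interpreter.

```python
class Base(object):
    def __init__(self):
        # Inside class Base, __token is stored as _Base__token.
        self.__token = "base"

    def base_token(self):
        return self.__token


class Child(Base):
    def __init__(self):
        Base.__init__(self)
        # Inside class Child, __token is stored as _Child__token,
        # so it does not overwrite the attribute set by Base.
        self.__token = "child"

    def child_token(self):
        return self.__token


obj = Child()
```

Both values survive side by side on the same instance, and an insistent caller can still reach them as obj._Base__token and obj._Child__token.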
===== Constants =====
Constants are usually defined** on a module level** and written in all
capital letters with underscores separating words. Examples include
MAX_OVERFLOW and TOTAL.
===== Designing for inheritance =====
Always decide whether a class's methods and instance variables
(collectively: "attributes") should be public or non-public. If in
doubt, choose **non-public**; it's easier to make it public later than to
make a public attribute non-public.
Public attributes are those that you expect unrelated clients of your
class to use, with your commitment to avoid backward incompatible
changes. Non-public attributes are those that are not intended to be
used by third parties; you make no guarantees that non-public attributes
won't change or even be removed.
We don't use the term "private" here, **since no attribute is really**
** private in Python** (without a generally unnecessary amount of work).
Another category of attributes are those that are part of the "**subclass**
** API**" (often called "protected" in other languages). Some classes are
designed to be inherited from, either to extend or modify aspects of the
class's behavior. When designing such a class, take care to make
explicit decisions about** which attributes are public, which are part of**
** the subclass API, and which are truly only to be used by your base**
class.
With this in mind, here are the Pythonic guidelines:
- Public attributes should have **no **leading underscores.
- If your public attribute name collides with a reserved keyword, **append**
a single trailing underscore to your attribute name. This is
preferable to an abbreviation or corrupted spelling. (However,
notwithstanding this rule, 'cls' is the preferred spelling for any
variable or argument which is known to be a class, especially the
first argument to a class method.)
Note 1: See the argument name recommendation above for class methods.
- For simple public data attributes, it is best to expose just the
**attribute name**, without complicated accessor/mutator methods. Keep in
mind that Python provides an easy path to future enhancement, should
you find that a simple data attribute needs to grow functional
behavior. In that case, use properties to hide functional
implementation behind simple data attribute access syntax.
Note 1: Properties only work on new-style classes.
Note 2: Try to keep the functional behavior side-effect free, although
side-effects such as caching are generally fine.
Note 3: Avoid using properties for computationally expensive
operations; the attribute notation makes the caller believe
that access is (relatively) cheap.
- If your class is intended to be subclassed, and you have attributes
that you do not want subclasses to use, consider naming them with
**double leading underscores and no trailing underscores**. This invokes
Python's name __mangling algorithm__, where the name of the class is
mangled into the attribute name. This helps avoid attribute name
collisions should subclasses inadvertently contain attributes with the
same name.
Note 1: Note that only the simple class name is used in the mangled
name, so if a subclass chooses both the same class name and attribute
name, you can still get name collisions.
Note 2: Name mangling can make certain uses, such as debugging and
__getattr__(), less convenient. However the name mangling algorithm
is well documented and easy to perform manually.
Note 3: Not everyone likes name mangling. Try to balance the
need to avoid accidental name clashes with potential use by
advanced callers.
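The guideline about simple public data attributes can be illustrated with a short sketch. The Circle class and its validation rule are assumptions made up for this example; the point is only that callers keep using plain attribute syntax after the attribute grows functional behavior.

```python
class Circle(object):
    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        # Cheap, side-effect-free access, as the notes above recommend.
        return self._radius

    @radius.setter
    def radius(self, value):
        # Functional behavior hidden behind attribute assignment.
        if value < 0:
            raise ValueError("radius must be non-negative")
        self._radius = value


circle = Circle(2)
circle.radius = 5   # looks like a plain attribute to the caller
```

Because the public name stays radius, existing callers are unaffected when validation is added later.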
====== Programming Recommendations ======
- Code should be written in a way that does not disadvantage other
implementations of Python (PyPy, Jython, IronPython, Pyrex, Psyco,
and such).
For example, do not rely on CPython's efficient implementation of
in-place string concatenation for statements in the form a+=b or a=a+b.
Those statements run more slowly in Jython. In performance sensitive
parts of the library, the ''.join() form should be used instead. This
will ensure that concatenation occurs in linear time across various
implementations.
- Comparisons to singletons like** None **should always be done with
** 'is' or 'is not'**, never the equality operators.
Also, beware of writing "if x" when you really mean "if x is not None"
-- e.g. when testing whether a variable or argument that defaults to
None was set to some other value. The other value might have a type
(such as a container) that could be false in a boolean context!
- When implementing ordering operations with rich comparisons, it is best to
implement all six operations (__eq__, __ne__, __lt__, __le__, __gt__,
__ge__) rather than relying on other code to only exercise a particular
comparison.
To minimize the effort involved, the functools.total_ordering() decorator
provides a tool to generate missing comparison methods.
PEP 207 indicates that reflexivity rules *are* assumed by Python. Thus,
the interpreter may swap y>x with x<y, y>=x with x<=y, and may swap the
arguments of x==y and x!=y. The sort() and min() operations are
guaranteed to use the < operator and the max() function uses the >
operator. However, it is best to implement all six operations so that
confusion doesn't arise in other contexts.
- Use class-based exceptions.
String exceptions in new code are forbidden, because this language
feature is being removed in Python 2.6.
Modules or packages should define their own domain-specific base
exception class, which should be subclassed from the built-in Exception
class. Always include a class docstring. E.g.:
    class MessageError(Exception):
        """Base class for errors in the email package."""
Class naming conventions apply here, although you should add the suffix
"Error" to your exception classes, if the exception is an error.
Non-error exceptions need no special suffix.
- When raising an exception, use "**raise ValueError('message')**" instead of
the older form "raise ValueError, 'message'".
The paren-using form is preferred because when the exception arguments
are long or include string formatting, you don't need to use line
continuation characters thanks to the containing parentheses. The older
form will be removed in Python 3000.
- When catching exceptions, mention specific exceptions
whenever possible instead of using a bare 'except:' clause.
For example, use:
    try:
        import platform_specific_module
    except ImportError:
        platform_specific_module = None
A bare 'except:' clause will catch SystemExit and KeyboardInterrupt
exceptions, making it harder to interrupt a program with Control-C,
and can disguise other problems. If you want to catch all
exceptions that signal program errors, use 'except Exception:'.
A good rule of thumb is to limit use of bare 'except' clauses to two
cases:
1) If the exception handler will be printing out or logging
the traceback; at least the user will be aware that an
error has occurred.
2) If the code needs to do some cleanup work, but then lets
the exception propagate upwards with 'raise'.
'try...finally' is a better way to handle this case.
- Additionally, for all try/except clauses, limit the 'try' clause
to the absolute minimum amount of code necessary. Again, this
avoids masking bugs.
Yes:
    try:
        value = collection[key]
    except KeyError:
        return key_not_found(key)
    else:
        return handle_value(value)
No:
    try:
        # Too broad!
        return handle_value(collection[key])
    except KeyError:
        # Will also catch KeyError raised by handle_value()
        return key_not_found(key)
- Use string methods instead of the string module.
String methods are always much faster and share the same API with
unicode strings. Override this rule if backward compatibility with
Pythons older than 2.0 is required.
- Use **''.startswith() and ''.endswith()** instead of string slicing to check
for prefixes or suffixes.
startswith() and endswith() are cleaner and less error prone. For
example:
Yes: if foo.startswith('bar'):
No: if foo[:3] == 'bar':
The exception is if your code must work with Python 1.5.2 (but let's
hope not!).
- Object type comparisons should always use isinstance() instead
of comparing types directly.
Yes: if isinstance(obj, int):
No: if type(obj) is type(1):
When checking if an object is a string, keep in mind that it might be a
unicode string too! In Python 2.3, str and unicode have a common base
class, basestring, so you can do:
if isinstance(obj, basestring):
- For sequences, (strings, lists, tuples), use the fact that empty
sequences are false.
Yes: if not seq:
     if seq:
No:  if len(seq)
     if not len(seq)
- Don't write string literals that rely on significant trailing
whitespace. Such trailing whitespace is visually indistinguishable and
some editors (or more recently, reindent.py) will trim them.
- Don't compare boolean values to True or False using ==
Yes: if greeting:
No: if greeting == True:
Worse: if greeting is True:
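As an illustration of the rich-comparison recommendation in the list above, the following sketch defines only __eq__ and __lt__ and lets functools.total_ordering() supply the rest. The Version class is a hypothetical example, not something taken from the standard library.

```python
import functools


@functools.total_ordering
class Version(object):
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        # Defining __eq__ removes the default hash, so restore one.
        return hash(self._key())


# sorted() relies on <, which the class defines directly.
versions = sorted([Version(2, 0), Version(1, 5), Version(1, 10)])
```

The decorator fills in __le__, __gt__ and __ge__, so code that happens to use a different operator than sort() still behaves consistently.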
Rules that apply only to the standard library
- Do not use function type annotations in the standard library.
These are reserved for users and third-party modules. See
PEP 3107 and the bug 10899 for details.
References
[1] PEP 7, Style Guide for C Code, van Rossum
[2] http://www.python.org/doc/essays/styleguide.html
[3] PEP 257, Docstring Conventions, Goodger, van Rossum
[4] http://www.wikipedia.com/wiki/CamelCase
[5] Barry's GNU Mailman style guide
http://barry.warsaw.us/software/STYLEGUIDE.txt
[6] PEP 20, The Zen of Python
[7] PEP 328, Imports: Multi-Line and Absolute/Relative
Copyright
This document has been placed in the public domain.

View File

@@ -0,0 +1,7 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-06-30T17:55:41+08:00
====== PyGTK Tutorial ======
Created Thursday 30 June 2011

View File

@@ -0,0 +1,49 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-06-30T17:58:48+08:00
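====== PyGTK Tutorial: Introduction to PyGTK ======
Created Thursday 30 June 2011
Posted on 4 May 2011 | Filed under: PyGTK Tutorial, Python | Tags: PyGTK, Python, tutorial, translation
I have recently been learning Python and became interested in GUI programming with PyGTK. I searched the web for PyGTK material, but most of it has not been updated for a long time. The official tutorial, http://pygtk.org/pygtk2tutorial/index.html, is still based on Python 2.2, which is too old to match current practice. At http://www.zetcode.com/tutorials/ I found a newer tutorial, http://www.zetcode.com/tutorials/pygtktutorial/. I plan to translate the whole tutorial, both for a PyGTK project of my own and for newcomers who want to learn PyGTK.
1. Introduction to PyGTK
In this part we talk about the PyGTK GUI (graphical user interface) library and the Python programming language in general.
>> About this tutorial
This is a PyGTK programming tutorial. The examples were created and tested on Linux. The tutorial is suitable for beginners as well as more advanced programmers.
>> PyGTK
PyGTK is a set of Python wrappers for the GTK+ GUI library. It offers a comprehensive set of graphical elements and other useful programming facilities for creating desktop applications. It is part of the GNOME project. PyGTK is free software licensed under the LGPL. Its original author is James Henstridge. PyGTK is very easy to use and is well suited to rapid prototyping. It is generally regarded as one of the most popular GTK+ bindings.
PyGTK consists of the following modules:
{{./modules.png}}
GObject is a base class that provides the common attributes and functions for PyGTK classes. ATK is the accessibility toolkit; it provides tools that help people with disabilities use computers. GTK is the user interface module. Pango is a library for working with text and internationalization. Cairo is a library for creating 2D vector graphics. Glade is used to build GUI interfaces from XML descriptions.
>> Python
Python is a dynamic, object-oriented programming language. It is a general-purpose language that can be used for many kinds of software development. Its design emphasizes programmer productivity and code readability. It was first developed by Guido van Rossum and first released in 1991. Python was inspired by the ABC, Haskell, Java, Lisp, Icon and Perl programming languages. Python is a high-level, general-purpose, cross-platform, interpreted language. It is also a very concise language. One of its most visible features is that it does not use semicolons or braces to delimit statements and blocks; it uses indentation instead. There are currently two main branches of Python: Python 2.x and Python 3.x. Python 3.x breaks backward compatibility with earlier releases; it was created to correct design flaws in the language and make it cleaner. At the time of writing, the latest Python 2.x release is 2.7.1 and the latest Python 3.x release is 3.1.3. This tutorial is written for Python 2.x. Today Python is maintained by a large group of volunteers worldwide.
>> GTK+
GTK+ is a library for creating graphical user interfaces. The library is written in C. GTK+ is also called the GIMP Toolkit, because it was originally created for developing the GIMP image manipulation program. Since then GTK+ has become one of the most popular toolkits on Linux and BSD Unix. Today most GUI software in the open source world is built with Qt or GTK+. GTK+ is an object-oriented application programming interface. Its object system is built on the GLib object system (GObject), and GLib is the foundation of the GTK+ library. GObject also makes it possible to create bindings for other programming languages; bindings exist for C++, Python, Perl, Java, C# and others.
The GNOME and Xfce desktop environments are built on the GTK+ library. SWT and wxWidgets are well-known programming frameworks that use GTK+. Prominent applications that use GTK+ include Firefox and Inkscape.
>> Sources:
pygtk.org
wikipedia.org
Original article for this translation: http://www.zetcode.com/tutorials/pygtktutorial/introduction/
PS: I put together a simple offline PDF of the English tutorial; anyone interested can download it here.
Update, 9 May: I found another English tutorial with fairly complete coverage; readers comfortable with English can consult it (link). I have compiled and packaged it; if you do not want to build it yourself, you can download it from my Google Code page or read the online version hosted on this site.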

Binary file not shown.


View File

@@ -0,0 +1,255 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-06-30T17:55:55+08:00
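====== PyGTK Tutorial: First Steps ======
Created Thursday 30 June 2011
Posted on 4 May 2011 | Filed under: PyGTK Tutorial, Python | Tags: PyGTK, Python, tutorial, translation
This is the second article in the translation of the "PyGTK tutorial". The previous one is at http://www.yeezi.org/2011/05/pygtk-tutorial-pygtk-indroduction.html
In this part of the tutorial we take our first steps in programming and create a few sample programs.
>> Simple example
The first code example is a very simple one.
Center.py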
#!/usr/bin/python
# ZetCode PyGTK tutorial
#
# This is a trivial PyGTK example
#
# author: jan bodnar
# website: zetcode.com
# last edited: February 2009

import gtk

class PyApp(gtk.Window):
    def __init__(self):
        super(PyApp, self).__init__()

        self.connect("destroy", gtk.main_quit)
        self.set_size_request(250, 150)
        self.set_position(gtk.WIN_POS_CENTER)
        self.show()

PyApp()
gtk.main()
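This code shows a window centered on the screen.
import gtk
We import the gtk module. It provides the objects we use to create GUI applications.
class PyApp(gtk.Window):
Our application is based on the PyApp class, which inherits from Window.
def __init__(self):
    super(PyApp, self).__init__()
This is the constructor; it initializes our application. It also calls its parent constructor through super().
self.connect("destroy", gtk.main_quit)
We connect the destroy signal to the main_quit() function. The destroy signal is emitted when we click the close button in the window's titlebar or press Alt+F4. Without this connection the window would be destroyed but the application would keep running; you can see this if you launch the example from the command line. By calling main_quit() we quit the application for good.
self.set_size_request(250, 150)
We set the size of the window to 250x150 px.
self.set_position(gtk.WIN_POS_CENTER)
This line centers the window on the screen.
self.show()
Now we show the window. The window is not visible until we call the show() method.
PyApp()
gtk.main()
We create an instance of our program and start the main loop.
>> Icon
In the next example we show the application icon. Most window managers display the icon in the left corner of the titlebar and also on the taskbar.
Icon.py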
#!/usr/bin/python
# ZetCode PyGTK tutorial
#
# This example shows an icon
# in the titlebar of the window
#
# author: jan bodnar
# website: zetcode.com
# last edited: February 2009

import gtk, sys

class PyApp(gtk.Window):
    def __init__(self):
        super(PyApp, self).__init__()

        self.set_title("Icon")
        self.set_size_request(250, 150)
        self.set_position(gtk.WIN_POS_CENTER)

        try:
            self.set_icon_from_file("web.png")
        except Exception, e:
            print e.message
            sys.exit(1)

        self.connect("destroy", gtk.main_quit)

        self.show()

PyApp()
gtk.main()
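The code example above shows the application icon.
self.set_title("Icon")
We set a title for the window.
self.set_icon_from_file("web.png")
The set_icon_from_file() method sets an icon for the window. The image is loaded from disk, from the current working directory.
Figure: Icon
{{./icon.png}}
>> Buttons
In the next example we further improve our programming skills with the PyGTK library.
Buttons.py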
#!/usr/bin/python

# ZetCode PyGTK tutorial
#
# This example shows four buttons
# in various modes
#
# author: jan bodnar
# website: zetcode.com
# last edited: February 2009

import gtk

class PyApp(gtk.Window):
    def __init__(self):
        super(PyApp, self).__init__()

        self.set_title("Buttons")
        self.set_size_request(250, 200)
        self.set_position(gtk.WIN_POS_CENTER)

        btn1 = gtk.Button("Button")
        btn1.set_sensitive(False)
        btn2 = gtk.Button("Button")
        btn3 = gtk.Button(stock=gtk.STOCK_OPEN)
        btn4 = gtk.Button("Button")
        btn4.set_size_request(80, 40)

        fixed = gtk.Fixed()

        fixed.put(btn1, 20, 30)
        fixed.put(btn2, 100, 30)
        fixed.put(btn3, 20, 80)
        fixed.put(btn4, 100, 80)

        self.connect("destroy", gtk.main_quit)

        self.add(fixed)
        self.show_all()

PyApp()
gtk.main()
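We show four different buttons on the window. We will see the difference between container widgets and child widgets, and we will change some properties of the child widgets.
btn1 = gtk.Button("Button")
A Button is a child widget. Child widgets are placed inside containers.
btn1.set_sensitive(False)
We make this button insensitive. This means we cannot click on it, and it cannot be selected or focused. Graphically, the widget is grayed out.
btn3 = gtk.Button(stock=gtk.STOCK_OPEN)
The third button shows an image inside its area. PyGTK has a built-in stock of images that we can use. See The gtk Class Reference for details.
btn4.set_size_request(80, 40)
Here we change the size of the button.
fixed = gtk.Fixed()
The Fixed widget is a non-visible container widget. Its purpose is to contain other child widgets.
fixed.put(btn1, 20, 30)
fixed.put(btn2, 100, 30)
...
Here we place the button widgets inside the fixed container widget.
self.add(fixed)
We set the Fixed container to be the main container of our Window widget.
self.show_all()
We can either call show_all(), or call show() on each widget, including the containers.
Figure: Buttons
{{./buttons.png}}
>> Tooltip
A tooltip is a hint about the purpose of a widget in the application. It can be used to provide additional help.
Tooltips.py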
#!/usr/bin/python

# ZetCode PyGTK tutorial
#
# This code shows a tooltip on
# a window and a button
#
# author: jan bodnar
# website: zetcode.com
# last edited: February 2009

import gtk

class PyApp(gtk.Window):

    def __init__(self):
        super(PyApp, self).__init__()

        self.set_title("Tooltips")
        self.set_size_request(250, 200)
        self.set_position(gtk.WIN_POS_CENTER)

        self.connect("destroy", gtk.main_quit)

        self.fixed = gtk.Fixed()
        self.add(self.fixed)

        button = gtk.Button("Button")
        button.set_size_request(80, 35)

        self.fixed.put(button, 50, 50)

        self.set_tooltip_text("Window widget")
        button.set_tooltip_text("Button widget")

        self.show_all()

PyApp()
gtk.main()
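In this example we set a tooltip for a window and for a button.
self.set_tooltip_text("Window widget")
button.set_tooltip_text("Button widget")
The set_tooltip_text() method does the job.
Figure: Tooltips
{{./tooltips.png}}
In this chapter we created our first programs with the PyGTK library.
Original article for this translation: http://www.zetcode.com/tutorials/pygtktutorial/firststeps/
PS: This post is mainly a translation of the original, with a few reference links added to help locate the detailed documentation.
PS2: The Python indentation was lost when this post was first published; a fix was still being worked out at the time.
For the indentation problem, see the next post. If anyone knows how to fix indentation and syntax highlighting at the same time, please let me know; I would be very grateful.
The problem is now solved, although the fix is tedious; see http://article.yeeyan.org/view/119553/94844. I am using Syntax Highlighter and Code Colorizer for WordPress.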

Binary file not shown.


Binary file not shown.


Binary file not shown.


View File

@@ -0,0 +1,7 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T14:46:37+08:00
====== Python Essential Reference 4th ======
Created Tuesday 05 July 2011

View File

@@ -0,0 +1,144 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T14:49:50+08:00
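====== Python Essential Reference 4th: Chapter 10 Reading Notes ======
Created Tuesday 05 July 2011
1. Command-line options of the Python interpreter:
-i Enter interactive mode after the program finishes.
-O Optimized mode.
-OO Further optimized mode.
-v Verbose output.
-x Skip the first line of the source file.
2. Python also reads a number of environment variables, for example:
PYTHONPATH Module search path; its entries are inserted into sys.path.
PYTHONIOENCODING Encoding used for stdin, stdout and stderr.
PYTHONOPTIMIZE Equivalent to the -O option.
3. Running python on the command line without a script enters interactive mode. The >>> prompt asks for a new statement; the ... prompt indicates a continuation line of a multi-line statement. Both prompts can be changed through the sys module attributes sys.ps1 and sys.ps2 (they are not environment variables).
# Related code, [四号程序员] http://www.coder4.com
>>> import sys
>>> print sys.ps1
>>>
>>> print sys.ps2
...
4. In interactive mode, the result of the last evaluation is available as the underscore variable "_".
5. To run a .py script directly without typing python (as with shell scripts), put a line naming the Python interpreter on the first line of the script.
For example, test.py:
#!/usr/bin/env python
......
To run it:
chmod u+x test.py
./test.py
6. Third-party packages and modules are found through sys.path; the site directories are recorded in sys.path as well.
7. A package can be installed into the user's own directory only, for example ~/.local or ~/.local/lib/python2.x/site-packages. This is done with python setup.py install --user.
8. To use a feature from a future version that is not enabled by default in the current one, import it from __future__:
from __future__ import division
9. The program terminates when an uncaught SystemExit exception is raised or when a SIGTERM or SIGHUP signal is received. On exit the interpreter releases the references to all objects and namespaces, calling __del__() where it is defined.
In many cases, however, __del__() is never called. We can write our own function that releases resources and register it with the atexit module, so that cleanup() runs when the interpreter exits normally:
import atexit

def cleanup():
    print "Exiting..."
    # Do something
    # close(f)
    # ...

atexit.register(cleanup)
Running it:
python ./cln.py
Exiting...
10. os._exit(status) can also be used to exit. It calls the exit() system call directly and terminates immediately, without running cleanup handlers such as those registered with atexit.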

View File

@@ -0,0 +1,53 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T14:50:11+08:00
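====== Python Essential Reference 4th: Chapter 11 Reading Notes ======
Created Tuesday 05 July 2011
This chapter covers testing, debugging and performance tuning.
1. Languages such as C and Java are compiled ahead of time, and the compiler catches many errors. In Python, errors only surface at run time, so finding them takes more effort.
2. The first statement of a function, class, etc. is commonly a triple-quoted string used as documentation (a docstring), for example:
def split(line,...):
    """
    Split....
    >>> split(...)
    [...]
    """
As shown above, docstrings often contain interactive-shell sessions that double as test cases.
We can use the test cases in the docstrings for unit testing.
# Related code, [四号程序员] http://www.coder4.com
import split  # module under test
import doctest
# Run the docstring examples; testmod() returns the number of failures
# and the number of examples attempted
nfail, ntests = doctest.testmod(split)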
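A self-contained sketch of the same idea follows. The function split_words is a hypothetical stand-in for the module under test; instead of doctest.testmod(), which needs an importable module, the docstring is run through doctest.DocTestParser and doctest.DocTestRunner so the example works in a single file. It is written for a current Python 3 interpreter.

```python
import doctest


def split_words(line):
    """Split a line on whitespace.

    >>> split_words("a b  c")
    ['a', 'b', 'c']
    >>> split_words("")
    []
    """
    return line.split()


# Build a DocTest from the docstring and run it.
parser = doctest.DocTestParser()
test = parser.get_doctest(split_words.__doc__,
                          {"split_words": split_words},
                          "split_words", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)   # carries .failed and .attempted counts
```

As with testmod(), the result reports how many examples failed and how many were attempted, not how many succeeded.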
3. Testing through docstrings as above is rather makeshift and not very efficient. Python also has the unittest module, which is very similar to JUnit.
import split  # module under test
import unittest

class TestSplit(unittest.TestCase):
    def setUp(self):
        pass
    def tearDown(self):
        pass
    def testsimplestring(self):
        r = split.split("...")
        self.assertEqual(r,[......])
As you can see, it closely resembles JUnit. Other assert methods include:
t.assert
t.assertAlmostEqual(x,y,places) # match within a given precision
t.assertRaises(exc,callable…)
and many more; consult the documentation when needed.
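A minimal runnable sketch of a unittest test case is shown below. The function split_fields is a hypothetical example defined inline, since the split module from the notes is not available here; the tests exercise assertEqual, assertAlmostEqual and assertRaises.

```python
import unittest


def split_fields(line, sep=","):
    # Hypothetical function under test: split and strip each field.
    return [field.strip() for field in line.split(sep)]


class TestSplitFields(unittest.TestCase):
    def test_simple_string(self):
        self.assertEqual(split_fields("a, b,c"), ["a", "b", "c"])

    def test_almost_equal(self):
        # Floats are compared to 7 decimal places.
        self.assertAlmostEqual(0.1 + 0.2, 0.3, places=7)

    def test_raises(self):
        # None has no split() method, so AttributeError is expected.
        self.assertRaises(AttributeError, split_fields, None)


# Run the tests programmatically and collect the outcome.
suite = unittest.TestLoader().loadTestsFromTestCase(TestSplitFields)
result = unittest.TestResult()
suite.run(result)
```

In a script, calling unittest.main() is the more usual entry point; the explicit loader and TestResult are used here only so the outcome can be inspected as a value.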
4、

View File

@@ -0,0 +1,162 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T14:51:25+08:00
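====== Python Essential Reference 4th: Chapter 12 Reading Notes ======
Created Tuesday 05 July 2011
1. Some functions need no import because they live in the __builtin__ module, which is imported by default. In Python 3 the module was renamed to builtins.
2. These functions, which need no import, are called built-in functions.
3. Some built-in functions and objects:
ascii(x): Python 3 only. Returns a printable representation of x, like repr(), with non-ASCII characters replaced by escape sequences.
basestring: in Python 2, the common base class of byte strings (str) and Unicode strings (unicode).
bin(x): returns the binary representation of the integer x as a string.
# Related code, [四号程序员] http://www.coder4.com
>>> bin(1)
'0b1'
>>> bin(100)
'0b1100100'
bool(x): converts an object to a Boolean value.
# A non-empty list is always True
>>> bool([10,20,0])
True
# The None object is False
>>> bool(None)
False
bytearray: a mutable byte-string type.
cmp(x, y): compares x and y. Returns a negative number if x < y, a positive number if x > y, and 0 if x == y. (Python 2 only.)
compile(string,filename,mode): compiles a string into a code object.
The result is an internal code object that can only be run by eval() or exec.
string: one or more lines of code (separated by \n)
filename: usually '' is fine
mode: 'single' (a single interactive statement), 'exec' (a sequence of statements), 'eval' (a single expression)
# Compile code that defines a and b into the code object x
>>> x = compile("a=10\nb=[a,a,a]",'','exec')
# Execute the code object x
>>> eval(x)
# Print b
>>> print b
[10, 10, 10]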
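The session above can be sketched in a form that also runs on Python 3, where exec is a function. Passing an explicit dictionary as the namespace is a choice made for this example, so the compiled code does not write into the caller's globals.

```python
# Compile a multi-line snippet in 'exec' mode.
source = "a = 10\nb = [a, a, a]"
code = compile(source, "<string>", "exec")

# Run it in an explicit namespace.
namespace = {}
exec(code, namespace)

# Compile a single expression in 'eval' mode and evaluate it
# in the same namespace, so it can see the variable a.
expr = compile("a * 2", "<string>", "eval")
result = eval(expr, namespace)
```

After execution, the names defined by the snippet are found in namespace rather than in the surrounding module.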
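4. Some built-in functions:
delattr(object,attr): equivalent to del object.attr. Note that it deletes an attribute of an object, not a dictionary entry.
eval(expr): as described above, eval() also executes code objects produced by compile(), which amounts to loading code dynamically.
exec(expr): similar to eval(), but it returns no value.
filter(function,iterable): applies function to each element of iterable and includes the element in the result if function returns True.
The following keeps all x < 5.
>>> def func1(x):
...     return True if x<5 else False
...
>>> filter(func1,xrange(1,20))
[1, 2, 3, 4]
5. Some built-in functions:
getattr(object,name): equivalent to object.name
hash(object): returns a hash value for the object. Not every type is hashable; the object must implement __hash__().
>>> hash(x)
2119161711
iter(object): returns an iterator for object.
len(s): returns the length of s.
locals(): returns a dictionary of the current local variables.
map(function,items): applies function to every element of items and collects the return value of each call.
# Using map
>>> def func2(x):
...     return -x
...
>>> map(func2,xrange(10))
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]
6. Some built-in functions:
max(s): returns the largest value in s.
min(s): returns the smallest value in s.
next(s): returns the next item from the iterator s.
object: the base class of all objects.
open(filename): opens a file and returns a file object.
ord(c): returns the integer code of an ASCII or Unicode character.
# ASCII value
>>> ord('c')
99
# Unicode value
>>> ord(u'赫')
36203
7. Some built-in functions:
pow(x,y[,z]): x ** y; if z is given, (x ** y) % z. I wondered whether this would be fast enough for things like RSA.
I tried it, and it seems quite fast.
range/xrange: return a sequence of consecutive integers. In Python 2, range() builds a list and xrange() produces the values lazily, so xrange() is generally preferred; in Python 3 the two are unified as range().
round(x[,n]): rounds x to the nearest multiple of 10 ** -n; n defaults to 0.
set(items): creates a set.
slice(start,stop[,step]): returns a slice object.
sorted(iterable): returns a new sorted list of the items in iterable.
vars(object): returns the symbol table of object (in practice its __dict__ attribute).
zip(s1,s2): combines several sequences in parallel and returns tuples (x1,x2), where x1 comes from s1 and x2 from s2.
>>> zip(xrange(5),["a","b","c","d","e"])
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e')]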
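The three-argument form of pow() mentioned in item 7 can be checked against the naive computation. The numbers below are arbitrary small values chosen for this sketch; real RSA parameters would be far larger.

```python
base, exponent, modulus = 4, 13, 497

# Modular exponentiation: intermediate results stay below the modulus.
fast = pow(base, exponent, modulus)

# Naive form: computes the full power first, then reduces it.
slow = (base ** exponent) % modulus
```

Both expressions give the same value, but the three-argument form avoids building the huge intermediate power, which is why it remains usable with very large exponents.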
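8. Built-in exception classes:
BaseException: the base class of all exceptions.
Exception: the base class of all program-related exceptions.
ArithmeticError: the base class of arithmetic exceptions such as OverflowError and ZeroDivisionError.
EnvironmentError: the base class of exceptions caused by the environment, such as IOError and OSError.
9. Catching exceptions:
Exception instances carry the attributes args and message.
# Related code, [四号程序员] http://www.coder4.com
try:
    xxxx
except IOError as e:
    # handle exception
    print e.args
    pass
10. Some exception classes:
EOFError: end of file reached. Only input() and raw_input() raise it; read() and readline() signal end of file by returning an empty string instead.
MemoryError: a recoverable out-of-memory error.
IOError: very common. It carries the attributes errno, strerror and filename.
SystemExit: raised by sys.exit(). To exit without raising it, use os._exit().
11. Python also has warnings (the Warning classes).
12. Importing built-ins from future versions: future_builtins
import future_builtins
End of chapter.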

View File

@@ -0,0 +1,202 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T14:51:47+08:00
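====== Python Essential Reference 4th: Chapter 13 Reading Notes ======
Created Tuesday 05 July 2011
This chapter introduces modules related to the Python runtime.
1. The atexit module
Hook functions can be registered through atexit so that they run when the Python interpreter exits.
# Related code, [四号程序员] http://www.coder4.com
>>> def fun1():
...     print "hh"
...
>>> import atexit
>>> atexit.register(fun1)
<function fun1 at 0xb7707614>
>>>
# Exit with Ctrl+D; fun1 is called
hh
2. The copy module provides shallow copies (copy.copy) and deep copies (copy.deepcopy).
deepcopy(x [,visit]): visit is an optional dictionary that keeps track of objects already copied, so that cyclic structures do not cause infinite recursion. (The standard library documentation calls this parameter memo.)
A class can customize how its instances are copied by implementing __copy__(self) and __deepcopy__(self,visit); most objects can be copied without them.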
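The difference between the two kinds of copy, and the handling of cycles, can be seen in a short sketch. The dictionary contents are made up for illustration.

```python
import copy

original = {"name": "config", "values": [1, 2, 3]}
shallow = copy.copy(original)      # new dict, but it shares the inner list
deep = copy.deepcopy(original)     # new dict and a new inner list

# Mutating the inner list shows which copy shares it.
original["values"].append(4)

# A list that contains itself: deepcopy tracks visited objects,
# so copying this cyclic structure terminates.
cycle = []
cycle.append(cycle)
cycle_copy = copy.deepcopy(cycle)
```

The shallow copy sees the appended element because it still refers to the same inner list, while the deep copy does not.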
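3. The gc module: garbage collection, which should be familiar.
Python's garbage collector uses three generations: 0, 1 and 2. An object that survives a collection of generation 0 without its reference count dropping to zero is moved to generation 1, and likewise from 1 to 2. Generations are checked less and less often from 0 to 2, which balances run-time overhead against how quickly memory is reclaimed.
Some functions and attributes of the gc module:
collect([generation]): runs a full garbage collection, or a collection of the given generation.
get_count(): returns a tuple (count0, count1, count2) with the current collection counts for each generation. It does not return the reference count of an object; sys.getrefcount(object) does that.
disable(): disables automatic garbage collection.
garbage: a list of objects that are no longer in use but could not be freed because they are involved in reference cycles.
set_threshold(threshold0,[threshold1,threshold2]): sets the collection thresholds. A generation-0 collection is triggered when the number of allocations minus the number of deallocations exceeds threshold0.
4. The inspect module: gathers information about live Python objects such as attributes, functions, docstrings and source code.
A term:
frame: code + environment + stack...
A few functions:
isbuiltin(object): whether object is a built-in function.
ismethod(object): whether object is a method.
trace([context]): returns stack information for the exception currently being handled.
5. The marshal module: a basic object serialization module. It is fast, but limited in functionality and in the data types it handles; in general only basic types are supported.
dump(value,file): writes the serialized value to the file object file.
dumps(value[,version]): serializes the object and returns a string.
load(file): deserializes from file.
loads(string): deserializes from a string.
marshal performs well but only supports numbers, strings, tuples, lists and dictionaries, and these containers may themselves hold only supported types. In other words, user-defined objects are not supported.
6. pickle: the recommended serialization module. It supports all the basic types and, unlike marshal, instances of user-defined classes.
dump, dumps, load and loads work as in marshal.
To serialize several objects, simply call dump/dumps several times.
Some objects are not suitable for pickling because their internal state cannot be captured, for example file objects and network sockets.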
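Serializing several objects by calling dump() repeatedly, as described in item 6, can be sketched as follows. An in-memory io.BytesIO buffer stands in for a real file; that substitution is an assumption made so the example is self-contained.

```python
import io
import pickle

# Write three objects one after another into the same stream.
buffer = io.BytesIO()
pickle.dump([1, 2, 3], buffer)
pickle.dump({"key": "value"}, buffer)
pickle.dump((3, 4), buffer)

# Read them back in the same order with repeated load() calls.
buffer.seek(0)
first = pickle.load(buffer)
second = pickle.load(buffer)
third = pickle.load(buffer)
```

Each load() call consumes exactly one pickled object, so the reader has to know, or detect via EOFError, how many objects were written.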
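7. Pickler and Unpickler are object-oriented wrappers around pickle; see the documentation if you need them.
8. The sys module: variables and functions related to the interpreter.
Commonly used attributes:
argv: the command-line arguments; no explanation needed.
byteorder: the native byte order of the machine's CPU, 'little' or 'big'. Quite handy.
copyright: Python's copyright information.
maxsize: the largest integer supported by the C size_t type on this machine (2147483647 on my 32-bit machine).
ps1, ps2: the interpreter's >>> and ... prompts.
winver: the version number used to form registry keys; available on Windows only.
Commonly used functions:
_clear_type_cache(): clears the internal type cache. To speed up method lookups, the interpreter keeps a small cache of the 1024 most recently used entries.
_current_frames(): returns a dictionary mapping each thread's identifier to the frame it is currently executing.
# Related code, [四号程序员] http://www.coder4.com
# Run in the Python interpreter
>>> import sys
>>> sys._current_frames()
{-1216022848: <frame object at 0x845458c>}
sys.exit([n]): raises a SystemExit exception and then exits. To exit without raising an exception, call os._exit(n).
sys.getdefaultencoding(): returns the current default string encoding.
# The default appears to be ascii
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
sys.setprofile(pfunc): sets the system profile function, which is used to implement a source-code profiler.
sys.getprofile(): returns the profile function set with setprofile().
9. The traceback module: very useful when handling exceptions.
The two most commonly used functions are:
traceback.print_exc(): formats the information about the most recent exception (from sys.exc_info()) and prints it to sys.stderr.
traceback.format_exc(): like the above, but returns the text as a string.
#!/usr/bin/python
import traceback

def fun1():
    try:
        1/0
    except ZeroDivisionError:
        traceback.print_exc()
        #print traceback.format_exc()

if __name__ == "__main__":
    fun1()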
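When the traceback text is needed as a value, for example to log it, format_exc() is the variant to use. The helper safe_divide below is a hypothetical function written for this sketch.

```python
import traceback


def safe_divide(a, b):
    """Return (result, None) on success or (None, traceback text) on failure."""
    try:
        return a / b, None
    except ZeroDivisionError:
        # format_exc() returns the same text print_exc() would print.
        return None, traceback.format_exc()


result, trace_text = safe_divide(1, 0)
```

The returned string contains the full traceback, including the exception type, and can be written to a log file or attached to an error report.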
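10. The types module: the types that correspond to the various kinds of Python objects.
11. The warnings module: a warning is reported but, by default, does not raise an exception or stop the program from running.
Warnings can be filtered in two ways:
(1) In code:
# Related code, [四号程序员] http://www.coder4.com
warnings.filterwarnings(action="ignore",message=".*xxx.*",category=xxxx)
action can be ignore, among others.
message selects which messages to filter; regular expressions are supported.
category supports the following types:
Warning
UserWarning
DeprecationWarning
SyntaxWarning
RuntimeWarning
FutureWarning
(2) With an interpreter command-line option:
python -Wignore:the\ regex:DeprecationWarning
resetwarnings(): resets all filters.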
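The in-code filtering approach can be sketched as below. The function old_api and the message texts are hypothetical; warnings.catch_warnings(record=True) is used only so the example can observe which warnings got through.

```python
import warnings


def old_api():
    # Hypothetical function that emits a deprecation warning.
    warnings.warn("old_api is deprecated", DeprecationWarning)
    return 42


with warnings.catch_warnings(record=True) as caught:
    # Show every warning by default inside this block...
    warnings.simplefilter("always")
    # ...except DeprecationWarnings whose message matches the regex.
    warnings.filterwarnings("ignore", message=".*deprecated.*",
                            category=DeprecationWarning)
    value = old_api()
    warnings.warn("something else", UserWarning)
```

Filters added later take precedence over earlier ones, so the ignore rule wins for the deprecation warning while the unrelated UserWarning is still recorded.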
12、weakref弱引用模块。
可以加一个引用,但不增加原对象的引用计数。
有时候比如观察着模式可以防止循环引用而导致gc无法回收
创建弱引用wref = weakref.ref(obj)
通过弱引用来方位原对象wref()如果有则返回原对象如果已经被删除则返回None。
?
#相关代码, [四号程序员] http://www.coder4.com
>>> class A:pass
...
>>> a= A()
>>> import weakref
>>> ar = weakref.ref(a)
>>> print ar
<weakref at 0xb76e4694; to 'instance' at 0xb76e3cac>
>>> print a
<__main__.A instance at 0xb76e3cac>
>>> del a
>>> print ar
<weakref at 0xb76e4694; dead>
An example: this cache is essentially useless. foocache stores only a weak reference, and the caller discards the returned object right away, so the object is destroyed immediately and every call has to recompute the value.
#!/usr/bin/python
import weakref
class A:
    def __init__(self,x):
        self.val = x
    def __str__(self):
        return str(self.val)
def foo(x):
    print "foo(%s)" % (str(x))
    return A(x)
_resultcache = {}
def foocache(x):
    if _resultcache.has_key(x):
        r = _resultcache[x]()
        if r is not None:return r
    r = foo(x)
    _resultcache[x] = weakref.ref(r)
    return r
if __name__ == "__main__":
    for i in xrange(10):
        foocache(i)
        #print foocache(i)
    print "start..."
    for i in xrange(10):
        #print foocache(i)
        foocache(i)
    pass
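A weak cache does pay off when the caller keeps the result alive. The sketch below (Python 3 syntax) uses weakref.WeakValueDictionary instead of hand-made weakref.ref entries; Result, compute and cached_compute are hypothetical names for the illustration.

```python
import weakref

class Result:
    def __init__(self, val):
        self.val = val

calls = []                                # records every real computation
_cache = weakref.WeakValueDictionary()    # entries vanish when the value dies

def compute(x):
    calls.append(x)
    return Result(x * x)

def cached_compute(x):
    r = _cache.get(x)
    if r is None:
        r = compute(x)
        _cache[x] = r
    return r

first = cached_compute(3)    # computed
second = cached_compute(3)   # served from the cache, because first is still alive
```

As long as some strong reference (here first) exists, the second lookup is a cache hit; once all strong references are gone, the entry disappears by itself.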

View File

@@ -0,0 +1,128 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T14:52:10+08:00
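====== Python Essential Reference 4th: Chapter 14 Reading Notes ======
Created Tuesday 05 July 2011
This chapter covers the modules related to mathematical computation.
1. The decimal module
It provides decimal floating-point arithmetic. Python's built-in float uses the IEEE 754 binary representation, so 0.1 cannot be stored exactly; the stored value is roughly 0.1000000000000000055. Java and other languages have the same issue. This is usually tolerable, but in some settings, such as financial software, it is not acceptable that 3 * 0.1 == 0.3 evaluates to False.
The decimal module implements the IBM General Decimal Arithmetic standard. It defines two main classes, Decimal and Context. A Context controls precision, rounding, error handling and so on.
Construction and basic arithmetic:
# Construction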
import decimal
x = decimal.Decimal("3.4")
y = decimal.Decimal("4.5")
# All the basic arithmetic operators are supported
>>> x * y
Decimal('15.30')
>>> x / y
Decimal('0.7555555555555555555555555556')
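The float problem mentioned in item 1 can be checked directly. A minimal sketch in Python 3 syntax; the variable names are only for illustration.

```python
from decimal import Decimal

# Binary floats: 0.1 is not exactly representable, so the comparison fails
float_equal = (3 * 0.1 == 0.3)

# Decimals built from strings keep the exact decimal value
decimal_equal = (3 * Decimal("0.1") == Decimal("0.3"))

# Building a Decimal from a float exposes the value that is really stored
exact_tenth = Decimal(0.1)
```

This is also why Decimal values are normally constructed from strings rather than from floats.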
Changing the precision, method 1:
Every thread has its own context object; set its precision directly.
# Set the precision of the current thread's context to 3
decimal.getcontext().prec = 3
c = x * y
d = x / y
# Output
>>> x * y
Decimal('15.3')
>>> x / y
Decimal('0.756')
Changing the precision, method 2:
Use a with block to change the precision locally.
# Change the precision only inside the with block
with decimal.localcontext(decimal.Context(prec=10)):
    print x * y
    print x / y
# Output
15.30
0.7555555556
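2. Constructing Decimal objects
(1) From a number
d = Decimal(102)
(2) From a string
d = Decimal("102.3")
Special values: Infinity, -Infinity, NaN (Not a Number)
Besides the usual operators, there are arithmetic methods such as:
x.exp([context]): the natural exponential, e ** x
x.sqrt([context]): the square root of x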
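3. A Context controls many properties; the most important are rounding and precision.
Context(prec=None, rounding=None, traps=None, flags=None, Emin=None, Emax=None, capitals=1)
prec: precision
rounding: rounding mode
traps: signal handling. Each signal (such as DivisionByZero) can be enabled as a trap, in which case the corresponding exception is raised when the condition occurs.
flags: status flags of the computation, such as overflow or division by zero.
capitals: boolean, whether the exponent is written as E or e.
getcontext() returns the current thread's Context; localcontext() returns a context manager that installs a copy of it.
The copy can be obtained with the with statement mentioned above:
with localcontext() as c:
    c.prec = 5
    #xxxx statements
Zero is signed: it can be positive or negative.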
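Precision and rounding can be combined in one local context. The helper divide below is a hypothetical name; the sketch (Python 3 syntax) only relies on localcontext and the standard rounding constants.

```python
from decimal import Decimal, localcontext, ROUND_HALF_UP, ROUND_DOWN

def divide(a, b, prec, rounding):
    # The settings apply only inside the with block
    with localcontext() as ctx:
        ctx.prec = prec
        ctx.rounding = rounding
        return Decimal(a) / Decimal(b)

half_up = divide("2", "3", 4, ROUND_HALF_UP)
truncated = divide("2", "3", 4, ROUND_DOWN)
```

The same division gives 0.6667 or 0.6666 depending on the rounding mode, and the thread's default context is untouched afterwards.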
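4. The fractions module (rational numbers): it exists because many values, such as 1/3, have no exact finite decimal representation.
Constructors:
(1) From a decimal string
>>> fractions.Fraction("1.75")
Fraction(7, 4)
(2) From a numerator and a denominator
# The result is reduced to lowest terms automatically
>>> fractions.Fraction(20,100)
Fraction(1, 5)
5. Arithmetic on fractions works like other numeric arithmetic and is not repeated here.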
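6. The math module: besides the familiar fabs, ceil and so on, a few functions deserve attention:
fsum(seq): a full-precision sum, which avoids the loss of digits that accumulates when floats are added naively.
hypot(x,y): the Euclidean distance, sqrt(x*x + y*y)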
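A quick check of both functions, as a sketch in Python 3 syntax. The running total is accumulated in an explicit loop, because newer Python versions make the built-in sum() itself more accurate for floats.

```python
import math

values = [0.1] * 10

# Naive accumulation: rounding errors add up
running = 0.0
for v in values:
    running += v

exact = math.fsum(values)     # tracks the lost digits internally
distance = math.hypot(3, 4)   # length of the hypotenuse of a 3-4-5 triangle
```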
7. The numbers module: defines a hierarchy of abstract base classes for the numeric types: Number, Complex, Real, Rational, Integral.
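8. The random module.
random.seed([x]): seeds the generator; if x is omitted, the system time (or an OS randomness source, when available) is used.
random.randint(a,b): returns a random integer in [a,b], both ends included.
random.choice(seq): picks one random element from the sequence seq.
random.sample(seq,len): similar, but returns len distinct elements. This is convenient for things like verification codes:
# Generate a 4-digit verification code (digits do not repeat)
>>> "".join(random.sample("1234567890",4))
'7401'
random.uniform(a,b): returns a random float in [a,b).
random.random(): returns a random float in [0.0,1.0).
One last note: the book warns that the functions in this module are not thread-safe, so use locking when generating random numbers from several threads.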

View File

@@ -0,0 +1,66 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T14:48:17+08:00
====== Python Essential Reference 4th: Chapter 15 Reading Notes ======
Created Tuesday 05 July 2011
This chapter covers abstract base classes and some advanced collections.
1. The abc module: abstract base classes.
An example of an abstract class:
(1) Set __metaclass__ to ABCMeta
(2) Mark abstract methods with @abstractmethod
(3) Mark abstract properties with @abstractproperty
from abc import ABCMeta,abstractmethod,abstractproperty
class Stackable:
    __metaclass__ = ABCMeta
    @abstractmethod
    def push(self,item):
        pass
    @abstractmethod
    def pop(self):
        pass
    @abstractproperty
    def size(self):
        pass
2. A subclass can be instantiated only after it implements all the abstract methods and properties.
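The rule in item 2 can be tried out as follows. This sketch uses Python 3 syntax (the ABC base class, and property stacked on abstractmethod instead of abstractproperty); ListStack and try_instantiate are hypothetical names.

```python
from abc import ABC, abstractmethod

class Stackable(ABC):
    @abstractmethod
    def push(self, item):
        pass

    @abstractmethod
    def pop(self):
        pass

    @property
    @abstractmethod
    def size(self):
        pass

class ListStack(Stackable):
    """Concrete subclass that implements every abstract member."""
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    @property
    def size(self):
        return len(self._items)

def try_instantiate(cls):
    # Returns True if cls() succeeds, False if it raises TypeError
    try:
        cls()
        return True
    except TypeError:
        return False

abstract_ok = try_instantiate(Stackable)
concrete_ok = try_instantiate(ListStack)
```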
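3. The array module: similar to list, except that all elements must be of the same type.
The advantage of array over list is compact storage; the drawback is that a single element type is less flexible.
array(typecode)
typecode selects a basic type, mainly characters, integers and floats; see page 259 of the book for the full table.
4. The bisect module: inserts into a sorted list while keeping it sorted (based on binary search).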
>>> lst = [1,2,5,6,7]
>>> bisect.insort(lst,4)
>>> print lst
[1, 2, 4, 5, 6, 7]
There are also functions for plain binary search:
>>> lst = [1,2,5,6,7]
>>> bisect.bisect(lst,4)
2
The 2 returned above means the value should be inserted at index 2.
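When the value is already in the list, bisect_left and bisect_right differ. A short sketch in Python 3 syntax; the variable names are just for the example.

```python
import bisect

scores = [1, 2, 5, 6, 7]

pos = bisect.bisect(scores, 4)           # insertion point for a new value
left = bisect.bisect_left(scores, 5)     # before an existing equal element
right = bisect.bisect_right(scores, 5)   # after an existing equal element

bisect.insort(scores, 4)                 # insert and keep the list sorted
```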
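5. The collections module:
deque: a double-ended queue.
defaultdict: the same as dict, except for how a missing key (KeyError) is handled.
namedtuple: a tuple with named fields, compatible with ordinary tuples; very handy when passing records around as arguments.
Two related modules are heapq and itertools:
heapq: priority queue (heap) operations.
itertools: tools for working with iterators.
itertools.chain(itr1,itr2…): chains N iterators, moving to the next one when the current one is exhausted.
itertools.cycle(itr): loops over itr, starting again from the beginning when it reaches the end.
itertools.ifilter(func,iterable): yields only the items for which func returns True.
Done.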

View File

@@ -0,0 +1,446 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T14:46:50+08:00
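====== Python Essential Reference 4th: Chapter 1 Reading Notes ======
Created Tuesday 05 July 2011
http://www.coder4.com/archives/1475
1. Python is an interpreted language.
2. In the interactive interpreter, the underscore "_" holds the result of the last evaluation.
3. About print: print("Hi!") works in both Python 2 and 3; print "Hi!" is Python 2 only.
4. Ways to exit: Ctrl+D on *nix, Ctrl+Z on Windows, or raise SystemExit from within a program.
5. Each line is one statement; to put several statements on one line, separate them with semicolons ";".
6. Blocks are identified by indentation (usually 4 spaces); there are no braces.
7. If a block has no code yet (for example one branch of an if), use pass as a placeholder; otherwise a syntax error is reported.
8. printf-style printing: print("%3d %.2f" % (year, principal)), where the % operator separates the format string from the tuple of values.
9. Python has no switch statement; use if and elif instead: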
if suffix == ".htm":
    content = "text/html"
elif suffix == ".jpg":
    content = "image/jpeg"
elif suffix == ".png":
    content = "image/png"
else:
    raise RuntimeError("Unknown content type")
10. in is an operator. On a container (list, tuple, dict and so on) it checks whether an element is present; on a string it checks for a substring. It returns True or False.
# Check for an element
arr = [1,2,3,4,5]
if 1 in arr:
    print "Has 1"
else:
    print "None"
# Check for a substring
if "spam" in s:
    has_spam = True
else:
    has_spam = False
11. Reading and writing files:
The basic versions below process a file line by line; later chapters introduce generator (yield) based versions that compose better.
Version 1, rather basic:
f = open("../ExpInfoDAO.cc")
line = f.readline()
while line:
    print line,  # the trailing comma avoids printing an extra newline
    line = f.readline()
f.close()
Version 2, more concise, only two lines:
for line in open("../ExpInfoDAO.cc"):
    print line,
What about writing output to a file?
Assuming the file is already open: f = open("out","w")
print >>f,"Hi" # the Python 2 way
print("Hi",file=f) # the Python 3 way
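12. Strings are enclosed in single, double or triple quotes: 'Hi', "Hi". Triple-quoted strings can span multiple lines and are mostly used for docstrings:
"""
I
can
do
it
"""
13. A string is also a kind of sequence, but it is immutable, like a tuple.
14. Strings can be concatenated with the plus sign: g = a + "Test string".
They can be indexed and sliced to extract substrings:
a = "Hello world."
b = a[4] # b == "o"
b = a[:5] # b == "Hello"
Similarly, 2 or 3 indices make it very easy to extract a substring.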
15. string -> other types: use int() and float() to convert a string to another type:
>>> x = int("12")
>>> x
12
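16. Other types -> string: use str(), repr() and format(). str() produces a readable string, repr() produces the string form of the value as you would type it in a program, and format() converts a value according to a format specification.
>>> x = 3L
>>> str(x)
'3'
>>> repr(x)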
'3L'
>>> format(x,"0.5f")
'3.00000'
17. A list is a kind of sequence, written with square brackets: [1,2,"3",4,[5,6]]. Lists can nest and hold elements of any type.
18. Basic list operations: slicing, concatenation with +, list comprehensions.
# Slice
>>> names = ["Li","He","Yuan"]
>>> names[:1]
['Li']
# Concatenation
>>> names_2 = ["Liu"]
>>> names + names_2
['Li', 'He', 'Yuan', 'Liu']
List comprehensions are a more advanced feature. For example, to open a file and print each line:
lines = [line for line in open("../test.txt")]
for line in lines:
    print line,
The example above is a bit roundabout, but comprehensions are very useful in numeric work, for example computing squares:
>>> [x*x for x in xrange(1,10)]
[1, 4, 9, 16, 25, 36, 49, 64, 81]
19. Tuples: like lists they are sequences, written with parentheses. The difference is that they are immutable, and they use less memory than lists.
20. Tuples have many uses.
Returning several values from a function:
#arr = [1,2,3,4,5]
def minmax(arr):
    m1 = min(arr)
    m2 = max(arr)
    return (m1,m2)
# Result:
>>> minmax(arr)
(1, 5)
Another use is unpacking in a for loop:
m = [(1,"liheyuan"),(2,"liuxinrui")]
for (id,name) in m:
    print name," ",id,
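21. set: like the set in the STL or Java, an unordered collection of distinct elements. It is created with the set() function.
# The 3 is not duplicated
>>> set([1,2,3,3,4,5])
set([1, 2, 3, 4, 5])
Both add() and update() insert elements: the former adds a single element, the latter adds every element of a sequence.
s = set(["1","2","3"])
s.add("x")
s.update([1,2,3,4])
Use remove() to delete an element:
s.remove("x")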
22. Dictionaries (an odd name; I would rather call them maps) are written with braces.
stock = {
"name": "GOOG",
"shares": 100,
"price": 490.10
}
Note that any immutable (hashable) type can be used as a key.
23. Dictionary operations: accessing a value, updating, deleting, converting to a list.
All of these can be done with subscripts.
# Access a value by subscript
>>> stock["shares"]
100
# Update a value, also by subscript
>>> stock["shares"] = 200
>>> print stock
{'price': 490.10000000000002, 'name': 'GOOG', 'shares': 200}
# Delete a key and its value
>>> del stock["shares"]
>>> print stock
{'price': 490.10000000000002, 'name': 'GOOG'}
# Convert to a list (keys only)
>>> list(stock)
['price', 'name']
24. for loops and iteration.
# range builds the whole list at once
>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# xrange produces the values lazily
>>> xrange(10)
xrange(10)
# Iterate over the stock dictionary above
>>> for key in stock:
...     print key,stock[key]
...
price 490.1
name GOOG
Note: in Python 3, xrange has been merged into range.
25. Defining a function:
def remainder(a,b):
    q = a//b
    r = a - q * b
    return (q,r)

if __name__ == "__main__":
    print remainder(30,7)
# Result:
(4, 2)
26. Generators are a very useful feature; they can be thought of as function-level "pipes".
def countdown(n):
    print "Ready for countdown..."
    for i in xrange(n):
        yield i
# Usage: get the generator object, then call next() repeatedly to fetch the next value sent through the "pipe".
>>> c = countdown(10)
>>> c.next()
Ready for countdown...
0
>>> c.next()
1
>>> c.close()
27. Using generators to emulate the common *nix command "tail -f log|grep key":
import os,time

def tail(f):
    f.seek(0,os.SEEK_END)
    while True:
        line = f.readline()
        if not line:
            time.sleep(0.1)
            continue
        else:
            yield line

def grep(lines,key):
    for line in lines:  # the for loop is required; it cannot be omitted
        if key in line:
            yield line

if __name__ == "__main__":
    for line in grep(tail(open("log")),"python"):
        print line,
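Since tail() never terminates, the pipeline idea is easier to try on a finite list of lines. A sketch in Python 3 syntax; strip_newlines and the sample log_lines are made up for the illustration.

```python
def grep(lines, key):
    # Pass on only the lines that contain key
    for line in lines:
        if key in line:
            yield line

def strip_newlines(lines):
    # A second stage, to show that stages chain like shell pipes
    for line in lines:
        yield line.rstrip("\n")

log_lines = ["python rocks\n", "java too\n", "more python\n"]

# Nothing runs until list() pulls values through the whole chain
matches = list(strip_newlines(grep(iter(log_lines), "python")))
```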
28. Coroutines
Coroutines differ from generators but look similar:
line = (yield) # yield is wrapped in parentheses
In contrast to a generator, a coroutine blocks at the yield expression until a value is pushed in with send().
A coroutine's lifetime runs from its first execution until close() is called or the function returns. In addition, next() must be called once before the first send().
import os
import time

def print_matches(key):
    print "Looking for,",key
    while True:
        line = (yield)
        if key in line:
            print line

def tail(f):
    f.seek(0,os.SEEK_END)
    while True:
        line = f.readline()
        if not line:
            time.sleep(0.1)
            continue
        else:
            yield line

matchers = [
    print_matches("python"),
    print_matches("guido"),
    print_matches("jython")
]
# A coroutine must be advanced once with next() before it is first used
for m in matchers:
    m.next()
wwwlog = tail(open("log"))
for line in wwwlog:
    for m in matchers:
        m.send(line)
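The send() protocol can be tested without a log file. In this Python 3 sketch the coroutine collects matches into a list instead of printing them; collect_matches and the sample lines are hypothetical.

```python
def collect_matches(key, found):
    # Coroutine: receives lines through send() and stores the matching ones
    while True:
        line = (yield)
        if key in line:
            found.append(line)

found = []
matcher = collect_matches("python", found)
next(matcher)                  # advance to the first (yield) before sending

for line in ["python 3", "guido", "jython and python"]:
    matcher.send(line)

matcher.close()                # ends the coroutine's lifetime
```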
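29. Inspecting classes and objects: Python source carries documentation in a way similar to javadoc.
dir(class name or object variable) lists the methods available on a class or object.
Special methods start and end with two underscores; for example, __ne__ actually overloads the != operator.
For example: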
>>> map = {"a":1,"b":2}
>>> dir(map)
['__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'has_key', 'items', 'iteritems', 'iterkeys', 'itervalues', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']
# __ne__ is a special method that overloads the != operator
>>> map.__ne__({"a":1,"b":2})
False
>>> map != {"a":1,"b":2}
False
This is a very practical trick (I feel Python's documentation is not as detailed as Javadoc).
30. Static methods in a class (the counterpart of C++ static member functions)
must be marked with @staticmethod, as follows:
#!/usr/bin/python
class TestClass:
    # Note: a static method takes no self parameter
    @staticmethod
    def print_static():
        print("I'm a static function.")
    def print_non_static(self):
        print("I'm not a static function.")

if __name__ == "__main__":
    TestClass.print_static()
    tc = TestClass()
    tc.print_non_static()
31. Exceptions are handled with try/except:
#!/usr/bin/python
try:
    f = open("file.txt","r")
except IOError as e:
    print e
finally:
    print "Finally release other resource."
To raise an exception yourself, use raise:
raise RuntimeError("Computer says no")
32. Besides finally, the with statement can be used to release resources automatically:
with m_lock:
    message.add(msg)
A lock object supports the with statement: entering the with block acquires the lock, and leaving the block releases it.
33. Simple documentation strings.
A triple-quoted string placed right under the def line can be read back with help() or through the function's __doc__ attribute, as follows:
def TestFunction():
    """
    I'm the doc for TestFunction.
    """
    print("Do nothing.")

if __name__ == "__main__":
    TestFunction()
# It can be retrieved with __doc__.
>>> TestFunction.__doc__
"\n I'm the doc for TestFunction.\n "
(End of the Chapter 1 notes)

View File

@@ -0,0 +1,80 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T14:47:36+08:00
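====== Python Essential Reference 4th: Chapter 2 Reading Notes ======
Created Tuesday 05 July 2011
This chapter is mainly about literals and basic lexical symbols.
1. A long statement can be continued on the next line with a backslash "\" at the end of the line, as follows:
# Note: the backslash must be the last character on the line
a = 1 + 2 +3 + 4 \
+ 5 + 6+ 7
2. The opposite of item 1: to put several statements on one line, separate them with semicolons ";".
a = 2;b=3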
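3. When a block contains no statements, pass must be used as a placeholder (because Python identifies blocks by indentation):
if a:
    pass
else:
    print("False !")
4. Identifiers consist of letters, digits and underscores and must not start with a digit, much as in C, C++ and Java.
The reserved words are: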
and del from nonlocal try as elif global not while assert else if or with break except import pass yield class exec in print continue finally is raise def for lambda return
Identifiers that start with an underscore usually have a special meaning.
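5. Numeric literals: booleans, integers, floats, complex numbers.
True/False
1 111111111111111111L
123.45 123e+04
1+2j 1-2J
6. String literals: single, double and triple quotes.
'I am a Word'
"Still a Word"
"""
I
also
a
word
"""
7. Escape sequences start with a backslash \, as in C.
8. In Python 2 a plain string literal is a byte string (ASCII-like); only in Python 3 are strings Unicode by default. This is why garbled text is a frequent problem in Python 2. It is quite annoying; look up a solution when you run into it.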
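9. Containers:
list: [1,2,3,"Hi"]
tuple: (1,2,3,"Hi I can't change")
dictionary: {"a":1,"b":2}
10. Reserved operators and special symbols.
There are many; they are not all listed here.
Note in particular that the following symbols are also reserved: # \ @
11. Documentation strings: covered in the previous chapter; the triple-quoted part right under a def.
12. Decorators: for example @staticmethod, already seen in the previous chapter.
13. Source code encoding
If the source contains string constants in another language that do not match the default encoding, declare the encoding explicitly by putting the following at the top of the file:
# -*- coding: UTF-8 -*-
After that, the file can be edited directly in a UTF-8 editor.
(End of Chapter 2)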

View File

@@ -0,0 +1,417 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T14:50:31+08:00
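====== Python Essential Reference 4th: Chapter 3 Reading Notes ======
Created Tuesday 05 July 2011
1. Everything in Python is an object.
2. Types (type/class) and objects/instances.
3. Objects are either mutable or immutable, depending on whether they can be modified.
4. An object that holds references to other objects is called a container (or collection).
5. Objects have attributes and methods (functions).
6. Python objects cannot be mapped directly to memory, but the built-in function id() shows an object's identity (in CPython, its address):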
>>> a = []
>>> B = []
>>> id(a)
3077871564L
>>> id(B)
3077198956L
7. The difference between the operators is and ==:
is checks whether both names refer to the same object (the same address).
== checks whether the values are equal.
An example:
# Initialization
>>> a = ["123",456]
>>> b = a
>>> c = ["123",456]
# == compares the values
>>> a == b
True
>>> a == c
True
>>> b == c
True
# is compares identity (addresses)
>>> a is b
True
>>> a is c
False
>>> b is c
False
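8. Every object has a type, which the built-in function type shows; it can be used to check whether two objects have the same type.
Note that comparing types with is does not take inheritance into account.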
>>> type([])
<type 'list'>
>>> a = set()
>>> type(a)
<type 'set'>
# Compare
type(a) is type(b)
9. To check an object's type while taking inheritance into account, use isinstance.
# The first argument must be an instance, the second a type
isinstance(a,list)
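10. Every object has a reference count.
The count increases when the object is bound to another name or added to a container such as a list.
The count decreases when a reference is removed with del or goes out of scope.
sys.getrefcount(xx) returns the reference count of object xx.
import sys
a = 37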
print sys.getrefcount(a)
b = a
c = []
c.append(a)
print sys.getrefcount(a)
del b
del c[0]
print sys.getrefcount(a)
# Output (the exact numbers depend on the interpreter, because small integers such as 37 are shared)
9
11
9
11. What is stored in collections such as lists and tuples is a reference.
So if an object is placed in two collections, modifying it through one is visible through the other.
a = [1,2]
b = [0,a,3]
c = [a,55]
print c
b[1][1] = 3
print c
# Output
[[1, 2], 55]
[[1, 3], 55]
12. If the value bound to another name, or added to a list, should be an independent (new) object, use deepcopy:
import copy
a = [1,2]
b = copy.deepcopy(a)
b[0] = -2
print a
print b
# Output: after deepcopy the two are independent.
[1, 2]
[-2, 2]
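The difference between a shallow and a deep copy only shows up with nested objects. A sketch in Python 3 syntax; the variable names are illustrative.

```python
import copy

inner = [1, 2]
original = [0, inner, 3]

shallow = copy.copy(original)      # new outer list, same inner list
deep = copy.deepcopy(original)     # new outer list and new inner list

inner[1] = 99                      # modify the shared inner list
```

After the modification, the shallow copy sees the change through the shared inner list, while the deep copy keeps the old values.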
13. A data-conversion example: converting the fields of "GOOG, 100, 490.10" to their proper types.
Very concise!
line = "GOOG, 100 , 490.10"
field_types = [str,int,float]
fields = [ty(val) for ty,val in zip(field_types,line.split(","))]
# Result
['GOOG', 100, 490.10000000000002]
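14. Built-in types: None int long float complex bool str unicode (Python 2 only) list tuple xrange dict set frozenset (an immutable set)
int range: -2147483648 ~ 2147483647 (on a 32-bit platform)
long range: unlimited (bounded by available memory)
float: 64-bit floating-point representation.
15. Operations shared by the sequence types (str unicode list tuple xrange)
S[i] index
S[i:j] slice
S[i:j:stride] slice, where stride is the step
Note that i and j may be negative; -1, for example, refers to the last element.
len(S) number of elements in S
min(S) smallest element of S
max(S) largest element of S
sum(S,[initial value]) sum of the elements of S
all(S) True if every element of S is true, otherwise False
any(S) True if any element of S is true, otherwise False
16. Operations specific to lists
list(s) converts s to a list
s.append(x) appends x to the end of s (if x is a list, it is appended as a single element)
s.extend(t) appends the list t to the end of s (the elements of t are appended one by one)
# list.append: appended as a whole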
>>> x = [1,2,3]
>>> m = [0]
>>> m.append(x)
>>> print m
[0, [1, 2, 3]]
# list.extend: appended element by element
>>> x = [1,2,3]
>>> m = [0]
>>> m.extend(x)
>>> print m
[0, 1, 2, 3]
# list.sort: sorts in place
>>> x = [5,2,5,3,5,4,2]
>>> x.sort()
>>> x
[2, 2, 3, 4, 5, 5, 5]
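17. In Python 2 the default string is a byte string rather than Unicode, which is a nuisance. String methods never modify the original value; they return either a new string or a status.
str.encode(): converts the unicode string str to another encoding and returns the result.
str.decode(): converts the encoded byte string str to unicode and returns the result.
The arguments of encode and decode can sometimes be omitted.
str.strip(): removes whitespace from both ends of the string, all kinds of whitespace, including spaces, newlines and tabs.
# str.capitalize() returns the string with its first letter capitalized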
>>> a = "china"
>>> a.capitalize()
'China'
# str.encode() converts a unicode string to another encoding and returns it
>>> a = u"计算所"
>>> b = a.encode("gbk")
>>> print b
计算所
# str.decode() converts an encoded byte string to unicode and returns it
>>> c = b.decode("gbk")
>>> print c
计算所
# str.isupper() checks whether all cased characters in str are uppercase
>>> s = "ABC"
>>> s.isupper()
True
>>> s = "aBC"
>>> s.isupper()
False
# str.strip(chrs) removes leading and trailing whitespace, or the characters given in chrs
>>> s = " \tchina \r\n"
>>> s.strip()
'china'
18. xrange: produces the integers from i up to (but not including) j; used mostly in loops.
# Output
>>> for i in xrange(10,20):
... print i
...
10
11
12
13
14
15
16
17
18
19
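19. A map is called a dictionary in Python. It resembles an array; k and v can be arbitrary values, but k must be of an immutable type.
[Common operations]
k in m: checks whether key k is in map m.
m[k]=v: sets the value of key k in m to v
print m[k]: reads the value stored under key k
m.has_key(k): whether m has key k
m.setdefault(k,v): if m already has k, leaves it alone; otherwise creates k with value v
len(m): the number of key-value pairs in the map
m.items(): all (k, v) pairs
m.keys(): all keys
m.values(): all values
m.copy(): a shallow copy, not a deep copy!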
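20. The set type. A set is a collection of distinct elements; it can be seen as a degenerate map without values.
frozenset is an immutable set.
Common operations of set and frozenset:
set_a.copy()
set_a.intersection(another_set): the intersection with another set.
set_a.union(another_set): the union with another set.
Operations specific to set (they modify it):
set_b.add(item)
set_b.clear(): removes all elements
set_b.discard(item): removes item if it is in set_b; does nothing, and raises no exception, if it is not.
set_b.remove(item): like discard, but raises an exception (KeyError) if item is not present!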
21. lambda expressions: anonymous functions (roughly comparable to Java's anonymous classes).
bar = lambda x,y: x + y
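22. There are three kinds of functions: plain functions, class methods and static methods
# Plain function
def fun1(x,y):
    return x+y
# Class method and static method
class Foo(object):
    # Class method: the first parameter is the class itself (conventionally named cls)
    @classmethod
    def method1(cls,arg):
        pass
    # Static method: like a static function of the class; it takes neither self nor cls
    @staticmethod
    def method2(arg):
        pass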
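23. Some built-in objects:
Traceback: the call stack trace, used mostly with exceptions
Generator: a generator object
Slice: needs little explanation; it is the object used when slicing, as in lst[1:2]
Ellipsis: the object that stands for ... in extended subscripts.
24. Special methods usually start (and end) with two underscores: __add__() overloads the + operator, and __getitem__() overloads the [] operator.
25. Special methods related to object creation and destruction:
__new__(): called to create a new object.
__init__(): called to initialize it.
__del__(): called when the object is about to be destroyed (not necessarily at the del xx statement).
new and init are normally used together; creating an object is equivalent to new + init:
# x = A(args) translates to:
x = A.__new__(A,args)
if isinstance(x,A): x.__init__(args)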
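26. Special methods related to strings:
__str__(): overloads str(xx).
__repr__(): overloads repr(xx).
27. Other special methods:
__bool__(): returns True or False; used in truth tests (named __nonzero__() in Python 2).
__len__(): overloads len().
__hash__(): returns an integer hash value.
__lt__(self,other): overloads self < other
__ge__(self,other): overloads self >= other
28. Special methods related to attribute access:
__getattr__(self, name): reading the attribute x.name (called only when normal lookup fails)
__setattr__(self, name, value): setting the attribute x.name = value
__delattr__(self, name): deleting the attribute, del x.name
29. Sometimes we want an extra layer of logic when x.name is accessed (logging, for example); descriptors can be used for this.
30. Special methods for containers:
__len__(self): overloads len(item)
__getitem__(self,key): overloads item[key]
__contains__(self,obj): returns whether self contains obj; overloads obj in item.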
31. Special methods for mathematical operators:
__add__(self,other) self + other
__sub__(self,other) self - other
__mul__(self,other) self * other
__div__(self,other) self / other (Python 2 only)
__truediv__(self,other) self / other (Python 3)
__floordiv__(self,other) self // other
__mod__(self,other) self % other
__divmod__(self,other) divmod(self,other)
__pow__(self,other [,modulo]) self ** other, pow(self, other,modulo)
__lshift__(self,other) self << other
__rshift__(self,other) self >> other
__and__(self,other) self & other
__or__(self,other) self | other
__xor__(self,other) self ^ other
__radd__(self,other) other + self
__rsub__(self,other) other - self
__rmul__(self,other) other * self
__rdiv__(self,other) other / self (Python 2 only)
__rtruediv__(self,other) other / self (Python 3)
__rfloordiv__(self,other) other // self
__rmod__(self,other) other % self
__rdivmod__(self,other) divmod(other,self)
__rpow__(self,other) other ** self
__rlshift__(self,other) other << self
__rrshift__(self,other) other >> self
__rand__(self,other) other & self
__ror__(self,other) other | self
__rxor__(self,other) other ^ self
__iadd__(self,other) self += other
__isub__(self,other) self -= other
__imul__(self,other) self *= other
__idiv__(self,other) self /= other (Python 2 only)
__itruediv__(self,other) self /= other (Python 3)
__ifloordiv__(self,other) self //= other
__imod__(self,other) self %= other
__ipow__(self,other) self **= other
__iand__(self,other) self &= other
__ior__(self,other) self |= other
__ixor__(self,other) self ^= other
__ilshift__(self,other) self <<= other
__irshift__(self,other) self >>= other
__neg__(self) -self
__pos__(self) +self
__abs__(self) abs(self)
__invert__(self) ~self
__int__(self) int(self)
__long__(self) long(self) (Python 2 only)
__float__(self) float(self)
__complex__(self) complex(self)
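A handful of the methods in the table are enough to make a small numeric type feel built in. The Vec2 class below is a hypothetical example, written as a sketch in Python 3 syntax.

```python
import math

class Vec2:
    """Tiny 2-D vector used only to illustrate operator special methods."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return "Vec2(%r, %r)" % (self.x, self.y)

    def __eq__(self, other):
        return isinstance(other, Vec2) and (self.x, self.y) == (other.x, other.y)

    def __add__(self, other):          # self + other
        return Vec2(self.x + other.x, self.y + other.y)

    def __mul__(self, k):              # self * k
        return Vec2(self.x * k, self.y * k)

    __rmul__ = __mul__                 # k * self, via the reflected method

    def __neg__(self):                 # -self
        return Vec2(-self.x, -self.y)

    def __abs__(self):                 # abs(self)
        return math.hypot(self.x, self.y)

total = Vec2(1, 2) + Vec2(3, 4)
scaled = 2 * Vec2(1, 2)
length = abs(Vec2(3, 4))
```

The expression 2 * Vec2(1, 2) works because int.__mul__ gives up and Python then tries the reflected __rmul__ of the right operand.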
32. The with statement.
with context [ as var]:
    statements
The special methods involved are __enter__(self) and __exit__(self, type, value, tb)
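To see when the two methods are called, the following sketch (Python 3 syntax) logs each step. Tracker is a made-up class for the illustration.

```python
class Tracker:
    """Context manager that records when it is entered and left."""
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self                      # becomes the value bound by "as"

    def __exit__(self, exc_type, exc_value, tb):
        name = exc_type.__name__ if exc_type else "ok"
        self.events.append("exit:" + name)
        return False                     # do not swallow exceptions

normal = Tracker()
with normal as ctx:
    ctx.events.append("body")

failing = Tracker()
try:
    with failing:
        raise ValueError("boom")
except ValueError:
    pass
```

__exit__ runs in both cases; when the block raises, it receives the exception type, value and traceback.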
33. The dir function (a helper function, similar to help)
__dir__(self): overloads the dir function.
End of Chapter 3.

View File

@@ -0,0 +1,162 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T14:51:06+08:00
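====== Python Essential Reference 4th: Chapter 4 Reading Notes ======
Created Tuesday 05 July 2011
1. About division: / and //.
In Python 2, / applied to two integers is still integer division: only the integer part is returned.
In Python 3, / is true (floating-point) division and no longer truncates to an integer.
To get a floating-point result in Python 2, convert the dividend or the divisor to float: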
float(1)/100
0.01
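2. Some useful functions:
abs(x): the absolute value of x
pow(x,y): x to the power y, equivalent to x ** y
round(x[,n]): rounds x to the nearest multiple of 10^(-n); n may be positive or negative (default 0)
3. Multiplication on tuples, lists and strings repeats the sequence:
a * 5 and 5 * a both work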
>>> 'a' * 10
'aaaaaaaaaa'
>>> 10 * 'a'
'aaaaaaaaaa'
4.
all(s): returns true when every element of s is true.
any(s): returns true when any element of s is true.
5. v1, v2…, vn = S: unpacks the container S into v1~vn
# Strings can be unpacked too
>>> x,y,z = "abc"
>>> print x,y,z
a b c
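Note, however, that the number of variables must match the number of elements in S exactly!
6. s[-1]: the last element.
s[i:j] selects all elements with index k such that i <= k < j.
7. For strings, in and not in perform a substring search, but regular expressions are not supported.
8. s[i:j] = r: replaces the elements of s in [i,j) with r.
9. del s[i] removes the element from the list and also decrements its reference count.
10. Sequences can be compared with < > <= >= == and !=. The comparison proceeds element by element.
11. Strings are compared lexicographically.
12. Do not mix Unicode strings and ordinary byte strings.
13. String formatting (s % d): s is the format string and d is a tuple.
14. In string formatting, d can also be a dictionary instead of a tuple, as follows: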
>>> stock = { "name":"GOOG","shares":100,"price":300 }
>>> "%(shares)d of %(name)s at %(price)0.2f" % stock
# Result after formatting
'100 of GOOG at 300.00'
15. More advanced string formatting: format(*args, **kwargs), which can mix positional and keyword arguments.
These advanced formatting methods all use format together with { } placeholders:
>>> r = "{0} {1} {2}".format("GOOG",100,500.1)
>>> print r
# Output
GOOG 100 500.1
# Mixed use, with values selected by keyword
>>> "{name:8} {share:8d} {price:8.2f}".format(name="lhy",share=100,price=500.1)
'lhy           100   500.10'
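16. Some dictionary operations:
x = d[k]: reads the value under key k in dict d
d[k] = x: sets the value under key k in d to x
del d[k]: deletes the entry with key k from d
k in d: True if d has an element with key k, otherwise False
len(d): the number of entries in the dict
17. set and frozenset both support union |, intersection &, difference -, symmetric difference ^, as well as len, max, min and so on
18. += on strings is shorthand for concatenation.
19. The dot . is the attribute access operator.
For example foo.x = 3
Several dots can be chained on one line, as in foo.bar(3,4,5).spam
20. The partial function from the functools package lets you supply a function's arguments in two steps:
>>> def func(x,y,z):
...     return x + y + z
...
>>> from functools import partial
>>> f = partial(func,1,2)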
>>> f(3)
6
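partial can also fix keyword arguments, and the resulting object remembers what was bound. A sketch in Python 3 syntax; power, square and two_to are illustrative names.

```python
from functools import partial

def power(base, exponent):
    return base ** exponent

square = partial(power, exponent=2)   # fix a keyword argument
two_to = partial(power, 2)            # fix the first positional argument

squared = square(5)
big = two_to(10)
```

The attributes square.func, two_to.args and square.keywords expose the wrapped function and the bound arguments.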
21. Type conversion in Python is less cumbersome than in C or C++: just call the type name as a function.
>>> a = "123"
>>> b = int(a)
>>> print b
123
22. unichr: converts an integer to the corresponding Unicode character
chr: converts an integer to the corresponding ASCII character
# chr: converts to the ASCII character
>>> a = 49
>>> chr(a)
'1'
# unichr: converts to the Unicode character
>>> a = 21271 # 21271 is the Unicode code point of 北
>>> print unichr(a)
北
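23. When used in branch conditions and the like, other types are converted to True/False: every non-zero number and every non-empty string, list, tuple or dictionary counts as True, while False, 0, None and empty lists, tuples and so on count as False.
24. x == y checks whether two objects are logically equal (for example, the same characters in the same order, or containers whose elements are all equal).
x is y checks whether the two names refer to the same object (that is, the same memory address).
25. The power operator ** is right-associative.
26. Python does not have the ?: ternary operator common in C, C++ and Java, but it has a one-line conditional expression:
minvalue = a if a<=b else b
Similarly, it can be used inside a list comprehension:
[x if x < 50 else 50 for x in values]
Done.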

View File

@@ -0,0 +1,289 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T14:48:39+08:00
====== Python Essential Reference 4th: Chapter 7 Reading Notes ======
Created Tuesday 05 July 2011
1. Classes and instances are in a 1-to-N relationship.
2. A class consists of functions (methods), variables (class members) and attributes (instance members).
An example class:
class Account(object):
    num_account = 0  # class member, shared by all instances!
    def __init__(self,name,balance):  # constructor
        self.name = name  # instance member
        self.balance = balance  # instance member
        Account.num_account += 1  # update the class member
    def withdraw(self,amt):  # note: self is required
        self.balance -= amt  # update the instance member
3. Instantiating a class:
# Instantiation calls __init__
a = Account("He",1000)
b = Account("Rui",2000)
# Attribute binding: the lookup first checks whether the attribute or method exists. Note that self is not passed explicitly.
a.withdraw(100)
4. Python has no class scope: inside a method, every access to a method or attribute of the instance must go through self.
class Foo(object):
    def bar(self):
        print("bar!")
    def spam(self):
        bar(self)  # Wrong! raises NameError
        self.bar()  # Correct!
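5. Inheritance in Python: base class and derived class. A derived class inherits all attributes and methods of the base class and may override them. object is the base class of all Python objects, like Object in Java. Declaring a derived class is simple: in class B(A):, A is the base class and B the derived class.
class EvilAccount(Account):
    def badwithdraw(self,amt):
        self.balance -= 2 * amt
It is used in the same way:
c = EvilAccount("Me",1500) # inherits the base class's __init__ method
c.badwithdraw(1000) # found directly in the derived class
c.withdraw(1000) # not in the derived class, so it resolves to the base class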
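6. A derived class calls the base-class constructor mostly when its own constructor takes different parameters.
class EvilAccount(Account):
    def __init__(self,name,balance,factor):  # the derived constructor differs from the base one
        Account.__init__(self,name,balance)  # call the base-class constructor
        self.f = factor  # then initialize the derived class's own member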
7. How a derived class calls a method of its parent class:
(1) self.xxx(abcd) when the method is not overridden, or BaseClass.xxx(self, abcd) when it is
(2) super(cls, instance).xxx(abcd)
For example:
class MoreEvilAccount(EvilAccount):
    def deposit(self, amount):
        self.withdraw(5.00)
        super(MoreEvilAccount,self).deposit(amount)  # call the parent class's deposit method
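8. Python supports multiple inheritance, but conflicts between attributes cause many problems, so it is not recommended.
9. Polymorphism / duck typing: Python always resolves types at run time, which is why it is called "duck typing". The benefit is that classes with similar methods and attributes can be used interchangeably without any inheritance relationship, for example the file-like objects in the standard library.
10. Static methods and class methods.
A class method can be seen as an extension of a static method: it additionally receives the class it was called on as its first argument.
class Foo(object):
    factor = 1
    @staticmethod
    def add(x,y):
        return x+y
    @classmethod
    def mul(cls,x):
        return cls.factor*x
The standard way to call both is the same:
Foo.add(3,4)
Foo.mul(4)
Besides, Python does not forbid calling class methods and static methods on instances, so the following is also legal:
f = Foo()
f.add(3,4)
f.mul(5)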
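11. The @property decorator: a method marked with it can be read like an instance attribute (an automatic getter), as follows:
# Mark the method with @property
import math
class Circle(object):
    def __init__(self,radius):
        self.radius = radius
    @property
    def area(self):
        return math.pi * self.radius ** 2
# It can then be read like an attribute
>>> c = Circle(4.0)
>>> c.area
50.26548245743669
Without @property, c.area would be a bound method rather than a value.
m = c.area # m is a bound method
The computation would only run when m() is called.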
12. In general it is best to guard instance attributes with getters and setters. The @property above is only a getter; to make such a property assignable and deletable, use @xxx.setter and @xxx.deleter, as follows:
class Circle(object):
    def __init__(self,area):
        self._area = area
    @property
    def area(self):
        return self._area
    @area.setter
    def area(self,value):
        self._area = value
    @area.deleter
    def area(self):
        raise TypeError("can't delete area.")
The name before the dot in @area.setter and @area.deleter must exactly match the name of the @property method.
After that it can be used like this:
c = Circle(1.0)
a = c.area
c.area = 123.2
del c.area # raises TypeError
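A setter is most useful when it validates the value. The following sketch (Python 3 syntax) is a variation on the Circle class, with a validated radius and a read-only computed area; the validation rule is an assumption made for the example.

```python
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius             # goes through the setter below

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        if value < 0:
            raise ValueError("radius must be non-negative")
        self._radius = float(value)

    @property
    def area(self):                      # read-only: no setter is defined
        return math.pi * self._radius ** 2

c = Circle(2)
c.radius = 3
```

Assigning a negative radius raises ValueError, and assigning to c.area raises AttributeError because the property has no setter.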
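13. Access can also be isolated with user-defined descriptors, objects that implement the __get__(), __set__() and __delete__() methods.
14. About private methods and variables: by Python convention, names that start with an underscore _ are treated as private, but the language does not enforce it.
15. Object memory management: creating an object calls the class's __new__() and __init__().
Defining __new__() is rare; its main uses are:
(1) When inheriting from an immutable type such as str or tuple, to change the value in __new__.
class UpperStr(str):
    def __new__(cls,value=""):
        return str.__new__(cls,value.upper())
u = UpperStr("hello") # value is "HELLO"
(2) __new__() is used by metaclasses (discussed later).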
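16. Object lifetime is also managed by reference counting. When the count drops to 0 the object is destroyed, and a user-defined __del__() is called at that point, if there is one. __del__() rarely needs to be defined, except for cleanup such as closing files, closing network sockets or releasing connections. Even those are usually better done explicitly rather than in __del__(), for the same reasons as with Java's finalize().
17. Reference counting is not perfect: it can produce reference cycles that delay reclamation or leak memory. The observer pattern is a typical case, and weak references solve the problem.
import weakref
class AccountObserver(object):
    def __init__(self, theaccount):
        self.accountref = weakref.ref(theaccount)  # create weakref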
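18. By default, an instance's attributes are stored in a dictionary (a map-like structure), available as obj.__dict__; a class has its own cls.__dict__.
Any change to an instance attribute self.xxx is reflected in the instance's __dict__. Attribute access ultimately goes through the special methods __getattribute__()/__getattr__(), __setattr__() and __delattr__(), which can be overridden to intercept access (for logging and the like).
19. The internal data structure that holds the attributes can be replaced by defining __slots__, which restricts the allowed attribute names in exchange for less memory and faster access.
import weakref
class AccountObserver(object):
    __slots__ = ('name', 'balance')
    .... # attributes can only be named name and balance
Instances of such a class no longer have a __dict__. The presence of __slots__ does not change whether __getattr__() and related methods are invoked when a class redefines them.
Also, when inheriting, the subclass must define __slots__ as well; otherwise its instances get a __dict__ again and the saving is lost, so use it with care in class hierarchies.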
20. Python supports operator overloading; for example, + and - are overloaded as follows:
class Complex:
    def __init__(self,real,imag=0):
        self.real = float(real)
        self.imag = float(imag)
    def __repr__(self):
        return "Complex(%s,%s)" % (self.real,self.imag)
    def __str__(self):
        return "(%g+%gj)" % (self.real,self.imag)
    def __add__(self,other):
        return Complex(self.real + other.real, self.imag + other.imag)
    def __sub__(self,other):
        return Complex(self.real - other.real, self.imag - other.imag)
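21. Checking whether an instance belongs to a class: isinstance(obj,cls)
issubclass(A,B): true if class A is a subclass of class B; note that A and B are both classes, not instances.
The behavior of isinstance and issubclass can be customized, which makes Python quite flexible here.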
22. Abstract classes: Python supports abstract classes through the abc module.
from abc import ABCMeta, abstractmethod, abstractproperty
class Foo:
    __metaclass__ = ABCMeta  # must 1
    @abstractmethod  # must 2
    def spam(self,a,b):
        pass
    @abstractproperty  # must 2
    def name(self):
        pass
An abstract class cannot be instantiated directly. Besides inheriting from it, an existing class can be registered as implementing it, as follows:
class Grok:
    def spam(self,a,b):
        print("Grok.spam")
Foo.register(Grok)
23. Metaclasses: I did not really understand them; they feel similar to Java's reflection and proxy mechanisms, and are most useful when writing frameworks.
24. Class decorators: take a class as input and return a class as output
I do not fully understand their use either.
# Class decorator
registry = { }
def register(cls):
    registry[cls.__clsid__] = cls
    return cls

@register
class Foo(object):
    __clsid__ = "123-456"
    def bar(self):
        pass
# The @register line is equivalent to:
Foo = register(Foo)

View File

@@ -0,0 +1,204 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T14:49:04+08:00
====== Python Essential Reference 4th: Chapter 8 Reading Notes ======
Created Tuesday 05 July 2011
1. Python programs are organized into modules and packages. Put simply, a module is an xx.py file, and a package is a directory (possibly with subdirectories) of such files.
2. As noted in item 1, every .py file can be treated as a module and used through import.
#spam.py
a = 37
def foo():
    print("I'm foo and a is %d" % a)
def bar():
    print("I'm bar and calling to foo")
class Spam(object):
    def grok(self):
        print("I'm Spam.grok in module spam")
# Usage
import spam
x = spam.a
spam.foo()
......
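3. import executes all statements in the module, including any test statements you may have left in it. They are executed only once, no matter how many times the module is imported.
4. import only makes the module's namespace known to the current program; names must be qualified with the module name, as in the example above:
obj = spam.Spam()
5. Several modules can be imported at once, separated by commas: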
import socket, os, re
6. An imported module can be renamed, which is very useful for resolving name clashes:
import spam as sp
It can also serve as a way of "selective loading":
if format == "xml":
    import xmlreader as reader
elif format == "csv":
    import csvreader as reader
Quite handy!
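7. Modules are first-class objects in Python: a module can be assigned to a variable, stored in a list, and so on.
8. import can appear anywhere in a program, but no matter how many import statements you write, the module is loaded once and only once.
9. To see which modules are currently imported: print sys.modules
10. import only brings in the module's namespace, similar to using namespace. Often, though, we want something like using std::endl, importing just one function or class from a module; that is what from modulexxx import namexxx is for.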
from spam import foo
foo() # foo can now be used directly, like writing cout... instead of std::cout...
11. from xxx import xxx can also be combined with as:
from spam import Spam as Sp
s = Sp()
12. from xxxModules import * uses the wildcard *. The * does not necessarily mean everything: if xxxModules.py defines __all__, only the names listed there are imported.
Names can therefore be hidden, as in the following example:
#spam.py
import sys
__all__ = ['bar','Spam']
a = 37
def foo():
    print("I'm foo and a is %d" % a)
def bar():
    print("I'm bar and calling to foo")
class Spam(object):
    def grok(self):
        print("I'm Spam.grok in module spam")
#print sys.modules
# Now from spam import * will not bring in foo()
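13. A function's global namespace is always the module in which the function is defined, not the module that calls it.
14. Running a module as "main".
If executable code (not function definitions) is written at module level, all of it runs on import. To avoid this, use the following idiom:
if __name__ == "__main__":
    print "Hi, I'm in main"
else:
    pass
The "Hi, I'm in main" branch is entered only when the file is run directly with python xxx.py.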
15. Module search.
Modules are searched for in the directories listed in sys.path, in order; entries can also be added at run time:
>>> print sys.path
['D:/python', 'D:\\python', 'C:\\Python27\\Lib\\idlelib', 'D:\\python\\%PYTHONPATH%', 'd:\\python', 'C:\\Windows\\system32\\python27.zip', 'C:\\Python27\\DLLs', 'C:\\Python27\\lib', 'C:\\Python27\\lib\\plat-win', 'C:\\Python27\\lib\\lib-tk', 'C:\\Python27', 'C:\\Python27\\lib\\site-packages']
>>>
# Add a directory at run time
sys.path.append("/tmp/xxx")
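The effect of extending sys.path can be tried with a throwaway module. In this sketch (Python 3 syntax) the module name spam_demo, its contents and the helper import_from_dir are all made up; the module is written to a temporary directory first.

```python
import importlib
import os
import sys
import tempfile

def import_from_dir(directory, module_name):
    # Make the directory searchable, then import the module by name
    if directory not in sys.path:
        sys.path.append(directory)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)

# Create a small module file in a temporary directory
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "spam_demo.py"), "w") as f:
    f.write("a = 37\n\ndef foo():\n    return 'foo sees a=%d' % a\n")

spam_demo = import_from_dir(tmpdir, "spam_demo")
```

After the import, the module is also registered in sys.modules, which is why a second import of the same name does not execute the file again.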
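The file types Python recognizes are .py .pyw .pyc .pyo, as well as dynamic libraries.
16. Extension modules are loaded from dynamic libraries: .pyd on Windows, .so on *nix.
17. The first time a .py file is imported it is compiled to bytecode, a .pyc file (or .pyo when optimization is enabled).
18. Module lookup is case-sensitive.
19. About reloading: reloading a module that has already been imported (for example to pick up a newer version after editing the .py file) is unreliable, so avoid depending on it.
20. Packages: every package directory must contain an __init__.py, and so must every subdirectory of the package.
Likewise, from xxx import * checks the __all__ variable in __init__.py; only the names listed there are imported.
21. Distributing Python programs.
First put everything in one directory, add a README and so on, then add a setup.py like the following: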
from distutils.core import setup
setup(name = "spam",
version = "1.0",
py_modules = ['libspam'],
packages = ['spampkg'],
scripts = ['runspam.py'])
python setup.py sdist #自动打包生成一个zip。
python setup.py install #安装到本地通常是用户下载会zip包后解压缩后执行
python setup.py bdist #生成一个二进制版本都编译成pyc了
其他可转化成可执行程序的工具py2exe(windows), py2app(MAC OS)可怜的linux下还没有
22. setuptools can be used instead; setup.py then becomes:
try:
    from setuptools import setup
except ImportError:
    from distutils.core import setup
setup(name = "spam",
      version = "1.0",
      py_modules = ['libspam'],
      packages = ['spampkg'],
      scripts = ['runspam.py'])
23. Python packages can be downloaded from PyPI: http://pypi.python.org.
python setup.py install #install
python setup.py install --user #install into the user's own directory (under the home directory)
python setup.py install --prefix #install under another directory; sys.path usually has to be adjusted
Many add-ons are written in C/C++ and need a matching compiler.
24. easy_install downloads a package from the network and installs it directly.
25. How to install setuptools (which includes easy_install):
On Linux:
wget -q http://peak.telecommunity.com/dist/ez_setup.py
sudo python ./ez_setup.py
On Windows, download and install http://pypi.python.org/packages/2.7/s/setuptools/setuptools-0.6c11.win32-py2.7.exe#md5=57e1e64f6b7c7f1d2eddfc9746bbaf20

View File

@@ -0,0 +1,241 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T14:49:26+08:00
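====== Python Essential Reference 4th, Chapter 9: Reading Notes ======
Created Tuesday 05 July 2011
This chapter covers the various I/O operations: file objects and their methods, I/O functions related to Unicode strings, and object serialization and persistence.
1. Command-line arguments are read from sys.argv; sys.argv[0] is the name of the current program.
2. To exit the program, besides exit and return you can also raise SystemExit(1).
3. Command-line options can be parsed with the optparse module.
Note that since 2.7 optparse is deprecated in favor of argparse (open-source projects do change a lot...).
optparse is used like this: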
# Related code, [四号程序员] http://www.coder4.com
import optparse
p = optparse.OptionParser()
#Add option of -o/--output
p.add_option("-o",action="store",dest="outfile")
p.add_option("--output",action="store",dest="outfile")
#Add option of boolean
p.add_option("-d",action="store_true",dest="debug")
p.add_option("--debug",action="store_true",dest="debug")
#Set default values
#p.set_defaults(debug=False)
opts,args = p.parse_args()
print opts.outfile,opts.debug
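Since item 3 mentions that argparse replaces optparse, here is a hedged sketch of the same two options with argparse, in Python 3 syntax. The argument list passed to parse_args is invented so the example runs the same way every time; a real script would call parse_args() without arguments to read sys.argv.

```python
import argparse

parser = argparse.ArgumentParser()
# Short and long spellings of one option go into a single add_argument call.
parser.add_argument("-o", "--output", action="store", dest="outfile")
parser.add_argument("-d", "--debug", action="store_true", dest="debug")

# Parse an explicit list instead of sys.argv so the sketch is repeatable.
opts = parser.parse_args(["-o", "result.txt", "--debug"])
defaults = parser.parse_args([])
print(opts.outfile, opts.debug)
```

With action="store_true" the default is already False, so no separate call to set defaults is needed for the debug flag.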
4. Environment variables: os.environ
>>> import os
>>> print os.environ
{'TMP': 'C:\\Users\\liheyuan\\AppData\\Local\\Temp', 'COMPUTERNAME': 'LIHEYUAN-PC', 'USERDOMAIN': 'liheyuan-PC', 'PSMODULEPATH': 'C:\\Windows\\system32\\WindowsPowerShell\\v1.0\\Modules\\', 'COMMONPROGRAMFILES': 'C:\\Program Files\\Common Files', 'PROCESSOR_IDENTIFIER': 'x86 Family 6 Model 23 Stepping 10, GenuineIntel', 'PROGRAMFILES': 'C:\\Program Files', 'PROCESSOR_REVISION': '170a', 'SYSTEMROOT': 'C:\\Windows', 'HOME': 'C:\\Users\\liheyuan', 'COMSPEC': 'C:\\Windows\\system32\\cmd.exe', 'TK_LIBRARY': 'C:\\Python27\\tcl\\tk8.5', 'TEMP': 'C:\\Users\\liheyuan\\AppData\\Local\\Temp', 'PROCESSOR_ARCHITECTURE': 'x86', 'TIX_LIBRARY': 'C:\\Python27\\tcl\\tix8.4.3', 'ALLUSERSPROFILE': 'C:\\ProgramData', 'SESSIONNAME': 'Console', 'HOMEPATH': '\\Users\\liheyuan', 'USERNAME': 'liheyuan', 'LOGONSERVER': '\\\\LIHEYUAN-PC', 'LOCALAPPDATA': 'C:\\Users\\liheyuan\\AppData\\Local', 'PROGRAMDATA': 'C:\\ProgramData', 'PYTHONPATH': '%PYTHONPATH%;d:\\python;d:\\python;d:\\python', 'TCL_LIBRARY': 'C:\\Python27\\tcl\\tcl8.5', 'PATH': 'C:\\Windows\\system32;C:\\Windows;C:\\Windows\\System32\\Wbem;C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\;C:\\Program Files\\Common Files\\Thunder Network\\KanKan\\Codecs', 'PATHEXT': '.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC', 'FP_NO_HOST_CHECK': 'NO', 'WINDIR': 'C:\\Windows', 'APPDATA': 'C:\\Users\\liheyuan\\AppData\\Roaming', 'HOMEDRIVE': 'C:', 'SYSTEMDRIVE': 'C:', 'NUMBER_OF_PROCESSORS': '2', 'PROCESSOR_LEVEL': '6', 'OS': 'Windows_NT', 'PUBLIC': 'C:\\Users\\Public', 'USERPROFILE': 'C:\\Users\\liheyuan'}
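os.environ behaves like a dictionary, so single variables are usually looked up rather than printed wholesale. A small sketch in Python 3 syntax; DEMO_SETTING and DEMO_SETTING_THAT_IS_NOT_SET are made-up variable names used only here.

```python
import os

# Setting a key changes the environment of this process (and its children).
os.environ["DEMO_SETTING"] = "on"

# get() returns a fallback instead of raising KeyError for a missing variable.
setting = os.environ.get("DEMO_SETTING", "off")
missing = os.environ.get("DEMO_SETTING_THAT_IS_NOT_SET", "fallback")
print(setting, missing)
```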
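5. File objects.
The built-in open function: open(name,mode,bufsize); the last two arguments are optional.
mode can be r, w or a (append). Windows and Linux differ on line endings (\r\n on Windows, \n on Linux), so in text mode on Windows the endings are converted to \n. Adding U (for example rU) enables universal newline support, which recognizes \n, \r\n and \r alike. Binary mode (b) does no newline translation. To open a file for updating (both reading and writing), append + to r, w or a.
bufsize controls buffering: 0 disables it, 1 means line buffering, and a value greater than 1 is the approximate buffer size in bytes.
File objects have many methods; the ones that deserve special attention are:
f.readline([n]): reads one line, stopping after at most n bytes; without n, the whole line is read.
f.readlines([size]): reads all lines and returns them as a list; optionally stops after roughly size bytes.
f.close(): closes the file and releases its resources.
f.tell(): returns the current offset.
f.seek(): moves to an arbitrary position in the file (random access).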
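The methods listed above can be tried on a scratch file. This is a sketch in Python 3 syntax; the file name demo.txt and its contents are invented, and binary mode is used so that offsets are plain byte counts.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "wb") as f:
    f.write(b"first line\nsecond line\n")

with open(path, "rb") as f:
    first = f.readline()   # reads up to and including the newline
    offset = f.tell()      # byte offset just after the first line
    f.seek(0)              # jump back to the start of the file
    again = f.readline()
    f.seek(offset)         # jump to the start of the second line
    rest = f.readlines()   # remaining lines as a list
print(first, offset, rest)
```

The with statement closes the file automatically, which covers the f.close() step from the list.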
6. At end of file, the read methods return an empty string rather than raising an exception!
So the way to read line by line is:
while True:
    line = f.readline()
    if not line:
        break
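A short sketch of the end-of-file behaviour in item 6, in Python 3 syntax. io.StringIO stands in for a real file so nothing is written to disk; the sample text is invented.

```python
import io

f = io.StringIO("alpha\nbeta\n")

# Explicit loop: readline() returns "" once the end is reached.
lines = []
while True:
    line = f.readline()
    if not line:
        break
    lines.append(line.rstrip("\n"))
at_eof = f.readline()  # still an empty string, no exception

# Iterating over the file object is the shorter, equivalent form.
f.seek(0)
same = [line.rstrip("\n") for line in f]
print(lines, repr(at_eof))
```

A blank line inside the file is read as "\n", which is not empty, so the loop does not stop early on it.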
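7. In Python 2, read() returns an 8-bit string; in Python 3, text mode returns Unicode strings (decoded with the file's encoding) and binary mode returns bytes.
8. Standard input and output: sys.stdin, sys.stdout, sys.stderr, corresponding to their C counterparts. They can be used directly:
import sys
sys.stdout.write("What's your name:")
name = sys.stdin.readline()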
9. Of course, reading from stdin and writing to stdout does not have to be that tedious; raw_input and print do the job:
name = raw_input("What's your name:")
10. The print statement: we usually use it to write to stdout, but it also works with file objects.
f = open("output","w")
print >>f,"hello world"
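11. When items are separated by commas, print joins them with a space by default; the separator can be changed:
(This only works with the print function, so Python 2 needs from __future__ import print_function at the top of the file, which is probably why I never got it to run.)
from __future__ import print_function
x, y, z = 1, 2, 3
print("The values are",x,y,z,sep=",")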
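Items 10 and 11 combine naturally in the print function: sep changes the separator and file redirects the output. A sketch in Python 3 syntax; the values of x, y and z are invented, and io.StringIO is used in place of a real output file.

```python
import io

x, y, z = 1, 2, 3
buffer = io.StringIO()

# sep replaces the default single space; file replaces sys.stdout.
print("The values are", x, y, z, sep=",", file=buffer)
print(buffer.getvalue(), end="")
```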
12. Variables in a printed string can be filled in by "template substitution". Web frameworks generally define their own template syntax and files, but the basic mechanism is still usable:
text = "Dear %(name)s, Give me $%(amount)0.2f"
print text % ( {"name":'Mrs. Liu', "amount":100.3} )
The format method works too:
text = "Dear {name}, Give me {amount:0.2f}"
print text.format( name='Mrs. Liu', amount=100.3)
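The two substitution styles in item 12 produce the same text when the templates match. A sketch in Python 3 syntax that feeds both from one dictionary (the dollar sign is added to the format template here only to make the two results comparable):

```python
values = {"name": "Mrs. Liu", "amount": 100.3}

# %-style: names in parentheses, followed by a conversion such as s or 0.2f.
with_percent = "Dear %(name)s, Give me $%(amount)0.2f" % values

# str.format: names in braces, format spec after the colon.
with_format = "Dear {name}, Give me ${amount:0.2f}".format(**values)
print(with_percent)
```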
13. Generators and I/O
Combining a generator with I/O decouples producing the content from writing it. A side benefit is lower memory use, because the output is not assembled into one big string before being written.
import sys
#Produce the content
def content(n):
    while n > 0:
        yield "T-minus %d\n" % n
        n -= 1
    yield "HaHa"
#Write piece by piece, sparing memory
ct = content(5)
f = sys.stdout
f.writelines(ct)
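The trade-off between streaming a generator and joining everything first can be seen side by side. A sketch in Python 3 syntax; countdown_lines is a hypothetical helper modelled on content() above, and io.StringIO replaces a real output file.

```python
import io

def countdown_lines(n):
    """Yield one output line at a time instead of building a list."""
    while n > 0:
        yield "T-minus %d\n" % n
        n -= 1
    yield "Liftoff\n"

# Streaming: each piece is written as soon as it is produced.
streamed = io.StringIO()
streamed.writelines(countdown_lines(3))

# Joining: the whole text exists in memory before the single write.
joined = io.StringIO()
joined.write("".join(countdown_lines(3)))
print(streamed.getvalue(), end="")
```

Both sinks end up with identical content; the difference is only in how much text is held in memory at once and how many write calls are made.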
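14. Obviously a generator does not make the best use of I/O buffering, so sometimes we build one large string and write it in a single call instead. join does the concatenation:
(The memory cost is unavoidable, of course.)
......
"".join(lines)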
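15. Handling Unicode strings.
Never mix Unicode strings with non-Unicode strings!
I/O runs into many Unicode problems. There are several ways to deal with them:
(1) encode and decode
s.decode(encoding,errors): converts a byte string in the given encoding into a Unicode string
s.encode(encoding,errors): converts a Unicode string into an 8-bit byte string in the given encoding
encoding can be ascii, latin-1 (iso-8859-1), cp1252, utf-8, utf-16, utf-16-le, utf-16-be, unicode-escape,
raw-unicode-escape and so on.
errors sets the error handling during conversion; the default is strict, and ignore or replace (among others) can be chosen.
(2) Unicode I/O
Use codecs.open(filename,mode,encoding,errors); the last three arguments are optional.
mode is like the mode of open.
encoding is the encoding used for strings when reading and writing.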
#This example writes a utf-8 encoded string to a file
>>> import codecs
>>> s = u"计算所"
>>> f = codecs.open("test.txt",'w','utf-8')
>>> f.write(s)
>>> f.close()
#Without an encoding the byte string is written unchanged (here in the source encoding, e.g. gbk)
>>> import codecs
>>> s = "计算所"
>>> f = codecs.open("test.txt",'w')
>>> f.write(s)
>>> f.close()
16. If a file is already open and you want to use codecs on it, wrap it:
fenc = codecs.EncodedFile(f,"utf-8")
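The encode-on-write, decode-on-read cycle from item 15 can be verified with a round trip. A sketch in Python 3 syntax using io.open, which takes an encoding argument much like codecs.open; the temporary path is an assumption of the example.

```python
import io
import os
import tempfile

text = u"计算所"
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Text mode with an explicit encoding: str in, UTF-8 bytes on disk.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Binary mode shows the raw bytes that were actually stored.
with io.open(path, "rb") as f:
    raw = f.read()

# Reading with the same encoding restores the original string.
with io.open(path, "r", encoding="utf-8") as f:
    decoded = f.read()
print(len(text), len(raw))
```

Three characters become nine bytes here because each of them needs three bytes in UTF-8.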
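17. Object serialization and persistence
Persisting objects in Python is very easy; pickle does it.
One impressive thing is that references between objects (apparently even circular ones) are handled. In the example below, pickling o2 automatically pickles the o1 it refers to as well.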
#!/usr/bin/python
import pickle
class MyObj(object):
    def __init__(self,v,r):
        self.value = v
        self.ref = r
    def print123(self):
        print self.value
        if self.ref != None:
            print self.ref.value
if __name__ == "__main__":
    o1 = MyObj(1,None)
    o2 = MyObj(3,o1)
    print "Before store using pickle"
    o2.print123()
    f = open("obj.sav","wb")
    pickle.dump(o2,f)
    f.close()
    print "After load using pickle"
    f = open("obj.sav","rb")
    o = pickle.load(f)
    f.close()
    o.print123()
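The example above has a one-way reference. A sketch of a genuinely circular pair, in Python 3 syntax, suggests that pickle copes with that as well; the Node class is hypothetical and pickle.dumps/loads are used so no file is needed.

```python
import pickle

class Node(object):
    def __init__(self, value):
        self.value = value
        self.ref = None

# Two objects that point at each other.
a = Node(1)
b = Node(3)
a.ref = b
b.ref = a

# Round trip through a byte string.
restored = pickle.loads(pickle.dumps(b))
print(restored.value, restored.ref.value)
```

After loading, the cycle is intact: following ref twice from the restored object leads back to that same object rather than to a copy.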
18. Objects can also be persisted directly with shelve, which does not require opening a file yourself.
import shelve
obj = SomeObject()
db = shelve.open("file")
db['key'] = obj
...
obj = db['key']
db.close()
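A runnable variant of the shelve outline, in Python 3 syntax. A plain dictionary stands in for SomeObject, and the shelf is created in a temporary directory; both are assumptions of this sketch.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo_shelf")

# First session: store a value under a string key, then close.
db = shelve.open(path)
db["key"] = {"name": "spam", "sizes": [1, 2, 3]}
db.close()

# Second session: reopen the same shelf and read the value back.
db = shelve.open(path)
loaded = db["key"]
keys = list(db.keys())
db.close()
print(loaded)
```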
19. Under the hood shelve uses the pickle module; it simply wraps writing the pickled objects to a file in a dictionary-like interface. The pickle format differs slightly between versions; the last argument of dump(obj,file,protocol) selects the protocol to deal with that.
20. To customize what gets persisted, define __getstate__() and __setstate__() on the object; pickle calls them during dump and load. An object that wraps a low-level network socket is one example.
import socket
class Client(object):
    def __init__(self,addr):
        self.server_addr = addr
        self.sock = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
        self.sock.connect(addr)
    def __getstate__(self):
        return self.server_addr
    def __setstate__(self,value):
        self.server_addr = value
        self.sock = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
        self.sock.connect(self.server_addr)
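The socket example needs a reachable server, so here is a sketch of the same __getstate__/__setstate__ pattern with a different unpicklable resource: a threading.Lock. It is written in Python 3 syntax, and the Counter class is hypothetical.

```python
import pickle
import threading

class Counter(object):
    def __init__(self, start):
        self.count = start
        self.lock = threading.Lock()  # lock objects cannot be pickled

    def __getstate__(self):
        # Persist everything except the lock.
        state = self.__dict__.copy()
        del state["lock"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes and create a fresh lock.
        self.__dict__.update(state)
        self.lock = threading.Lock()

original = Counter(5)
clone = pickle.loads(pickle.dumps(original))
print(clone.count)
```

Without the two methods, pickle.dumps(original) would fail on the lock; with them, only the plain data travels and the resource is rebuilt on load.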

View File

@@ -0,0 +1,580 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-05T14:38:28+08:00
====== Python for Bash scripters ======
Created Thursday 05 January 2012
http://magazine.redhat.com/2008/02/07/python-for-bash-scripters-a-well-kept-secret/
Python for Bash scripters: A well-kept secret
by Noah Gift
Hey you, ya you! Do you write Bash scripts?
Come here, I have a secret to tell you.
Python is easy to learn, and more powerful than Bash. I wasn't supposed to tell you this; it's supposed to be a secret. Anything more than a few lines of Bash could be done better in Python. Python is often just as portable as Bash too. Off the top of my head, **I can't think of any *NIX operating systems that don't include Python**. Even IRIX has Python installed.
If you can write a function in Bash, or even __piece together__ a few commands into a script and make it executable, then you can learn Python. What usually throws Bash scripters off is they see something object-oriented like this:
class FancyObjectOriented(object):
    def __init__(self, stuff = "RegularStuff"):
        self.stuff = stuff
    def printStuff(self):
        print "This method prints the %s object" % self.stuff
Object-oriented programming can be a real challenge to get the hang of, but fortunately in Python it is __100% optional__. You don't need to have a Computer Science degree to program in Python; you can get started immediately if you know a few shortcuts. My goal here is to show Average Joe Bash scripter how to write in Python some of the things they would normally write in Bash. Even though it seems unbelievable, you can be a beginning Python programmer by the end of this article.
===== Baby steps =====
The very first thing to understand about Python, is that __whitespace is significant__. This can be a bit of a stumbling block for newcomers, but it will be old hat very quickly. Also, the shebang line is different than it should be in Bash:
Python Shebang Line:
#!/usr/bin/env python
Bash Shebang Line:
#!/usr/bin/env bash
Knowing these two things, we can easily create the usual Hello World program in Python, although whitespace won't come into play just yet. Open up your favorite text editor and call the Python script **hello.py** and the Bash script hello.sh.
Python Hello World script:
#!/usr/bin/env python
print "Hello World"
Bash Hello World script:
#!/usr/bin/env bash
echo Hello World
Make sure that you make each file **executable** by using chmod +x hello.py and chmod +x hello.sh. Now if you run either script (./hello.py or ./hello.sh) you will get the obligatory “Hello World.”
===== Toddler: System calls in Python =====
Now that we got Hello World out of the way, let's move on to more useful code. __Typically most small Bash scripts are just a bunch of commands either chained together, or run in sequence__. Because Python is also a procedural language, we can easily do the same thing. Let's take a look at a simple example.
In order to take our toddler steps it is important to remember two things:
1. Whitespace is significant. Keep this in mind; I promise we will get to it. It is so important that I want to keep reminding you!
2. A module called __subprocess__ needs to be imported to make system calls.
It is very easy to import modules in Python. You just need to put this statement at the top of the script to import the module:
import subprocess
Let's take a look at something really easy with the subprocess module. Let's execute an ls -l of the current directory.
//Python ls -l command://
#!/usr/bin/env python
import subprocess
subprocess.call("ls -l", shell=True)
If you run this script it will do the exact same thing as running ls -l in Bash. Obviously writing 2 lines of Python to do one line of Bash isn't that efficient. But let's run a few commands in sequence, just like we would do in Bash, so you can get comfortable with how a few commands run in sequence might look. In order to do that I will need to introduce two new concepts: one for Python __variables__ and the other for __lists__ (known as arrays in Bash). Let's write a very simple script that gets the status of a few important items on your system. Since we can freely mix large blocks of Bash code, we don't have to completely convert to Python just yet. **We can do it in stages**. We can do this by assigning Bash commands to a variable.
Note:
If you are cutting and pasting this text, you MUST preserve the whitespace. If you are using vim you can do that by using paste mode :set paste
//PYTHON//
//Python runs a sequence of system commands.//
#!/usr/bin/env python
import subprocess
#Note that Python is much more flexible with equal signs. There can be spaces around equal signs.
MESSAGES = "tail /var/log/messages"
SPACE = "df -h"
#Places variables into a list/array
cmds = [MESSAGES, SPACE]
#Iterates over list, running statements for each item in the list
#Note, that whitespace is absolutely critical and that a consistent indent must be maintained for the code to work properly
count=0
for cmd in cmds:
    count+=1
    print "Running Command Number %s" % count
    subprocess.call(cmd, shell=True)
//BASH//
//Bash runs a sequence of system commands.//
#!/usr/bin/env bash
#Create Commands
SPACE=`df -h`
MESSAGES=`tail /var/log/messages`
#Assign to an array(list in Python)
cmds=("$MESSAGES" "$SPACE")
#iteration loop
count=0
for cmd in "${cmds[@]}"; do
    count=$((count + 1))
    printf "Running Command Number %s \n" $count
    echo "$cmd"
done
Python is much more forgiving about the way you quote and use variables, and lets you create a much less cluttered piece of code.
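The Python loop in this section hands each command to the shell as one string. As a hedged alternative sketch in Python 3 syntax, the commands can be given as argument lists, which avoids shell quoting altogether. The child commands below run the Python interpreter itself (sys.executable) only so that the example works on any system; they are not from the article.

```python
import subprocess
import sys

# Each command is a list: program first, then its arguments.
commands = [
    [sys.executable, "-c", "print('first command')"],
    [sys.executable, "-c", "print('second command')"],
]

return_codes = []
for count, cmd in enumerate(commands, 1):
    print("Running Command Number %s" % count)
    # call() waits for the command and returns its exit status.
    return_codes.append(subprocess.call(cmd))
print(return_codes)
```

enumerate(commands, 1) replaces the manual counter, and keeping the exit codes makes it possible to react to a failed command later.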
===== Childhood: Reusing code by writing functions =====
We have seen how Python can implement system calls to run commands in sequence, just like a regular Bash script. Let's go a little further and organize blocks of code into functions. As I mentioned earlier, Python does not require the use of classes and object-oriented programming techniques, so most of the full power of the language is still at our fingertips, even if we're only using plain functions.
Let's write a simple function in Python and Bash and call them both in a script.
Note:
These two scripts will deliver identical output in Bash and Python, although Python handles default keyword parameters automatically in functions. With Bash, setting default parameters is much more work.
//PYTHON://
#!/usr/bin/env python
import subprocess
#Create variables out of shell commands
MESSAGES = "tail /var/log/messages"
SPACE = "df -h"
#Places variables into a list/array
cmds = [MESSAGES, SPACE]
#Create a function, that takes a list parameter
#Function uses default keyword parameter of cmds
def runCommands(commands=cmds):
    #Iterates over the list passed in, running statements for each item in the list
    count=0
    for cmd in commands:
        count+=1
        print "Running Command Number %s" % count
        subprocess.call(cmd, shell=True)
#Function is called
runCommands()
//BASH://
#!/usr/bin/env bash
#Create variables out of shell commands
SPACE=`df -h`
MESSAGES=`tail /var/log/messages`
LS=`ls -l`
#Assign to an array(list in Python)
cmds=("$MESSAGES" "$SPACE")
function runCommands ()
{
    count=0
    for cmd in "${cmds[@]}"; do
        count=$((count + 1))
        printf "Running Command Number %s \n" $count
        echo "$cmd"
    done
}
#Run function
runCommands
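The point of the default keyword parameter is that the function works on whatever list it is given and only falls back to the default when called without arguments. A sketch in Python 3 syntax; run_commands and DEFAULT_COMMANDS are hypothetical names, and the child commands use sys.executable purely for portability.

```python
import subprocess
import sys

DEFAULT_COMMANDS = [[sys.executable, "-c", "print('default')"]]

def run_commands(commands=DEFAULT_COMMANDS):
    """Run each command in the given list and return the exit codes."""
    codes = []
    for count, cmd in enumerate(commands, 1):
        print("Running Command Number %s" % count)
        codes.append(subprocess.call(cmd))
    return codes

# No argument: the default list is used.
default_codes = run_commands()
# Explicit argument: only the list passed in is run.
custom_codes = run_commands([[sys.executable, "-c", "raise SystemExit(3)"]])
print(default_codes, custom_codes)
```

The loop must iterate over the parameter (commands), not over the module-level list, otherwise a caller-supplied list would be silently ignored.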
===== Teenager: Making reusable command-line tools =====
Now that you have the ability to translate simple Bash scripts and functions into Python, let's get away from the nonsensical scripts and actually write something useful. Python has __a massive standard library__ that can be used by simply importing modules. For this example we are going to create a robust command-line tool with the standard library of Python, by importing the **subprocess** and **optparse** modules.
You can later use this example __as a template__ to build your own tools that combine snippets of Bash inside of the more powerful Python. This is a great way to use your current knowledge to slowly migrate to Python.
Embedding Bash to make Python command-line tools[1]:
#!/usr/bin/env python
import subprocess
import optparse
import re
#Create variables out of shell commands
#Note triple quotes can embed Bash
#You could add another bash command here
#HOLDING_SPOT="""fake_command"""
#Determines Home Directory Usage in Gigs
HOMEDIR_USAGE = """
du -sh $HOME | cut -f1
"""
#Determines IP Address
IPADDR = """
/sbin/ifconfig -a | awk '/(cast)/ { print $2 }' | cut -d':' -f2 | head -1
"""
#This function takes Bash commands and returns their output
def runBash(cmd):
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
    out = p.stdout.read().strip()
    return out #This is the stdout from the shell command
VERBOSE=False
def report(output,cmdtype="UNIX COMMAND:"):
    #VERBOSE is only read here, so no global statement is needed
    if VERBOSE:
        print "%s: %s" % (cmdtype, output)
    else:
        print output
#Function to control option parsing in Python
def controller():
    #global is required because this function assigns to VERBOSE
    global VERBOSE
    #Create instance of OptionParser Module, included in Standard Library
    p = optparse.OptionParser(description='A unix toolbox',
                              prog='py4sa',
                              version='py4sa 0.1',
                              usage= '%prog [option]')
    p.add_option('--ip','-i', action="store_true", help='gets current IP Address')
    p.add_option('--usage', '-u', action="store_true", help='gets disk usage of homedir')
    p.add_option('--verbose', '-v',
                 action = 'store_true',
                 help='prints verbosely',
                 default=False)
    #Option Handling passes correct parameter to runBash
    options, arguments = p.parse_args()
    if options.verbose:
        VERBOSE=True
    if options.ip:
        value = runBash(IPADDR)
        report(value,"IPADDR")
    elif options.usage:
        value = runBash(HOMEDIR_USAGE)
        report(value, "HOMEDIR_USAGE")
    else:
        p.print_help()
#Runs all the functions
def main():
    controller()
#This idiom means the below code only runs when executed from command line
if __name__ == '__main__':
    main()
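runBash captures a command's output through Popen and a pipe. A compact sketch of the same idea in Python 3 syntax uses subprocess.check_output, which also raises an error when the command fails. run_and_capture is a hypothetical helper, and the child command is the Python interpreter itself so the example does not depend on ifconfig or du being installed.

```python
import subprocess
import sys

def run_and_capture(args):
    """Run a command given as an argument list and return its stdout as text."""
    output = subprocess.check_output(args)  # raises CalledProcessError on failure
    return output.decode("utf-8").strip()

value = run_and_capture([sys.executable, "-c", "print('hello from child')"])
print(value)
```

In Python 3 the captured output is bytes, hence the decode step; the Python 2 code in the article gets a plain string back directly.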
===== Python's secret sysadmin weapon: IPython =====
The skeptics in the Bash crowd are just about to say, “Python is pretty cool, but it isn't interactive like Bash.” Actually, this is not true. One of the best kept secrets of the Python world is **IPython**. I asked the creator of IPython, Fernando Perez, how IPython stacks up to classic Unix interactive shells. Rather than trying to replicate what he said, I'll simply quote directly:
IPython is a replacement for the Python interactive environment that tries to incorporate the most common shell-like usage patterns in a natural way, while keeping 100% syntactic compatibility with the Python language itself. In IPython, commands like cd or ls do what you'd expect of them, while still allowing you to type normal Python code. And since IPython is highly customizable, it ships with a special mode that activates even more defaults for shell-like behavior. IPython custom modes are called profiles, and the shell profile can be requested via:
ipython -p sh
This will enable all the shell-like features by default. The links below show some basic information about the shell-like usage of IPython, though we still lack a comprehensive guide for all of the features that actually exist under the hood.
http://ipython.scipy.org/moin/Cookbook/IpythonShell
http://ipython.scipy.org/moin/Cookbook/JobControl
IPython also contains a set of extensions for interactively connecting and manipulating tabular data, called ipipe, that enables a lot of sophisticated exploration of filesystem objects and environment variables. More information about ipipe can be found here:
http://ipython.scipy.org/moin/UsingIPipe
It is quite possible to use IPython as the only interactive shell for simple systems administration tasks. I recently wrote an article for IBM Developerworks, in which I demonstrated using IPython to perform interactive SNMP queries using Net-SNMP with Python bindings:
Summary
Even if you can barely string together a few statements in Bash, with a little work you can learn Python and be productive very quickly. Your existing Bash skills can be slowly converted to Python skills. And before you know it, you will be a full-fledged Python programmer.
I find Python easier to program in than Bash; you don't have to deal with hordes of escaping scenarios, for one. Bash has its place, usually when you don't have the ability to run Python, as Python beats the pants off Bash as a scripting language.
I have included a link to all of the examples, and will have a souped-up version of the Python command-line tool with a few extra tricks sometime soon.
Let me close with saying that if you are interested in replacing Bash with Python, try to start out on the best possible foot and write tests that validate what you think you wrote actually works. This is a huge leap in thinking, but it can propel your code and productivity to the next level. The easiest way to get started with testing in Python is to use doctests, and I have enclosed a link at the bottom of this article. Good luck!
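As a hedged illustration of the doctest advice, here is a sketch in Python 3 syntax. parse_usage is a hypothetical helper (it picks the size field out of a line shaped like du -sh output), and the sample line is invented. The test is run through DocTestFinder and DocTestRunner so the result can be inspected; in a script, doctest.testmod() is the usual shortcut.

```python
import doctest

def parse_usage(line):
    """Return the size field from a tab-separated line of du -sh output.

    >>> parse_usage("1.5G\\t/home/user")
    '1.5G'
    """
    return line.split("\t")[0]

# Collect the examples from the docstring and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(parse_usage, "parse_usage", module=False,
                        globs={"parse_usage": parse_usage}):
    runner.run(test)
results = runner.summarize(verbose=False)
print(results)
```

The docstring doubles as documentation and as a test: the interpreter session shown in it is replayed and compared with the expected output.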
References
Subversion Repository For Examples
Checklist Based Testing For SysAdmins
Doctests
Online Bash Scripting Guide
Python Tutorial
IPython
Jeff Rush Show Me Do Tutorial
PEP8
Net-SNMP and IPython
[1] This code example has been corrected. Feb 08, 2008, 11AM EST
About the author
Noah Gift is currently co-authoring a book for O'Reilly, “Python For *Nix Systems Administration,” (working title) due sometime in 2008. He works as a software engineer for Racemi, dealing with Bash, Python, SNMP and a slew of *nix operating systems, including AIX, HP-UX, Solaris, Irix, Red Hat, Ubuntu, Free BSD, OS X, and anything else that has a shell. He is giving a talk at PyCon 2008, the annual Python Programming convention being held in Chicago, on writing *nix command line tools in Python. When not sitting in front of a terminal, you might find him on a 20 mile run on a Sunday afternoon.
This entry was posted by Noah Gift on Thursday, February 7th, 2008 at 6:17 pm and is filed under documentation, Fedora, Red Hat Enterprise Linux, technical.
35 responses to “Python for Bash scripters: A well-kept secret”
Colin Walters says:
February 7th, 2008 at 6:49 pm
Also related to this topic is the Hotwire hypershell:
http://hotwire-shell.org/
You could think of it as a multi-threaded, multi-tab graphical IPipe shell. We're actually discussing sharing some code.
anzan says:
February 7th, 2008 at 8:00 pm
Thank you. I have some of this material in a more complex form. You have presented it very clearly.
Jayce says:
February 7th, 2008 at 10:00 pm
I never thought of using Python this way. I have used and seen many Python scripts to replace what could be done in Bash, but these always never looked anything like shell scripts—totally lost the special shell flavor. You have shown how to use Python, while keeping the shell flavor. Thanks for the new perspective.
A small point: “global” modifier is not required to read a global variable. It is only required if you need to modify a global variable (a safety feature to prevent accidental modification of global variables).
Seth Vidal says:
February 8th, 2008 at 12:00 am
Great article. Best to draw people in this way and then we have more python programmers!
Thanks for the article.
Paul W. Frields says:
February 8th, 2008 at 12:57 am
A bash script I wrote many, MANY moons ago, which I rewrote in Python after spending only about two hours on reading other Python scripts and some documentation (http://diveintopython.org/), cut execution time from ~2.5 minutes to about ~5 seconds. I'd say that in itself was worth the trouble. I'm no Pythonista, just a wannabe programmer that wanted to Get Something Done. Python presents the lowest barrier to entry to people who want to learn programming at all, and for object-oriented programming it's also a great way to get started.
Noah Gift says:
February 8th, 2008 at 4:52 am
Colin/Looks interesting I will need to check it out.
Anzan/Glad it was helpful.
Jayce/Good catch on the totally unnecessary use of global. It was a “solution in search of a problem”, I forgot I left in the code example…woops :)
Writing articles in which people can comment is always a great way to make an article even better. Here is a link to a more explicit example of using global, which is actually a rarely used idiom:
http://www.oreillynet.com/onlamp/blog/2007/12/tpt_tiny_python_tip_global_1.html
I also fixed this in the svn repository, and might be able to get it fixed in the article example.
Seth/Glad you liked the article
Paul/I agree with you. Python is good for almost anything from the web, to scripting, to application development.
Malcolm Parsons says:
February 8th, 2008 at 5:06 am
Do we need the Hello World examples twice?
Noah Gift says:
February 8th, 2008 at 5:08 am
Malcolm/Good catch, that was a version control problem :) No, it was a mistake.
Kris says:
February 8th, 2008 at 5:56 am
I wrote a long post and I'm not going to replace that.. It basically said that it's more of a rule than an exception that a python app fudges up ime (who's with me??!), and also that bash is the most unintuitive “language” I've come across.
So please only code python for personal stuff, don't publish it.
Nate says:
February 8th, 2008 at 8:06 am
I think you're missing the strengths of both languages.
Bash is exceptional at dealing with the output and exit status of other commands. It may not be the fastest, but I can work with command output and exit statuses effortlessly.
Python is great at trying things out interactively (even without IPython) before you put the code in a larger program. It also has available to it most of the system calls you would use in a C program. Calling out to “ls” and “df” is silly when you can call readdir() or statvfs() directly.
Michael DeHaan says:
February 8th, 2008 at 8:22 am
Always glad to see some more Python advocacy. It really is the perfect tool for sysadmin scripting.
Some readers will quickly find out subprocess isn't available on EL4 since that still uses python 2.3. No problem!
For those folks, use os.system("command here").
Subprocess is of course a lot more flexible, allowing python to quickly replace usage of things like “expect”; for more details, see “pydoc subprocess”. It's very powerful. The subprocess module actually does work in python 2.3, but you'll have to copy it over.
sqweek says:
February 8th, 2008 at 10:01 am
What the hell? You call those shell scripts?
I call them needlessly complicated bastardisations. Allow me to reimplement your toddler script (in sh, because rc is probably a bit esoteric for this crowd).
#!/bin/sh
tail /var/log/messages
df -h
Woah! Now on to the teenager script: a “reusable command-line tool” that does… two wildly different things?
NO. You have completely missed the point of the shell. Here's how it's done:
$ cat ~/bin/ip
#!/bin/sh
/sbin/ifconfig -a | awk '/(cast)/ { print $2 }' | cut -d: -f2 | head -1
$ cat ~/bin/usage
#!/bin/sh
du -sh $HOME | cut -f1
Do _ONE_ thing, and do it well. The power of the shell lies in its ability to composite _simple_ tools to complete complex tasks.
Mind you, bourne shell clones certainly make a mess of quoting and add tons of useless features like arithmetic (that's bc's job[1]) and line editing (this is forgiven only because unix terminals SUCK). That's why rc exists.
Python _is_ more powerful than bash, but it does not beat the shell at manipulating environments and processes. Just as awk's grammar makes it well suited to text manipulation, the shell's grammar makes it well suited to these tasks.
If you're trying to use the shell as a general purpose language then yes, switch to python. But embrace the shell when in its domain and you may be surprised by its elegance.
-sqweek
[1] Of course, bc has its own quirks and pitfalls which shouldn't exist if it was done right *sigh*.
Charlie says:
February 8th, 2008 at 10:36 am
In the third example, I noticed you're still using “for cmd in cmds:” which works, but I thought it was supposed to show how the local variable commands was set as a parameter. “for cmd in commands:” would show this properly, right?
BTW, why can one access cmds from inside that function without using the global keyword? Is global implied since it is specified in the parameter?
Colin Walters says:
February 8th, 2008 at 11:13 am
> Python _is_ more powerful than bash, but it does not beat the shell at manipulating environments and processes.
In Hotwire, you can say for example:
proc | filter badprocess.*foo cmd | kill -9
That kills all processes (with SIGKILL) whose command name matches the regular expression “badprocess.*foo”.
Absolutely no text parsing of /bin/ps involved.
Dejan Lekic says:
February 8th, 2008 at 12:05 pm
Python is a good OO language, there is no doubt about it, but as language for shell scripts… IMHO not. It will never beat specialized languages like ZSH or BASH. Numerous reasons have already been posted above.
However I could not see one very important reason BASH will always win in the shell-battle: it is a _POSIX STANDARD_.
Do not forget that!
Python does well in the PERL-killer-wannabe battle of titans (PHP, Ruby, Python). I would always choose Lua for shell apps instead of these 3 languages, plus PERL. Sure it is just a matter of taste. Reasons? Lua is simply a fast, lean, and well-designed language. None of them have some cool advantages other languages do not have, or cannot have. It is simply a personal choice what to choose…
Kind regards
_dietrich says:
February 8th, 2008 at 1:04 pm
Nice work Noah!
Otheus says:
February 8th, 2008 at 1:58 pm
What kind of shell script is this?
SPACE=`df -h`
MESSAGES=`tail /var/log/messages`
LS=`ls -l`
In SH (and BASH), these are run IMMEDIATELY. In the Python code, the processes are deferred until the loop. In the BASH version shown above, it's implied these are executed in the loop, but that's not the case.
It's nice to know that python CAN do OS stuff, but I'm not sure what the point is. Execution speed? I can see that being of importance in certain situations, like Nagios handler scripts.
Stephen Smoogen says:
February 8th, 2008 at 2:23 pm
I wanted to say great intro to the language. My biggest problem with python on large scale environment is version dependencies. Usually some code will work with a particular version of python, and if your system doesn't have it.. well you can't get there from here without a lot of work. [Try running func or newer yum's on RHEL-3 or RHEL-2 :) ]. This is actually a problem with any of the interpreted languages… [Up to last year, I had systems that only have perl4 on them] and why I end up falling back to sh scripts for anything cross-platform.
Now if python were packaged up so I could install python-2.2,2.3,2.4,etc on the same system without much trouble… I would be so much happier :) .
Noah Gift says:
February 8th, 2008 at 5:00 pm
Everyone who commented/Thank you so much, there are many excellent points, that I agree with them mostly. If anyone has a better example of the Bash or Python scripts, send me an email noah dot gift at gmail dot com. I will add you to the google code project and you can check in more examples of your own. At worst it is something fun to do on a weekend:
http://code.google.com/p/python4bash/
Michael DeHaan says:
February 8th, 2008 at 5:56 pm
Noah's examples strike me as more of examples of baby steps to doing things, rather than examples that simply replace shell script one liners in their own right. So if you are new to Python syntax, read them over, but don't take them as boilerplate. For instance, he showed you how to use optparse and subprocess; now imagine their usage in a more complicated program :)
If you want to see another interesting systems management project using Python that may appeal to bash+ssh users BTW, check out Func: https://fedorahosted.org/func. There should be a Red Hat Magazine article on it coming out pretty soon now too.
(I see Smooge has already alluded to it… and yes, he's a bit right regarding versioning. Typically I target my stuff at a base of Python 2.3 and avoid newer functions in 2.4/2.5; such is the case with many toolsets. That's more of an issue of coding to the distro though, than the language itself… or in coding to the API of various libraries that are no longer updated for older platforms).
Paddy3118 says:
February 8th, 2008 at 11:44 pm
At work we use this excellent tool to manage multiple versions of all types of software: just install to a different area and create a module to update your environment to access the version of Python you require:
http://modules.sourceforge.net/
I could then do:
module load python/2.5.1
python -V; # shows its python 2.5.1
module rm python
module load python/2.4
python -V; # shows its python 2.4
- Paddy.
Ajay says:
February 9th, 2008 at 4:17 am
Dabbling with shell every now and then (I am not a shell ninja by any means), I have not come across an elegant error handling solution so far.
(If someone has one, it would be nice if you could point me to some documentation or a tutorial.)
This starts biting you when you have large shell scripts (even if broken down into functions) and every line ends with “|| die”, where die spits out some debug info and, as the name suggests, dies.
Python/Ruby can be helpful here, as they have better error handling mechanisms, such as try/except/finally in Python.
However, I will stick to shell for my one-liners.
- “pipe is man's best friend”
Don Seiler says:
February 13th, 2008 at 4:52 pm
@Charlie, I noticed the same thing with “cmds” vs “commands”. I'd like to see a correction in the article if anyone is paying attention.
Wannabe says:
February 16th, 2008 at 3:28 am
The “subprocess” module is only available in newer versions of Python (2.4 and later). The version that came with my system doesn't have it, but the version I compiled myself does.
Noah Gift says:
February 16th, 2008 at 9:23 am
If you find yourself not having subprocess, you can use popen:
http://docs.python.org/lib/module-popen2.html
PEdroArthur_JEdi says:
March 2nd, 2008 at 8:32 am
Just a little tip…
count=$((count + 1))
If you think this is messy, you can write it like this instead:
((count++))
OldPro says:
July 12th, 2008 at 5:10 am
I tried to use python, converting several bash tools but I am not convinced. Python is a full-fledged programming language and maybe not a bad one. But BASH scripts look much cleaner if you want to glue together bricks of binaries and shell scripts. As soon as your BASH script looks too complicated, it should be converted into a regular (non-shell) language with heavy type checking and strict compilation settings.
The fact that in Python I need to import some class “subprocesses” (dependency hell!!!) to do even the most menial of things, that I should embed code into Popen etc. is appalling to me: the full Unix power is transparently in your hands within BASH. As a rule in programming, the necessary and complicated data structures should always be hidden under the hood, i.e., within data files etc. Shell programming is about handling named chunks of information in a processing line, possibly using pipes to avoid the over-use of temporary files. Sadly, this simplicity is lacking in all of the script-use examples of Python I see on the web.
The importing of modules/classes was a hell in Tcl/Tk, it is in Java and it is in Python. My juniors are losing a lot of time in getting things to work on any other machine (“but the .py worked perfectly on mine!”).
Let's keep it simple wherever we can.
By the way: the line
echo "$cmd"
will NOT execute the command in the BASH sequence example.
Denis says:
November 26th, 2008 at 4:29 am
Yeah, as someone already said, Python lacks one very important thing: pipelines ("conveyors").
A. Yuryshev says:
April 24th, 2009 at 7:40 am
IPython will definitely beat the sh clones in time.
I think of it as the bash of the 21st century.
It is more modern in its concepts.
In fact, IPython does the same thing Microsoft did with PowerShell: an object-oriented shell. But better. And portable.
The battle is: OO shells vs. file shells.

View File

@@ -0,0 +1,128 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T15:00:06+08:00
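====== Several ways to download a web page in Python ======
Created Tuesday 05 July 2011
http://blog.csdn.net/sding/article/details/5538065
A summary of several ways to download a web page in Python.
1
import urllib2
fd = urllib2.urlopen(url_link)
data = fd.read()
This is the most concise approach; it is, of course, also a GET request.
2
Via GET, reading the response in chunks: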
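import socket
import urllib2

def GetHtmlSource(url):
    try:
        htmSource = ''
        req = urllib2.Request(url)
        # No data argument here: passing "" would turn the request into a POST.
        fd = urllib2.urlopen(req)
        while 1:
            data = fd.read(1024)
            if not len(data):
                break
            htmSource += data
        fd.close()
        # The page is assumed to be GBK (cp936) encoded.
        htmSource = htmSource.decode('cp936')
        # formatStr is a helper from the original post; its definition is not shown.
        htmSource = formatStr(htmSource)
        return htmSource
    except socket.error, err:
        str_err = "%s" % err
        return ""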
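The chunked-read loop of method 2 can be sketched for Python 3 as follows. This is an illustration under stated assumptions, not the original author's code: `read_in_chunks` is a hypothetical helper name, and an in-memory `io.BytesIO` stands in for the object returned by `urlopen` so that the snippet runs without network access.

```python
import io

def read_in_chunks(fd, chunk_size=1024):
    """Read a file-like object to the end, chunk_size bytes at a time."""
    chunks = []
    while True:
        data = fd.read(chunk_size)
        if not data:          # an empty bytes object signals end of stream
            break
        chunks.append(data)
    return b"".join(chunks)

# Stand-in for a GBK-encoded HTTP response body (hypothetical content).
fake_response = io.BytesIO("中文网页".encode("cp936") * 500)
html = read_in_chunks(fake_response).decode("cp936")
print(len(html))
```

Collecting the chunks in a list and joining once avoids repeated string concatenation; any object with a `read(n)` method, including a real HTTP response, could be passed in the same way.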
3
Via GET, using httplib directly:
import httplib
import traceback
import urlparse

def GetHtmlSource_Get(htmurl):
    htmSource = ""
    conn = None
    try:
        urlx = urlparse.urlsplit(htmurl)
        conn = httplib.HTTPConnection(urlx.netloc)
        conn.connect()
        conn.putrequest("GET", htmurl)
        conn.putheader("Content-Length", 0)
        conn.putheader("Connection", "close")
        conn.endheaders()
        res = conn.getresponse()
        htmSource = res.read()
    except Exception, err:
        traceback.print_exc()
    finally:
        # Close the connection even if the request failed.
        if conn is not None:
            conn.close()
    return htmSource
Via POST:
import httplib
import socket
import traceback
import urlparse

def GetHtmlSource_Post(getString):
    htmSource = ""
    conn = None
    try:
        url = urlparse.urlsplit("http://app.sipo.gov.cn:8080")
        conn = httplib.HTTPConnection(url.netloc)
        conn.connect()
        conn.putrequest("POST", "/sipo/zljs/hyjs-jieguo.jsp")
        conn.putheader("Content-Length", len(getString))
        conn.putheader("Content-Type", "application/x-www-form-urlencoded")
        conn.putheader("Connection", "Keep-Alive")
        conn.endheaders()
        conn.send(getString)
        f = conn.getresponse()
        if not f:
            raise socket.error, "timed out"
        htmSource = f.read()
        f.close()
    except Exception, err:
        traceback.print_exc()
    finally:
        # Close the connection on both the success and the failure path.
        if conn is not None:
            conn.close()
    return htmSource

View File

@@ -0,0 +1,290 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-06T13:02:01+08:00
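====== A few things about Python ======
Created Friday 06 January 2012
http://pre-sence.com/archives/python-intro/
An introduction, aimed at beginners, to some Python-related tools and to common problems you may run into.
Last updated 2011.9.25
Two articles with this title were previously posted online; they have been edited and merged here.
===== Introduction =====
I assume you have already finished a Python tutorial, are broadly familiar with Python's **structure and syntax**, have tried most Python statements in the interactive interpreter on the command line, think Python is a nice language, and plan to keep going. This article covers the choice of tools for real-world Python work, including IDEs, debugging tools, and third-party library management tools. It also explains some problems that are easy to hit in a Chinese-language environment, such as Unicode encoding and decoding. The article mainly addresses Python development on **Windows**. Its purpose is to share experience and examples I have found useful; if you spot any omissions, please do contact me. Thanks.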
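===== About the Python language =====
Python is a programming language that has become popular in recent years. According to the overview on the official Python site, its main features include being cross-platform, free, simple, and easy to maintain. As I see it, Python suits most people: third-party libraries exist for almost everything, so after some basic study even people without deep programming experience should be able to build small applications such as simple desktop programs, dynamic websites, image processing, spreadsheet processing, or even an automatic forum poster.
Some popular Python tutorials:
* Dive into Python: for readers with some programming background. There is also Dive into Python 3, a tutorial for Python 3.
* Learn Python The Hard Way: teaches mainly through exercises; for readers with no programming experience at all.
* Invent Your Own Computer Game With Python: a tutorial that has you building a game right from the start. Impressive, isn't it?
* The Python Tutorial: the tutorial in the official documentation, canonical and complete.
If you have not started with Python yet, or do not feel familiar enough with it, find a tutorial you can stick with and start learning. In my experience, Python is the language that has given me the best return on the time spent learning it, and the one I use most day to day. In fact, you do not need to understand all of Python before you start using it.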
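===== Choosing a Python version; other distributions =====
==== Python 2 and 3 ====
Choosing between the Python 2 and 3 series can be frustrating. The difference is simple: Python 3.x is better in every respect, but much of its syntax is incompatible with Python 2.x. Development of Python 2.x has stopped. However, many third-party libraries still do not support Python 3, and many tools introduced later in this article, such as IPython, currently support only Python 2.6.
I recommend choosing Python 2.6.5 for now, because most third-party libraries and tools currently ship simple installers for 2.6, so you do not have to do much yourself.
==== Distributions ====
On Windows, besides the official installer, there are also:
* ActivePython: differs from the official build in that it provides extra libraries and documentation and sets the PATH environment variable automatically (covered in detail later).
* Python(x,y): the distribution I have always used and recommend to others. As the name suggests, it bundles **many commonly used libraries** for scientific computing, plus many other common libraries such as PyQt for desktop GUI work, document processing, and exe generation. It also includes many tools such as an IDE, plotting and charting tools, and an enhanced interactive shell. Much of the software mentioned below ships with it. Python(x,y) also comes with hand-curated offline documentation for all its libraries, and every minor version upgrade is offered as a separate patch. Overall it is a carefully maintained distribution, and I strongly recommend installing it.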
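===== Development tools =====
First, you need a copy of the documentation
For a language like Python, when would you say you have **fully mastered** it? You might think it is the day you remember the **names of most functions** and how to use them, and no longer need to search Google every few lines. That is half right: once you are familiar with Python you indeed should not need to search often. The first half is not necessarily true, though. I personally think you __do not need to memorize__ the contents of the huge standard library; most of the time you __only need to know where to find the relevant documentation__.
Python does this very, very well. Before you actually start developing, you should download an offline copy of the documentation. From this page (if it will not open, try here; you know why) download the HTML version; for 2.6.5, the file should be named python-2.6.5-docs-html.zip. Unzip it wherever you like and open index.html, the home page of the documentation. You will see that it is divided into many parts, including the language reference, the standard library, and many other handy documents.
===== The Python documentation =====
If you do not know where to start reading, the documentation has another great feature. Look at the Quick Search box on the left (marked in the screenshot above). When you need to learn more about a function or a standard library module, search for it there. The search is powered by Martian technology and works even offline! For example, enter urllib.urlencode and you can easily find its page. With this documentation at hand you can avoid a lot of frantic searching. Likewise, when you use a third-party library, look on its site for offline documentation, because many Python projects have excellent docs.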
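===== Choosing PyDev as the IDE =====
Choosing a Python IDE has long been a hard decision. After trying many tools, I found that the Eclipse-based PyDev is definitely the most complete IDE. Besides breakpoint debugging, PyDev's **code completion** is probably the strongest among IDEs of this kind.
If you installed Python(x,y), PyDev is already on your machine. If not, follow this article to install it.
A few settings deserve attention. First open PyDev and choose Window -> Preferences from the menu; in the dialog, find PyDev -> Editor -> Code Completion on the left. Code completion is configured here. You can lower Autocompletion delay to get suggestions sooner, and tick as many of the Request completion on options as possible so that PyDev suggests as much as it can.
Next, find the Interpreter Python tab, where you can set the so-called Forced Builtins, which force certain third-party libraries to be imported so that completion works for them. In my experience most third-party libraries get basic completion after this. Concretely, select the corresponding tab as in the figure, click New, and enter the name of the module you need.
Forced Builtins settings
Overall, the result is definitely among the better ones for IDEs of this kind: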
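===== IPython as a replacement for the Python shell =====
While learning Python you have surely used the Python shell, where you type Python statements and get results immediately. IPython is a deluxe, enhanced Python shell. If you installed Python(x,y), IPython is already on your machine. If not, download the Windows Installer here and install it. After that you also need to install pyreadline so that IPython can enable highlighting and tab completion.
From then on, whenever you would run python on the command line, type __ipython__ instead. Start IPython and take a look; the first difference you notice is probably that it **has colors**. Let's look at some basic, practical features.
First, __auto-completion__: one kind is simple keyword completion, the other is completion of an object's methods and attributes. As an example, import the sys module, then type sys. (note the dot) and press Tab; IPython lists all methods and attributes of the sys module. Because this happens in interactive mode and the Python statements have actually been executed, completion also works well for ordinary objects.
===== IPython =====
Continuing the example above, type sys? to show the __docstring__ of the sys module and related information. This is often very handy.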
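===== Practical IPython tips =====
Here are some more practical features of IPython. While learning Python you may have seen values assigned to _ in a loop or when unpacking a function's return value, to show that the value is being ignored. That is only a common convention. In fact _ is a legal variable name, and in the Python shell __the variable _ is always bound to the value of the most recent output__. An example should make this clear:
>>> import string
>>> string.letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> print _
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
A practical case: while debugging you call f.read() directly, find the output interesting, and want to process it further, only to realize you forgot to assign it. Previously you could only sigh and open and read the file again; now you just run __result = ___ to bind _ to another variable.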
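Much of IPython's power also lies in its __magic functions__. These are commands __starting with %__ that are run inside IPython to **configure it or perform certain tasks**. Type __%lsmagic__ in IPython to list all magic functions. Here are a few of the more interesting ones; you can also browse the documentation to find those that are particularly useful to you.
* As shown earlier, ? looks up the __documentation of a function__; the same works for magic functions, for example %run?.
* !cd .. Prefixing a command with ! __runs it as a shell command__, so you do not have to leave IPython for command-line work.
* %run foo.py runs foo.py directly in the **current environment**, with the same effect as calling ipython foo.py on the command line.
* %time foo.bar() works like a timeit decorator and gives a simple timing profile.
* %hist shows the __history of the commands__ you entered. You can also access earlier input as In[<linenumber>]; for example, **exec In[10]** runs line 10 of the list.
* %rep is similar to the _ variable above, but returns the result as a string.
* Finally, if %automagic is on, magic functions can be called without the leading %.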
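The current IPython version also has __%autoreload__, which is not loaded by default for safety reasons. It **automatically reloads the functions you call, together with their modules**. Anyone who has used django will find this familiar. In IPython the effect is that, while debugging a function you keep changing, you can turn this on so that every call picks up the latest version and changes in the source take effect immediately. In IPython, run:
import ipy_autoreload
%autoreload 2
With this, IPython autoreloads **all modules**. Run %autoreload? to read its documentation for further settings. If you want IPython to load this feature at every start, configure it in __ipythonrc__ (on Windows, found at C:\Users\<username>\_ipython\ipythonrc.ini).
Finally, one more remarkable feature. If your program is started from the command line, that is, by typing python foo.py (as most Python programs are), you can use IPython to __break into the program__ at any point. Add the following statements anywhere in your program:
from IPython import embed
embed()
Then run your program as usual. When execution reaches the inserted statements, it __drops into an IPython session__. Try a few commands and you will see that the IPython environment is exactly the program's state at that point. You can inspect each variable, call functions, and print whatever values interest you to help debugging. Afterwards you can __exit IPython as usual__ and the program continues running; naturally, the statements you executed in IPython __affect__ the rest of the run.
I first saw this method here. Think of it as pausing a program running at full speed, inspecting and modifying it while it runs, and then letting it continue. As an example, when writing a web bot you need to look at every page you fetch and then work out how to extract the address of the next page. With this technique you can interrupt the program after fetching a page, experiment with different ways of processing it right there, write the approach that works back into your code, and move on to the next step. This kind of workflow is possible only with a dynamic language such as Python.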
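===== Managing third-party libraries with pip =====
One of Python's great strengths is its huge number of third-party libraries covering every kind of application. Installing them, however, can be very frustrating if you do not know how. In fact there is one right way to install and manage Python third-party libraries, and it asks nothing more of you than typing the name of the library you want to install.
setuptools is also included in Python(x,y). If you do not have it, first install __setuptools__, which is simply a program for **installing third-party libraries**. Download and run the Windows Installer for your Python version, then open a command prompt and type:
easy_install pip
If the command is not found, your environment variables are not set up. This is certain to happen with the official Python installer and will probably not be fixed for a while; such is the stubbornness of great programmers. To fix it: right-click My Computer - Properties - Advanced system settings - Environment Variables - add C:\python2*\Scripts to PATH. After this, the system also searches that directory whenever you type a command at any prompt. Then run the command above again. Once pip is installed we drop setuptools in its favor. To install any library, you only need its name (no version number) and pip. For example, to install django, type:
__pip__ install django
easy_install and pip are similar in purpose: both look up the official third-party package index, PyPI, then download and install. pip's __advantages are support for more advanced features such as virtual environments__, no broken leftovers after a failed install, and, more importantly, the ability to __uninstall__. The following command uninstalls a library previously installed by pip. Continuing the example, to uninstall django:
pip uninstall django
This is a feature setuptools lacks. Note also that most pure-Python libraries can be installed on Windows this way, but those that need to compile C modules generally will not succeed. In that case, look on the library's site for a Windows installer.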
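===== Building virtual Python environments with virtualenv =====
Perhaps you do Web development in Python, or need several Python versions on one machine to test whether your code runs on 2.5, 2.6, and 2.7, or your projects depend on different versions of the same third-party library; or sometimes you simply want a clean Python environment without the clutter of previously installed libraries. This is where virtualenv helps. Using an installed Python, it creates one or more __mutually independent virtual Python environments__ on the same machine and lets you switch between them at any time. If you still do not see the use, read on anyway and keep it in mind, so that you can find this simple, practical tool the day you need it.
Like other third-party libraries, it is easily installed with pip:
pip install virtualenv
After installation, open a command prompt and type virtualenv to check that the script is found. If not, check PATH as described above. Next, **create a virtual environment in a convenient place**. Create the folder C:\envs\, cd into it on the command line, and type:
virtualenv --no-site-packages --python=C:\Python26\python.exe **envtest**
You should then see a folder named envtest. This is the newly created __virtual environment__. Let's __activate it__ to see how it is used. Run envtest\Scripts\activate.bat on the command line, and the prompt changes to:
(envtest) c:\>
The (envtest) in front of the prompt shows that __the environment is active__, and you can now work inside it. Run pip freeze and you will find... nothing at all. pip help tells you that pip freeze **lists all third-party libraries installed in the current Python environment**. Because we created the environment with --no-site-packages, which means __libraries already installed in the system Python are not brought into this virtual environment__, this is a clean new Python environment. You can call pip or easy_install here to install whatever libraries you need __without affecting the system Python__, which is why it is called a **virtual Python environment**.
Looking back at the envtest directory, its Scripts subdirectory contains programs such as python.exe and pip.exe. While the environment is active, calling python or pip runs the programs in __this directory__; virtualenv hides the system python.exe by adjusting environment variables. The Lib directory holds the **newly installed libraries**.
You should now understand the basic operation of virtualenv. Some notes on usage:
* Call activate.bat to __enter the virtual environment__; deactivate.bat in the same directory leaves it.
* The --python=C:\Python26\python.exe argument used when creating the environment specifies the __location of the Python executable__ to use, so you can create several environments pointing at different Python versions. Note that if several Python versions are installed, the one installed first appears to be treated as the main version: typing python on the command line runs it. In fact this is decided by the order of the entries in PATH, so it is best to put the paths of your main version at the very front of PATH. On my machine, C:\Python26 and C:\Python26\Scripts are the first two PATH entries. This should keep your main version working normally.
* When working in a virtual environment, to run a Python program in it you must type python foo.py on the command line; only then does foo.py run under the active virtualenv. If you run foo.py directly, it still executes in the system environment.
Also, on Linux, __virtualenvwrapper__ makes managing and switching environments convenient; unfortunately it does not work on Windows. Luckily there is a simple script, envdotpy, to help. Put env.py in a directory on PATH, such as C:\Python26\Scripts. Then open it, find the DEFAULT_DIR_PATH variable, and change it to the place where you keep your virtualenvs. In our example the line becomes:
DEFAULT_DIR_PATH = "C:\\envs\\"
After that you no longer need to cd to that directory: from any path you can activate, switch, and leave virtualenvs through env.py. For example, env.py envtest activates envtest, and env.py -q leaves any virtualenv.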
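===== Visual debugging with Winpdb =====
If you use PyDev, its built-in breakpoint debugging should be enough. Winpdb gives a familiar debugging environment to people who develop Python in simpler editors. Unsurprisingly, Winpdb is also part of Python(x,y), so once Python(x,y) is installed you can keep discovering the excellent tools bundled with it. Usage is simple. Assuming the program is named foo.py, type on the command line:
winpdb foo.py
A window then opens with the debugging GUI everyone knows. Note that to set a breakpoint you click the line you want and press F9; the line's background then turns red, as shown in the figure below.
Winpdb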
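===== Encoding problems =====
For Chinese-speaking users, encoding is probably the problem most easily hit when learning Python. Things that clearly work with English break as soon as Chinese is involved, the error messages are hard to understand, and you do not know where to start fixing them. Fortunately, encoding problems are easy to avoid with a few preventive measures. The following covers, from several angles, the problems you are likely to hit when handling Chinese and Unicode in Python.
==== Unicode encoding basics ====
This is a very brief account of encodings. The wording here may not be fully accurate; if you know Unicode better, please contact me to help correct it.
You can picture Unicode as a very large table containing every individual character of every script in the world, such as the letters of English and the characters of Chinese. In the Unicode standard each character has one unique __number__ (its code point): for example, '中' corresponds to hexadecimal 0x4E2D and the letter 'a' to hexadecimal 0x0061. These numbers are assigned by the Unicode Consortium. Using these numbers directly to represent characters is possible in theory; in that case **each number corresponds to exactly one character**.
In practice, not every text uses all the characters in the world. An English article, for instance, contains only English letters plus some symbols, and storing every character as a full Unicode number would __waste too much space__. That is why __various character encodings exist__. For our purposes, an encoding is a rule that maps characters (for some encodings only a subset of Unicode, such as all Chinese or all Japanese characters) to sequences of bytes. Common encodings for Chinese are __UTF8 and GBK__. Taking '中' again: UTF8 turns its Unicode number 0x4E2D into the three-byte sequence [0xE4, 0xB8, 0xAD], and only these bytes together are displayed as '中'. By contrast, GBK encodes the same number 0x4E2D as the two-byte sequence [0xD6, 0xD0], and in a GBK environment only these two bytes together are displayed as '中'.
In the example above, what happens if the UTF8 bytes [0xE4, 0xB8, 0xAD] are displayed in a GBK environment? They differ from the GBK encoding of '中', [0xD6, 0xD0], so they obviously will not be shown as '中'. Instead, the three bytes are grouped with the bytes before and after them and looked up __according to GBK's rules__. The result may be a completely unrelated character, sometimes a symbol, or nothing at all. This is how garbled text (mojibake) is produced.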
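The numbers quoted above can be checked directly. The following is a minimal sketch written for Python 3 (an assumption; the article itself targets Python 2.x), where `str` holds code points and `encode()` produces bytes.

```python
text = "中"

print(hex(ord(text)))            # the Unicode code point of the character
print(text.encode("utf-8"))      # three bytes under UTF-8
print(text.encode("gbk"))        # two bytes under GBK

# Decoding UTF-8 bytes with the wrong codec yields mojibake, not the original text.
utf8_bytes = "这是".encode("utf-8")
mojibake = utf8_bytes.decode("gbk", errors="replace")
print(mojibake)
```

The `errors="replace"` argument is only there so that the deliberately wrong decode never raises; byte pairs that GBK cannot map are shown as replacement characters instead.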
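==== String and Unicode in Python 2.x ====
Python 2.x has two string-related types, **String (str) and Unicode (unicode)**. Their interfaces are very similar and they are sometimes __converted automatically__, which is quite misleading. In Python 3 the two types were replaced by **Bytes and String** respectively, names that better describe what they are. A Python 2.x String stores __an already-encoded byte sequence__, but it __does not know which encoding it uses__. A Unicode object, by contrast, stores unencoded text: the Unicode numbers of the characters. To illustrate with a program, create a simple script named encoding.py with the following code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
strs = "这是中文"
unis = "这也是中文".decode("utf8")
print strs[0:2]
print unis[0:2].encode('gbk')
print len(strs)
print len(unis)
Note: the Chinese encoding used by the Windows console is GBK.
The first two lines, explained later, specify the interpreter and the __encoding of the script file__. The script can be downloaded here; if you write it yourself, make sure the file is saved as UTF8 and nothing else. Its output on Windows is shown below, and I think it illustrates the issue nicely:
C:\SHARED\Dev\scripts>encoding.py
杩 # strs is UTF8-encoded, but the console interprets the bytes as GBK, hence the garbage.
这也 # unis is Unicode; it is converted to GBK before printing, so it displays correctly.
12
5
To explain: __every character in our program file is UTF8-encoded__, which mainly means that all **string literals written directly in the program** (strings enclosed in "" or '') are encoded as UTF8, whereas the Windows __command line is a GBK environment__. Here strs is a String. In Python 2.x, a string written directly in the program is in fact always a String (leaving aside prefixed literals such as u"..."). We first print strs[0:2] and get a garbage character (it only happens to form a valid character). As said above, a String stores a byte sequence with no encoding information, so this simply takes the first two bytes of strs and tries to display them. Because the command line uses GBK, those two bytes happen to map to a character, but one that has nothing to do with the original text.
unis is obtained by calling decode() on a String, which is the most basic way to get a Unicode object in Python 2.x. Since a String does not know which encoding produced it, it is our responsibility to state that encoding. We know the source file is UTF8, so we call the String's decode() method to reverse the encoding, that is, to convert the encoded bytes back into their unique Unicode numbers. The result, unis, has the Python 2.x type Unicode and stores the Unicode number of each character.
When printing the first two characters of unis we call the Unicode object's encode() method. This is the encoding step. The Windows command line uses GBK, and only GBK-encoded bytes display correctly there, so we encode the Unicode numbers stored in unis according to GBK's rules and write the result to the screen. As you can see, the first two characters of unis are displayed correctly. Note that if you print a Unicode object directly on the command line, Python automatically encodes it for the current environment before displaying it, but that hides the difference between the two types. I recommend always calling encode and decode explicitly so that you stay clear about what is happening.
The difference between the two lengths confirms the earlier example. strs stores the UTF8-encoded byte sequence; as shown above, one Chinese character becomes three consecutive bytes in UTF8, so the length of strs is 3 x 4 = 12. You can think of strs as holding not Chinese text but a sequence of bytes with no meaning of their own, whereas unis stores the Unicode numbers of the Chinese characters. Each character has one number, so five characters give five numbers and a length of 5.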
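For comparison, here is how the same experiment might look in Python 3, where the two types are called `bytes` and `str`. This is a sketch under that assumption rather than a port of encoding.py; the console-encoding step is left out because it depends on the terminal.

```python
# Python 3: bytes plays the role of Python 2's String, str that of Unicode.
strs = "这是中文".encode("utf-8")   # an encoded byte sequence
unis = "这也是中文"                  # a sequence of Unicode code points

print(len(strs))   # counts bytes
print(len(unis))   # counts characters
print(strs[0:2])   # slicing bytes can cut a character in half
print(unis[0:2])   # slicing str always yields whole characters
```

Slicing the byte string returns only two of the three bytes of the first character, which is why the original script printed garbage on a GBK console.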
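===== Avoiding and solving encoding problems =====
With these concepts of Unicode encoding and decoding in Python in hand, let's see how to avoid annoying encoding problems as far as possible.
First, if your code contains Chinese, be sure to declare the __encoding of the source file__. As described in PEP 263, adding the following comment lines at the very top of the program declares the encoding (the second line is the actual declaration):
#!/usr/bin/python
# -*- coding: utf-8 -*-
Here utf-8 is the declared encoding. You should in fact always use UTF8 for your Python source files, because in Python 3 source files are UTF8 by default. Besides the declaration, you must make sure that your __editor really saves__ the file as UTF8.
Next, build good habits around encoding and decoding. When your program takes a String as input, __convert it to Unicode as early as possible__ and process it in that form. On output, wait as long as possible and only at the last moment __encode the Unicode into a String in the required encoding__. Likewise, keep __every string that takes part in processing inside the program as Unicode__. Many well-known Python libraries, django for example, work this way with good results. Never rely on Python's implicit conversion between the two types, and __never mix String and Unicode in one operation__: such code is very error-prone, and Python 3 no longer allows it.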
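The "decode early, encode late" habit can be sketched as a small function. This is an illustrative Python 3 example, not code from the article: `shout` and its parameters are hypothetical names, and the processing step is deliberately trivial.

```python
def shout(raw, in_encoding, out_encoding):
    """Decode on input, work on text, encode on output."""
    text = raw.decode(in_encoding)        # bytes -> str as early as possible
    text = text.strip() + "!"             # all processing happens on str
    return text.encode(out_encoding)      # str -> bytes as late as possible

gbk_input = " 中文 ".encode("gbk")         # stand-in for bytes read from a GBK source
utf8_output = shout(gbk_input, "gbk", "utf-8")
print(utf8_output.decode("utf-8"))
```

Only the two boundary lines know about encodings; everything in between works on decoded text and is therefore independent of where the data came from or where it is going.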
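Although determining a String's encoding is the programmer's job, sometimes you really do not know how a string is encoded. A handy library, __chardet__, can help here. The following example, taken from its web page, illustrates what it does: it reads an arbitrary byte string, guesses the encoding, and reports its confidence in the guess.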
>>> import urllib
>>> urlread = lambda url: urllib.urlopen(url).read()
>>> import chardet
>>> chardet.detect(urlread("http://google.cn/"))
{'encoding': 'GB2312', 'confidence': 0.99}
>>> chardet.detect(urlread("http://yahoo.co.jp/"))
{'encoding': 'EUC-JP', 'confidence': 0.99}
>>> chardet.detect(urlread("http://amazon.co.jp/"))
{'encoding': 'SHIFT_JIS', 'confidence': 1}
>>> chardet.detect(urlread("http://pravda.ru/"))
{'encoding': 'windows-1251', 'confidence': 0.9355}
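If the confidence is very low, or chardet raises an error, the string has most likely been through several wrong rounds of encoding and decoding, and you will need to solve the problem some other way.
There are also points to watch when processing text files that contain Chinese characters.
If the above has not made the concept of Unicode clear to you, here are a few more articles on the subject:
* Two articles introducing Unicode [1], [2], with more detailed explanations.
* Unicode In Python, Completely Demystified: a detailed treatment of Unicode handling specifically in Python.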
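===== Miscellaneous =====
Beyond the major topics above, here are the remaining resources.
Vim resources for Python development
I actually write Python in Vim myself these days, and it works quite well. Related resources:
UltimateVimPythonSetup: a fairly recent Vim configuration aimed specifically at Python.
Vim as Python IDE: search for Python and Vim and you are bound to find this article.
vimcolorschemetest: all Vim color schemes collected in one place.
Python-related Vim plugins
pythoncomplete.vim: once configured as described above, press Ctrl-X, Ctrl-O while typing for powerful completion.
python.vim: enhanced syntax highlighting.
pyflakes.vim: excellent syntax checking that analyzes your code to catch silly mistakes. Note that it needs Vim 7.2; under 7.1 it has no effect at all...
Other resources
用Python做科学计算 (Scientific Computing with Python)
Covers essentially every module in Python(x,y); I am sure non-Chinese readers wish it had an English edition.
PyMOTW
The name looks like a Python library (and it actually is one...), but overall it is a body of documentation, "Python Module of the Week". For several years the author has introduced one module from the Python standard library every week. Treat it as an excellent supplement to the standard library documentation: when the official description leaves you lost, look for the corresponding article here. The examples are thorough, and almost every obscure module you might need is covered. More good news: PyMOTW has a very good Chinese translation.
reddit.com/r/python and python.org planet
Python-related articles and resources. In my experience I find something useful there every time.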

View File

@@ -0,0 +1,117 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T15:02:52+08:00
====== A thread pool implementation in Python ======
Created Tuesday 05 July 2011
import urllib2
import time
import socket
from datetime import datetime
from thread_pool import *

def main():
    url_list = {"sina": "http://www.sina.com.cn",
                "sohu": "http://www.sohu.com",
                "yahoo": "http://www.yahoo.com",
                "xiaonei": "http://www.xiaonei.com",
                "qihoo": "http://www.qihoo.com",
                "laohan": "http://www.laohan.org",
                "eyou": "http://www.eyou.com",
                "chinaren": "http://www.chinaren.com",
                "douban": "http://www.douban.com",
                "163": "http://www.163.com",
                "daqi": "http://www.daqi.com",
                "qq": "http://www.qq.com",
                "baidu_1": "http://www.baidu.com/s?wd=asdfasdf",
                "baidu_2": "http://www.baidu.com/s?wd=dddddddf",
                "google_1": "http://www.baidu.com/s?wd=sadfas",
                "google_2": "http://www.baidu.com/s?wd=sadflasd",
                "hainei": "http://www.hainei.com",
                "microsoft": "http://www.microsoft.com",
                "wlzuojia": "http://www.wlzuojia.com"}
    # Use the thread pool
    socket.setdefaulttimeout(10)
    print 'start testing'
    wm = WorkerManager(50)
    for url_name in url_list.keys():
        wm.add_job(do_get_con, url_name, url_list[url_name])
    wm.wait_for_complete()
    print 'end testing'

def do_get_con(url_name, url_link):
    try:
        fd = urllib2.urlopen(url_link)
        data = fd.read()
        # The directory /tmp/ttt must already exist.
        f_hand = open("/tmp/ttt/%s" % url_name, "w")
        f_hand.write(data)
        f_hand.close()
    except Exception, e:
        # Download errors are silently ignored.
        pass

if __name__ == "__main__":
    main()
The thread_pool code (not original; taken from http://blog.daviesliu.net/2006/10/09/234822/):
import Queue, threading, sys
from threading import Thread
import time
import urllib

# working thread
class Worker(Thread):
    worker_count = 0
    timeout = 1

    def __init__(self, workQueue, resultQueue, **kwds):
        Thread.__init__(self, **kwds)
        self.id = Worker.worker_count
        Worker.worker_count += 1
        self.setDaemon(True)
        self.workQueue = workQueue
        self.resultQueue = resultQueue
        self.start()

    def run(self):
        '''the get-some-work, do-some-work main loop of worker threads'''
        while True:
            try:
                callable, args, kwds = self.workQueue.get(timeout=Worker.timeout)
                res = callable(*args, **kwds)
                print "worker[%2d]: %s" % (self.id, str(res))
                self.resultQueue.put(res)
                #time.sleep(Worker.sleep)
            except Queue.Empty:
                # No job arrived within the timeout: let the thread end.
                break
            except:
                print 'worker[%2d]' % self.id, sys.exc_info()[:2]
                raise

class WorkerManager:
    def __init__(self, num_of_workers=10, timeout=2):
        self.workQueue = Queue.Queue()
        self.resultQueue = Queue.Queue()
        self.workers = []
        self.timeout = timeout
        self._recruitThreads(num_of_workers)

    def _recruitThreads(self, num_of_workers):
        for i in range(num_of_workers):
            worker = Worker(self.workQueue, self.resultQueue)
            self.workers.append(worker)

    def wait_for_complete(self):
        # ...then, wait for each of them to terminate:
        while len(self.workers):
            worker = self.workers.pop()
            worker.join()
            if worker.isAlive() and not self.workQueue.empty():
                self.workers.append(worker)
        print "All jobs are completed."

    def add_job(self, callable, *args, **kwds):
        self.workQueue.put((callable, args, kwds))

    def get_result(self, *args, **kwds):
        return self.resultQueue.get(*args, **kwds)
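For readers on Python 3, the same add-jobs-then-wait pattern is available in the standard library. The sketch below swaps the hand-written WorkerManager for `concurrent.futures.ThreadPoolExecutor`; it is an alternative technique, not a port of the code above, and `fake_fetch` is a hypothetical stand-in for do_get_con that avoids any network access.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(name):
    """Hypothetical stand-in for do_get_con; returns a string instead of downloading."""
    return "fetched %s" % name

names = ["sina", "sohu", "douban"]
with ThreadPoolExecutor(max_workers=3) as pool:
    # map() hands each name to a worker thread and yields results in input order
    results = list(pool.map(fake_fetch, names))
print(results)
```

Leaving the `with` block waits for all submitted jobs, which corresponds to wait_for_complete() in the original design.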

View File

@@ -0,0 +1,247 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-06T11:10:34+08:00
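====== Shell scripting with Python ======
Created Friday 06 January 2012
http://rickyluo.iteye.com/blog/1327911
===== Running commands =====
First, let me introduce a function:
**os.system(command) # In fact, the preferred way to run shell commands from Python is the subprocess module.**
This function invokes the shell to run the command line command and returns its __exit status__. Try typing os.system("ls -l") in the Python interpreter and you will see ls list the files in the current directory. You could say that with this function Python gains all the power of the shell. That said, you usually do not need it, because the common shell commands usually have an equivalent in Python that is just as concise.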
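Since the note above names subprocess as the preferred way to run commands, here is a minimal Python 3 sketch of it. To stay portable it launches a second Python interpreter instead of `ls`; that choice, and the message printed by the child, are assumptions made for the example.

```python
import subprocess
import sys

# Run a child process without going through the shell; arguments are a list.
completed = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    stdout=subprocess.PIPE,
    universal_newlines=True,   # decode stdout to str
)
print(completed.returncode)
print(completed.stdout.strip())
```

Unlike os.system(), this gives access to both the exit status and the captured output, and passing a list avoids shell quoting problems.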
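===== Listing files =====
The most used shell command is ls. The Python equivalent is os.listdir(dirname), which **returns a list of strings** containing all the file names, excluding "." and "..". Traversing a whole directory tree is a little more involved; we will come to that later. First try it in the interpreter:
**>>> os.listdir("/")**
['tmp', 'misc', 'opt', 'root', '.autorelabel', 'sbin', 'srv', '.autofsck', 'mnt', 'usr', 'var', 'etc', 'selinux', 'lib', 'net', 'lost+found', 'sys', 'media', 'dev', 'proc', 'boot', 'home', 'bin']
===== Copying files =====
The counterpart of cp is __shutil.copy(src,dest)__. It takes two arguments: src is the name of the source file, and dest is the name of the target file or target directory. If dest is a directory, a file with the same name is created in it. Similar to shutil.copy is __shutil.copy2(src,dest)__, which additionally copies the last access time and last modification time.
===== Copying directories =====
The shell's cp can also copy directories, but Python's shutil.copy cannot: its first argument must be a file. What to do? Python also has __shutil.copytree(src,dst[,symlinks])__. The extra parameter symlinks is a boolean; if True, symbolic links in the source tree are recreated as symbolic links instead of copying the files they point to.
===== Moving or renaming files =====
And moving or renaming files and directories? Clever readers will have guessed: __shutil.move(src,dst)__. Like mv, if src and dst are on the same file system, shutil.move simply renames; if they are on different file systems, it first copies src to dst and then deletes src.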
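The copy and move functions above can be tried safely in a scratch directory. This is a small Python 3 sketch; the file and directory names are made up for the example, and everything is created under a temporary directory that is removed at the end.

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
src = os.path.join(root, "a.txt")
with open(src, "w") as f:
    f.write("data")

os.mkdir(os.path.join(root, "dest"))
shutil.copy(src, os.path.join(root, "dest"))           # like: cp a.txt dest/
shutil.copytree(os.path.join(root, "dest"),
                os.path.join(root, "dest_copy"))       # like: cp -r dest dest_copy
shutil.move(src, os.path.join(root, "b.txt"))          # like: mv a.txt b.txt

listing = sorted(os.listdir(root))
copied = os.listdir(os.path.join(root, "dest_copy"))
print(listing, copied)
shutil.rmtree(root)                                    # clean up the scratch directory
```

Note that copytree() requires that the destination directory does not exist yet, which is why dest_copy is not created beforehand.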
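By now most readers should have a feel for what Python can do. Here is a list of other functions:
**os.chdir(dirname)**
Change the current working directory to dirname.
**os.getcwd()**
Return the path of the current working directory.
**os.chroot(dirname)**
Make dirname the root directory of the process, like the chroot command on *nix.
**os.chmod(path,mode)**
Change the permission bits of path. mode can be a combination (bitwise OR) of the following values, which are defined in the stat module:
stat.S_ISUID
stat.S_ISGID
stat.S_ISVTX
stat.S_IREAD
stat.S_IWRITE
stat.S_IEXEC
stat.S_IRWXU
stat.S_IRUSR
stat.S_IWUSR
stat.S_IXUSR
stat.S_IRWXG
stat.S_IRGRP
stat.S_IWGRP
stat.S_IXGRP
stat.S_IRWXO
stat.S_IROTH
stat.S_IWOTH
stat.S_IXOTH
Without going into each one: R stands for read, W for write, and X for execute permission; USR stands for the owner, GRP for the group, and OTH for others.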
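Combining mode bits can be sketched as follows in Python 3. The example assumes a POSIX system (on Windows, os.chmod only honors the read-only flag) and works on a throwaway temporary file.

```python
import os
import stat
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# Owner may read and write; group and others get nothing (like chmod 600).
os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))
os.remove(path)
```

stat.S_IMODE() strips the file-type bits from st_mode so that only the permission bits remain for comparison.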
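**os.chown(path,uid,gid)**
Change the owner of a file. Passing -1 for uid or gid leaves that id unchanged.
**os.link(src,dst)**
Create a hard link.
**os.mkdir(path,[mode])**
Create a directory. For the meaning of mode see os.chmod(); the default is 0777.
__os.makedirs(path,[mode])__
Like os.mkdir(), but first creates any missing parent directories.
os.readlink(path)
Return the path that the symbolic link path points to.
os.remove(path)
Delete a file; cannot be used to delete a directory.
__os.rmdir(path)__
Delete a directory; cannot be used to delete a file.
__os.removedirs(path)__
Remove directories recursively. Works like rmdir(), except that once the leaf directory has been removed, removedirs() tries to remove each parent directory mentioned in path in turn until an error is raised (the error is ignored, because it usually means a parent directory is not empty). For example, os.removedirs('foo/bar/baz') first removes the directory 'foo/bar/baz', then removes 'foo/bar' and 'foo' if they are empty. Raises OSError if the leaf directory cannot be removed. New in version 1.5.2.
os.rename(src, dst)
Rename the file or directory src to dst. If dst is a directory, OSError is raised. On Unix, if dst exists and is a file, it is **silently replaced** if the user has permission. On some Unix flavors the operation may fail if src and dst are on different file systems. If successful, the renaming is an atomic operation (a POSIX requirement). On Windows, if dst already exists, OSError is raised even if it is a file; there may be no way to implement an atomic rename when dst names an existing file.
os.renames(old, new)
Recursive directory or file renaming function. Works like rename(), except that any intermediate directories needed to make the new path name valid are created first. After the rename, directories corresponding to the rightmost path segments of the old name are pruned away using removedirs(). New in version 1.5.2. Note: this function can fail after the new directory structure has been created if you lack the permissions needed to remove the leaf directory or file.
os.readlink(path)
Return a string representing the path to which the symbolic link points. The result may be either an absolute or a relative path name; if it is relative, it can be converted to an absolute path name with __os.path.join(os.path.dirname(path), result)__. Availability: Macintosh, Unix.
os.stat(path)
Perform a stat() system call on the given path. The return value is an __object__ whose attributes correspond to the members of the stat structure: st_mode (protection bits), st_ino (inode number), st_dev (device), st_nlink (number of hard links), st_uid (user ID of owner), st_gid (group ID of owner), st_size (size of file, in bytes), st_atime (time of most recent access), st_mtime (time of most recent content modification), st_ctime (platform dependent: time of most recent metadata change on Unix, time of creation on Windows).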
>>> import os
>>> statinfo = os.stat('somefile.txt')
>>> statinfo
(33188, 422511L, 769L, 1, 1032, 100, 926L, 1105022698,1105022732, 1105022732)
>>> statinfo.st_size
926L
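__os.symlink(src,dst)__
Create a symbolic link.
os.tempnam([dir[, prefix]])
Return a unique path name that is reasonable for creating a temporary file. This will be __an absolute path__ that names a potential directory entry in the directory dir, or in a common location for temporary files if dir is omitted or None. If given and not None, prefix is used to provide a short prefix to the file name. Applications are responsible for properly creating and managing files created using paths returned by tempnam(); **no automatic cleanup is provided**. On Unix, the environment variable TMPDIR overrides dir, while on Windows TMP is used. The specific behavior of this function depends on the C library implementation; some aspects are underspecified in system documentation. Warning: use of tempnam() is vulnerable to symlink attacks; consider using tmpfile() (section 14.1.2) instead. Availability: Macintosh, Unix, Windows.
os.tmpnam()
Return a unique path name that is reasonable for creating a temporary file. This will be an absolute path that names a potential directory entry in a common location for temporary files. Applications are responsible for properly creating and managing files created using paths returned by tmpnam(); **no automatic cleanup is provided**. Warning: use of tmpnam() is vulnerable to symlink attacks; consider using tmpfile() (section 14.1.2) instead. Availability: Macintosh, Unix, Windows. This function probably should not be used on Windows, though: Microsoft's implementation of tmpnam() always creates a name in the root directory of the current drive, which is generally a poor location for a temporary file (depending on privileges, you may not even be able to open a file using this name).
os.TMP_MAX
The maximum number of unique names that tmpnam() will generate before reusing names.
**os.unlink(path)**
Remove the file path. This is the same function as remove(); unlink is its traditional Unix name. Availability: Macintosh, Unix, Windows.
__os.utime(path, times)__
Set the access and modified times of the file specified by path. If times is None, the file's access and modified times are set to the current time. Otherwise, __times must be a 2-tuple of numbers__ of the form (atime, mtime), used to set the access and modified times respectively. Whether a directory can be given for path depends on whether the operating system implements directories as files (Windows, for example, does not). Note that the exact times you set here may not be returned by a subsequent stat() call, depending on the resolution with which your operating system records access and modification times; see stat(). Changed in version 2.0: added support for None for times. Availability: Macintosh, Unix, Windows.
__os.walk__(top[, topdown=True [, onerror=None]]): its obvious difference from os.listdir() is that it performs a __deep traversal__, whereas os.listdir() only lists the subdirectories and files directly inside one directory.
walk() generates the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a __3-tuple (dirpath, dirnames, filenames)__.
This example displays the number of bytes taken by non-directory files in each directory under the starting directory, except that it does not look under any CVS subdirectory: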
import os
from os.path import join, getsize
for root, dirs, files in os.walk('python/Lib/email'):
    print root, "consumes",
    print sum(getsize(join(root, name)) for name in files),
    print "bytes in", len(files), "non-directory files"
    if 'CVS' in dirs:
        dirs.remove('CVS')  # don't visit CVS directories
In the next example, walking the tree bottom-up is essential: rmdir() does not allow deleting a directory before the directory is empty:
# Delete everything reachable from the directory named in 'top',
# assuming there are no symbolic links.
# CAUTION: This is dangerous! For example, if top == '/', it
# could delete all your disk files.
import os
for root, dirs, files in os.walk(top, topdown=False):
    for name in files:
        os.remove(os.path.join(root, name))
    for name in dirs:
        os.rmdir(os.path.join(root, name))
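The pruning trick from the first walk() example can be tried on a small throwaway tree. This Python 3 sketch builds the tree itself, so the directory layout and file sizes are assumptions made for illustration.

```python
import os
import shutil
import tempfile

top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "pkg", "CVS"))
for rel, size in [("a.txt", 3),
                  (os.path.join("pkg", "b.txt"), 5),
                  (os.path.join("pkg", "CVS", "c.txt"), 7)]:
    with open(os.path.join(top, rel), "w") as f:
        f.write("x" * size)

sizes = {}
for root, dirs, files in os.walk(top):
    sizes[os.path.relpath(root, top)] = sum(
        os.path.getsize(os.path.join(root, name)) for name in files)
    if "CVS" in dirs:
        dirs.remove("CVS")      # prune in place so walk() never descends into it
print(sizes)
shutil.rmtree(top)              # remove the scratch tree
```

Pruning works only in top-down mode, because walk() consults the dirs list after the loop body has run for the parent directory.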
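After all this introduction, you can in fact find everything in the documentation of the __os and shutil modules__. When actually writing shell-style scripts, also keep the following in mind:
===== 1. Environment variables =====
Python keeps the environment variables in the os.environ dictionary, which you can modify with the ordinary dictionary methods; programs started with system() inherit them automatically. For example:
os.environ["fish"]="nothing"
Note, however, that environment variable values can only be strings. Unlike the shell, Python has no concept of exporting an environment variable. Why not? Because Python has no need for it :-)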
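Inheritance of os.environ by child processes can be demonstrated like this in Python 3. The variable name FISH and the use of a second Python interpreter as the child are assumptions made for the example.

```python
import os
import subprocess
import sys

os.environ["FISH"] = "nothing"     # values must be strings

# The child process sees the variable without any explicit "export".
child_code = "import os; print(os.environ['FISH'])"
output = subprocess.check_output([sys.executable, "-c", child_code])
print(output.decode().strip())
```

Assigning to os.environ changes the environment of the current process, and every child started afterwards receives a copy of it.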
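===== 2. os.path =====
This module contains many functions for handling path names. In the shell, path handling does not seem very important, but in Python it is needed all the time. The two most used functions split and join directory names and file names:
**os.path.split(**path) -> (dirname,basename)
Splits a path into two parts; for example, os.path.split("/foo/bar.dat") returns ("/foo","bar.dat").
**os.path.join**(dirname,basename)
Combines a directory name and a file name into a full path; for example, os.path.join("/foo","bar.dat") returns "/foo/bar.dat". It is the exact opposite of os.path.split().
There are also these functions:
os.path.__commonprefix__(list)
Returns the longest prefix shared by all paths in list. The comparison is character by character, so the result is not necessarily a valid path.
For example:
>>> os.path.commonprefix(['/home/td','/home/td/ff','/home/td/fff'])
'/home/td'
os.path.__lexists__(path) differs from os.path.__exists__(path) in that it returns True for broken symbolic links.
>>> os.path.basename('/foo/bar.dat')
'bar.dat'
>>> os.path.dirname('/foo/bar.dat')
'/foo'
os.path.__realpath__(path) returns the real path of path, with symbolic links resolved.
os.path.__relpath__(path[, start]) returns a relative path to path, either from the current directory or from an optional start point.
For example:
>>> os.path.relpath('/home/jimin','/usr/lib/')
'../../home/jimin'
os.path.samefile(path1, path2)
Returns True if path1 and path2 refer to the same file or directory.
os.path.sameopenfile(fp1, fp2)
Returns True if the file descriptors fp1 and fp2 refer to the same file.
os.path.samestat(stat1, stat2)
Returns True if the stat tuples stat1 and stat2 refer to the same file. Stat tuples are produced by fstat(), lstat(), and stat().
os.path.__abspath__(path)
Converts path to an absolute path, equivalent to normpath(join(os.getcwd(), path)).
os.path.expanduser(path)
Expands "~" and "~user" in path to the user's home directory.
os.path.__expandvars__(path)
Expands environment variables, so path may contain them.
For example:
>>> os.path.expandvars('$PATH')
'/usr/lib64/qt-3.3/bin:/usr/kerberos/bin:/usr/lib64/ccache:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/jimin/bin'
os.path.normpath(path)
Collapses "." and ".." components in path.
os.path.__splitext__(path)
Splits path into a **base name and an extension**; for example, os.path.splitext("/foo/bar.tar.bz2") returns ('/foo/bar.tar', '.bz2'). Note how it differs from os.path.split().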
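The splitting and joining functions can be compared side by side. This Python 3 sketch uses `posixpath`, the POSIX implementation behind os.path, so that the results are the same on every operating system; the sample paths are made up.

```python
import posixpath   # the POSIX flavour of os.path, so results match on any OS

path = "/foo/bar.tar.bz2"
head, tail = posixpath.split(path)        # directory part and final component
stem, ext = posixpath.splitext(path)      # everything before the last dot, and the extension

print(head, tail)
print(stem, ext)
print(posixpath.join(head, tail))
print(posixpath.normpath("/foo/./baz/../bar.dat"))
print(posixpath.commonprefix(["/home/td", "/home/td/ff", "/home/tdx"]))
```

The last call shows that commonprefix() compares characters rather than path components: "/home/tdx" still shares the prefix "/home/td".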
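===== 3. File tests in os.path =====
The os module has a very useful function called os.stat(), which was not covered in detail because the os.path module contains a group of functions with the same capabilities but easier-to-remember names.
os.path.**exists**(path)
Tests whether the file or directory exists.
os.path.isfile(path)
Tests whether path refers to a regular file rather than a directory.
os.path.isdir(path)
Tests whether path refers to a directory rather than a regular file.
os.path.islink(path)
Tests whether path refers to a symbolic link.
os.path.ismount(path)
Tests whether path is a mount point.
os.path.getatime(path)
Returns the last access time of the file or directory path.
os.path.getmtime(path)
Returns the last modification time of the file or directory path.
os.path.getctime(path)
Returns the ctime of path: the creation time on Windows, the time of the last metadata change on Unix.
os.path.__getsize__(path)
Returns the size of the file path, in bytes.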
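A few of these tests in action, as a Python 3 sketch on a temporary directory (the file name and contents are made up for the example):

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
filename = os.path.join(workdir, "notes.txt")
with open(filename, "w") as f:
    f.write("hello")

print(os.path.exists(filename), os.path.isfile(filename), os.path.isdir(filename))
print(os.path.isdir(workdir))
print(os.path.getsize(filename))
print(os.path.exists(os.path.join(workdir, "missing")))
```

These functions return False for paths that do not exist instead of raising, which makes them convenient as guards before opening or deleting files.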
===== 4. =====
应用python编写shell脚本经常要用到**os,shutil,glob(正则表达式的文件名),tempfile(临时文 件),pwd(操作/etc/passwd文件),grp(操作/etc/group文件),commands**(取得一个命令的输出)。前面两个已经基本上介绍完了,后面几个很简单,看一下文档就可以了。
===== 5.sys.argv =====
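is a list that holds the command-line arguments of the Python program; sys.argv[0] is the name of **the program itself**.
Enough talk; let's write a simple script that copies files. A couple of days ago the colleague who asks me for scripts had a directory containing tens of thousands of files. He wanted to copy the files to another directory but could not copy the directory itself. He tried "cp src/* dest/" and got an **argument list too long** error, so he asked me for a script (find combined with xargs would also work). Python to the rescue:
import sys,os.path,shutil
for f in os.listdir(sys.argv[1]):
    shutil.copy(os.path.join(sys.argv[1],f),sys.argv[2])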
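Next, try the request from a post on the linuxapp board: rename all the files in a folder to 10001 through 10999. It can be written like this:
import os.path,sys
dirname=sys.argv[1]
i=10001
for f in os.listdir(dirname):
    src=os.path.join(dirname,f)
    if os.path.isdir(src):
        continue
    # join with dirname so the file stays in its folder instead of moving to the current directory
    os.rename(src,os.path.join(dirname,str(i)))
    i+=1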

View File

@@ -0,0 +1,132 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T14:56:28+08:00
====== Some tips for writing site crawlers in Python ======
Created Tuesday 05 July 2011
http://blog.csdn.net/sding/article/details/6214207
1. The most basic fetch
import urllib2
content = urllib2.urlopen('http://XXXX').read()
-
2. Using a proxy server
This is useful in some situations, for example when your IP has been blocked or the number of requests per IP is limited.
import urllib2
proxy_support = urllib2.ProxyHandler({'http':'http://XX.XX.XX.XX:XXXX'})
opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler)
urllib2.install_opener(opener)
content = urllib2.urlopen('http://XXXX').read()
-
3. When login is required
Logging in is more involved, so I will break the problem down:
-
3.1 Handling cookies
import urllib2, cookielib
cookie_support= urllib2.HTTPCookieProcessor(cookielib.CookieJar())
opener = urllib2.build_opener(cookie_support, urllib2.HTTPHandler)
urllib2.install_opener(opener)
content = urllib2.urlopen('http://XXXX').read()
That's right: to use a proxy and cookies together, add proxy_support and change the opener to
opener = urllib2.build_opener(proxy_support, cookie_support, urllib2.HTTPHandler)
-
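3.2 Handling forms
Logging in means filling in a form, so what goes into it? First use a tool to capture the form contents that need to be submitted.
For example, I usually use Firefox with the httpfox plugin to see exactly what I am sending.
Let me give an example using verycd: first find the POST request you sent and its form fields.
-
As you can see, verycd requires username, password, continueURI, fk and login_submit. Of these, fk is randomly generated (not all that random, really; it looks like the epoch time run through a simple encoding) and has to be obtained from the web page: you must first request the page once and extract the fk field from the returned data with a regular expression or a similar tool. continueURI, as the name suggests, can be anything; login_submit is fixed, as the page source shows. username and password are self-explanatory.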
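Extracting a hidden form field with a regular expression can be sketched as follows. This is only an illustration in Python 3: the HTML fragment in SAMPLE_HTML and the attribute order it assumes are hypothetical, and the real verycd page may render the fk field differently.

```python
import re

# Hypothetical markup, invented for this sketch; inspect the real page before relying on it.
SAMPLE_HTML = '<form><input type="hidden" name="fk" value="a1b2c3" /></form>'

# Capture whatever sits inside value="..." right after name="fk".
FK_PATTERN = re.compile(r'name="fk"\s+value="([^"]*)"')


def extract_fk(html):
    """Return the value of the hidden fk field, or None if it is absent."""
    match = FK_PATTERN.search(html)
    return match.group(1) if match else None


print(extract_fk(SAMPLE_HTML))
```

For pages whose markup varies, an HTML parser is usually more robust than a regular expression.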
-
Good. Now that we have the data to fill in, we need to generate the postdata:
import urllib
postdata=urllib.urlencode({
'username':'XXXXX',
'password':'XXXXX',
'continueURI':'http://www.verycd.com/',
'fk':fk,
'login_submit':'登录'
})
-
Then build the HTTP request and send it:
req = urllib2.Request(
    url = 'http://secure.verycd.com/signin/*/http://www.verycd.com/',
    data = postdata
)
result = urllib2.urlopen(req).read()
-
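3.3 Posing as a browser
Some sites dislike visits from crawlers and reject all of their requests.
In that case we need to pose as a browser, which can be done by modifying the headers of the HTTP request: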
#…
headers = {
'User-Agent':'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'
}
req = urllib2.Request(
url = 'http://secure.verycd.com/signin/*/http://www.verycd.com/',
data = postdata,
headers = headers
)
#...
-
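3.4 Defeating hotlink protection
Some sites have so-called hotlink protection. Put plainly, it is very simple: the site checks whether the Referer in the headers of your request is the site itself. So, just as in 3.3, we only need to set the Referer header to that site. Taking cnbeta, a site notorious for its murky dealings, as an example: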
#...
headers = {
'Referer':'http://www.cnbeta.com/articles'
}
#...
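headers is a dict, so you can put in any header you like as a disguise. For example, some sites that think themselves clever like to pry: when someone visits through a proxy, they insist on reading X-Forwarded-For from the headers to find the visitor's real IP. Fine, then simply change X-Forwarded-For to anything amusing and tease them a little.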
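As a sketch of how sections 3.2 to 3.4 fit together, the following builds (but does not send) a POST request with custom headers. It assumes Python 3, where urllib2 became urllib.request and urllib.urlencode became urllib.parse.urlencode; the URL, the User-Agent string and the helper name build_login_request are placeholders invented for the example.

```python
import urllib.parse
import urllib.request


def build_login_request(url, fields, referer):
    """Build (but do not send) a POST request with browser-like headers."""
    # In Python 3 the request body must be bytes, hence the encode().
    body = urllib.parse.urlencode(fields).encode("utf-8")
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",  # made-up UA string
        "Referer": referer,
    }
    return urllib.request.Request(url, data=body, headers=headers)


req = build_login_request(
    "http://www.example.com/signin",              # placeholder URL
    {"username": "XXXXX", "password": "XXXXX"},
    "http://www.example.com/",
)
print(req.get_method())            # a request that carries data is sent as POST
print(req.get_header("Referer"))
```

Passing the finished request to urllib.request.urlopen() would send it; that step is left out so the sketch runs without network access.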
-
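3.5 The last resort
Sometimes access is still refused even after 3.1 to 3.4. In that case there is nothing for it but to copy all the headers you see in httpfox, which usually does the trick.
If that still fails, the last resort is selenium: drive a real browser directly, so anything the browser can do, it can do too. Similar tools include pamie, watir, and so on.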
-
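4. Concurrent fetching with multiple threads
If a single thread is too slow, you need several. Here is a simple thread-pool template.
The program merely prints the numbers 0 to 9, but you can see that it does so concurrently.
from threading import Thread
from Queue import Queue
from time import sleep
#q is the task queue
#NUM is the total number of concurrent threads
#JOBS is the number of tasks
q = Queue()
NUM = 2
JOBS = 10
#the handler function, which processes a single task
def do_somthing_using(arguments):
    print arguments
#the worker thread, which keeps taking items from the queue and processing them
def working():
    while True:
        arguments = q.get()
        do_somthing_using(arguments)
        sleep(1)
        q.task_done()
#start NUM threads that wait on the queue
for i in range(NUM):
    t = Thread(target=working)
    t.setDaemon(True)
    t.start()
#put the JOBS into the queue
for i in range(JOBS):
    q.put(i)
#wait for all JOBS to finish
q.join()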
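For readers on Python 3, the same worker-pool pattern can be sketched as below. This is an adaptation, not the original author's code: the Queue module is named queue there, the results are collected in a list (guarded by a lock) instead of being printed, and the one-second sleep is dropped so the sketch finishes immediately.

```python
import queue
import threading

NUM_WORKERS = 2   # number of concurrent worker threads
JOBS = 10         # number of tasks to process

tasks = queue.Queue()
results = []
results_lock = threading.Lock()


def handle(job):
    """Process a single job; here it only records the job number."""
    with results_lock:
        results.append(job)


def worker():
    """Keep taking jobs off the queue; daemon threads end with the main thread."""
    while True:
        job = tasks.get()
        try:
            handle(job)
        finally:
            tasks.task_done()   # always report completion, even if handle() raised


for _ in range(NUM_WORKERS):
    threading.Thread(target=worker, daemon=True).start()

for job in range(JOBS):
    tasks.put(job)

tasks.join()   # blocks until task_done() has been called once per job
print(sorted(results))
```

Calling task_done() in a finally clause keeps tasks.join() from hanging forever when a handler raises.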

View File

@@ -0,0 +1,250 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-23T20:46:47+08:00
====== A Python image processing module ======
Created Sunday 23 October 2011
http://2goo.info/blog/panjj/other/2011/03/30/511
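When doing web development in Python, uploading images is unavoidable, but calling PIL functions directly always feels a little tedious. Could some helper functions be wrapped up to make handling uploaded images simpler? Reading 壑塥峈's "Adjusting image resolution with PIL" gave me the idea. His module is mainly meant for batch processing of local images, so I modified it for use in web development.
In Django it is easy to get the value of a file input, for example file = request.FILES.get("photo",None). Starting from there, I changed the class to accept a file argument and a path argument, process and save the image through methods that take the target sizes, and finally return a list of the saved image names. Enough said; the usage in a Django view makes it clear:
import os
from django.conf import settings
from django.shortcuts import render_to_response
from django.template import RequestContext
from myproject.common.graphics import Graphics
def index(request,template_name='apptest/picprocessor.html'):
    template_var=dict()
    if request.method=='POST':
        file=request.FILES.get("photo",None)
        if file:
            path=os.path.join(settings.MEDIA_ROOT,'apptest')
            resizer=Graphics(file,path)
            template_var["filename"]=resizer.run_cut((150,100),(300,200),(50,50),)
    return render_to_response(template_name,template_var,context_instance=RequestContext(request))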
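To use the module you only need to pass in the upload directory (an absolute path) and the file from HttpRequest.FILES to initialize the Graphics class, then call one of its methods with the sizes you need (as tuples); it returns the names of the saved images.
The Graphics class has several methods meant to be called from outside: run_cut, run_zoom_w, run_zoom_h and run_thumbnail.
run_cut: crops the original image to the size you give. When the original's aspect ratio differs from the requested one, the image is scaled and cropped around its centre to the requested size, without stretching. Pass one or more size tuples, e.g. (150,100),(300,200),(50,50)
run_zoom_w: scales proportionally to the width you give. Pass one or more widths, e.g. 150,100,200,300
run_zoom_h: scales proportionally to the height you give. Pass one or more heights, e.g. 150,100,200,300
run_thumbnail: the conventional thumbnail method. Pass one or more size tuples, e.g. (150,100),(300,200),(50,50)
These methods can turn the original into several sizes, that is, they process one upload and save it as several images of different dimensions. All resizing uses the Image.ANTIALIAS filter, and images are saved at a fixed JPEG quality (the original post says 100, while the listing below passes quality=80). This keeps distortion as low as possible when cropping, at the cost of slightly larger files. File names are composed as uuid+"-"+w+"-"+h+".jpg", e.g. ae5c011e-5e98-11e0-96e6-001a6bd081a2-600-400.jpg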
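The centre-crop arithmetic described for run_cut can be sketched without PIL. The function below is an illustration in Python 3, not the module's own code: center_crop_plan is a made-up name, it truncates with int() as the listing does, and it leaves out the module's swap of width and height for portrait originals.

```python
def center_crop_plan(src_w, src_h, dst_w, dst_h):
    """Return (resize_size, crop_box) that fills dst_w x dst_h without stretching."""
    src_ratio = src_h / src_w
    dst_ratio = dst_h / dst_w
    if dst_ratio > src_ratio:
        # Target is relatively taller: match the heights, then crop the sides.
        new_w, new_h = int(dst_h * src_w / src_h), dst_h
    else:
        # Target is relatively wider (or the same shape): match the widths,
        # then crop the top and bottom.
        new_w, new_h = dst_w, int(dst_w * src_h / src_w)
    left = (new_w - dst_w) // 2
    top = (new_h - dst_h) // 2
    # The box is (left, upper, right, lower), the order PIL's crop() expects.
    return (new_w, new_h), (left, top, left + dst_w, top + dst_h)


print(center_crop_plan(1024, 768, 300, 200))
print(center_crop_plan(1024, 768, 50, 50))
```

The first tuple would be handed to resize() and the second to crop(), so the result is cut from the middle of the scaled image.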
The implementation:
#coding:utf-8
"""Back-end processing for uploaded images"""
__modify__ = '2goo.info'
__email__ ='nmgkjdxjsj@gmail.com'
VERSION = "Graphics v0.1 build 2011-03-27"
import os,Image,ImageFile,uuid
class Graphics:
    def __init__(self,uploadedfile,targetpath):
        '''Store the uploaded file and the target directory'''
        self.uploadedfile=uploadedfile
        self.targetpath = targetpath
    def check_folder(self):
        '''Check whether the target folder exists and create it if it does not'''
        if not os.path.isdir(self.targetpath):
            os.mkdir(self.targetpath)
        return self.targetpath
    def pic_info(self, img):
        '''Return the photo's size and whether it is landscape or portrait'''
        w, h = img.size
        if w>h:
            return w, h, 0 # landscape photo
        else:
            return w, h, 1 # portrait photo
    def comp_ratio(self, x, y):
        '''Compute the ratio x/y.'''
        x = float(x)
        y = float(y)
        return float(x/y)
    def pic_cut(self, image, p_w, p_h):
        '''Resize the given photo to the requested size.
        The image is never distorted: if the requested aspect ratio differs
        from the original's, the largest possible area is cropped'''
        # get the photo's dimensions, typically 1024,768
        img = image
        w, h, isVertical = self.pic_info(img)
        # check the orientation; for a portrait photo, swap w and h
        if isVertical:
            p_w, p_h = p_h, p_w
        # if the aspect ratios already match, output directly
        if self.comp_ratio(p_h, p_w) == self.comp_ratio(h, w):
            target = img.resize((int(p_w), int(p_h)),Image.ANTIALIAS)#hack: essential for high fidelity!
            # ANTIALIAS: a high-quality downsampling filter
            # BILINEAR: linear interpolation in a 2x2 environment
            # BICUBIC: cubic spline interpolation in a 4x4 environment
            return target
        # if the ratios differ, compute a crop that keeps the exact centre of the photo
        # the algorithm was inspired by ColorStrom
        if self.comp_ratio(p_h, p_w) > self.comp_ratio(h, w):
            # the "too tall" case
            # resize using the height as the reference
            # derive the new width from the new height
            p_w_n = p_h * self.comp_ratio(w,h)
            temp_img = img.resize((int(p_w_n), int(p_h)),Image.ANTIALIAS)
            # take the area of the requested size from the middle
            c = (p_w_n - p_w)/2 # width of the side strip
            box = (c, 0, c+p_w, p_h) # the selected box
            # convert to the int arguments that crop needs
            box = tuple(map(int, box))
            target = temp_img.crop(box)
            return target
        else:
            # the "too wide" case
            # resize using the width as the reference
            p_h_n = p_w * self.comp_ratio(h, w) # derive the new height from the new width
            temp_img = img.resize((int(p_w), int(p_h_n)),Image.ANTIALIAS)
            # take the new image
            c = (p_h_n - p_h)/2
            box = (0, c, p_w, c+p_h)
            box = tuple(map(int, box))
            target = temp_img.crop(box)
            return target
    def pic_zoom_w(self, image, p_w):
        '''Scale the given photo to the requested width without distortion.
        The aspect ratio is kept; the height follows the width proportionally'''
        # get the photo's dimensions, typically 1024,768
        img = image
        w, h, isVertical = self.pic_info(img)
        p_h=p_w * self.comp_ratio(h, w)
        temp_img = img.resize((int(p_w), int(p_h)),Image.ANTIALIAS)
        box = (0, 0, p_w, p_h)
        box = tuple(map(int, box))
        target = temp_img.crop(box)
        return target
    def pic_zoom_h(self, image, p_h):
        '''Scale the given photo to the requested height without distortion.
        The aspect ratio is kept; the width follows the height proportionally'''
        # get the photo's dimensions, typically 1024,768
        img = image
        w, h, isVertical = self.pic_info(img)
        p_w=p_h * self.comp_ratio(w, h)
        temp_img = img.resize((int(p_w), int(p_h)),Image.ANTIALIAS)
        box = (0, 0, p_w, p_h)
        box = tuple(map(int, box))
        target = temp_img.crop(box)
        return target
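    # methods meant to be called from outside
    def run_cut(self,*args,**kwargs):
        '''Crop the upload to each requested size. Takes a list of sizes, each one a tuple;
        an optional quality keyword sets the JPEG quality (default 80)'''
        # quality is read from kwargs so that the first size tuple is not mistaken for it
        quality = kwargs.get('quality', 80)
        parser = ImageFile.Parser()
        for chunk in self.uploadedfile.chunks():
            parser.feed(chunk)
        img = parser.close()
        list=[]
        uuid_str=str(uuid.uuid1())
        try:
            for std in args:
                w, h = std[0], std[1] # the requested size
                filename=uuid_str+"-"+str(w)+"-"+str(h)+'.jpg'
                opfile = os.path.join(self.check_folder(),filename)
                tempimg = self.pic_cut(img,int(w), int(h))
                tempimg.save(opfile, 'jpeg',quality=quality)
                list.append(filename)
            return list
        except:
            pass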
    def run_zoom_w(self,*args):
        '''Scale the upload by width. Takes a list of widths, each one an integer'''
        parser = ImageFile.Parser()
        for chunk in self.uploadedfile.chunks():
            parser.feed(chunk)
        img = parser.close()
        list=[]
        uuid_str=str(uuid.uuid1())
        w, h, isVertical = self.pic_info(img)
        try:
            for woh in args:# the requested width
                th=int(float(woh) * self.comp_ratio(h,w))
                # build a unique image name
                filename=uuid_str+"-"+str(woh)+"-"+str(th)+'.jpg'
                # image directory + image name
                opfile = os.path.join(self.check_folder(),filename)
                tempimg=self.pic_zoom_w(img,int(woh))
                tempimg.save(opfile, 'jpeg',quality=80)
                list.append(filename)
            return list
        except:
            pass
    def run_zoom_h(self,*args):
        '''Scale the upload by height. Takes a list of heights, each one an integer'''
        parser = ImageFile.Parser()
        for chunk in self.uploadedfile.chunks():
            parser.feed(chunk)
        img = parser.close()
        list=[]
        uuid_str=str(uuid.uuid1())
        w, h, isVertical = self.pic_info(img)
        try:
            for woh in args:# the requested height
                tw=int(float(woh) * self.comp_ratio(w,h))
                filename=uuid_str+"-"+str(tw)+"-"+str(woh)+'.jpg'
                opfile = os.path.join(self.check_folder(),filename)
                tempimg=self.pic_zoom_h(img,int(woh))
                tempimg.save(opfile, 'jpeg',quality=80)
                list.append(filename)
            return list
        except:
            pass
    def run_thumbnail(self,*args):
        '''Conventional thumbnail generation. Takes a list of sizes, each one a tuple'''
        parser = ImageFile.Parser()
        for chunk in self.uploadedfile.chunks():
            parser.feed(chunk)
        img = parser.close()
        list=[]
        uuid_str=str(uuid.uuid1())
        try:
            for std in args:
                # the requested size
                w, h = std[0], std[1]
                # build a unique image name
                filename=uuid_str+"-"+str(w)+"-"+str(h)+'.jpg'
                # image directory + image name
                opfile = os.path.join(self.check_folder(),filename)
                tempimg=img.copy()
                tempimg.thumbnail((int(w), int(h)),Image.ANTIALIAS)
                tempimg.save(opfile, 'jpeg',quality=80)
                list.append(filename)
            return list
        except:
            pass
If you have good suggestions, don't forget to tell me. The module will be improved over time.

View File

@@ -0,0 +1,158 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T15:07:31+08:00
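====== Python multithreading: a concise example ======
Created Tuesday 05 July 2011
Overview
Multithreading is an important part of program design, especially for server daemon programs. On any system, creating and scheduling a thread costs far less than doing the same for a traditional process.
Python supports multithreading conveniently. Threads, mutexes, semaphores and similar primitives are quick to create, and synchronized, mutually exclusive reads and writes between threads are supported. The catch is that Python code runs on the Python virtual machine, where threads take turns holding the interpreter's global lock, so only one thread executes Python bytecode at a time. This greatly reduces the usefulness of Python threads for CPU-bound work. Hopefully later versions of Python will solve this problem and make full use of multiple CPUs.
Some people online say there are two ways to get the real benefit of multiple CPUs:
1. Create multiple processes instead of threads, with as many processes as there are CPUs.
2. Use Jython or IronPython, which provide truly parallel threads.
Enough small talk; let's see how to create a thread in Python.
Creating threads in Python
Use the Thread class from the threading module.
The class interface is:
class Thread( group=None, target=None, name=None, args=(), kwargs={})
The parameters to focus on are target and args: target is the function the child thread should run, and args holds that function's arguments, passed as a tuple.
The following code creates a child thread that runs the function worker: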
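def worker(a_tid,a_account):
    ...
th = threading.Thread(target=worker,args=(i,acc) ) ;
Start the thread:
th.start()
Wait for the thread to finish:
threading.Thread.join(th)
or th.join()
If the data to be processed can be partitioned cleanly and the threads do not need to communicate, you can write the multithreaded program as create => run => collect. But if the threads need to access shared objects, a mutex or a semaphore must be introduced so that access to the resource is mutually exclusive.
Next, how to create a mutex.
Create the lock:
g_mutex = threading.Lock()
....
Use the lock:
for ... :
    # acquire: from the next statement until the release, access is mutually exclusive
    g_mutex.acquire()
    a_account.deposite(1)
    # release
    g_mutex.release()
Finally, a multithreaded program that simulates paying bus and subway fares with an IC card.
There are 10 card readers. Each reader deducts one yuan from the user per swipe and adds it to the general account, and each reader is swiped 1,000,000 times a day. The account starts with 100 yuan, so the final total should be 10000100. First we run without the mutex (the locking code is commented out) and see what happens.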
import time,datetime
import threading
def worker(a_tid,a_account):
    global g_mutex
    print "Str " , a_tid, datetime.datetime.now()
    for i in range(1000000):
        #g_mutex.acquire()
        a_account.deposite(1)
        #g_mutex.release()
    print "End " , a_tid , datetime.datetime.now()
class Account:
    def __init__ (self, a_base ):
        self.m_amount=a_base
    def deposite(self,a_amount):
        self.m_amount+=a_amount
    def withdraw(self,a_amount):
        self.m_amount-=a_amount
if __name__ == "__main__":
    global g_mutex
    count = 0
    dstart = datetime.datetime.now()
    print "Main Thread Start At: " , dstart
    #init thread_pool
    thread_pool = []
    #init mutex
    g_mutex = threading.Lock()
    # init thread items
    acc = Account(100)
    for i in range(10):
        th = threading.Thread(target=worker,args=(i,acc) ) ;
        thread_pool.append(th)
    # start threads one by one
    for i in range(10):
        thread_pool[i].start()
    #collect all threads
    for i in range(10):
        threading.Thread.join(thread_pool[i])
    dend = datetime.datetime.now()
    print "count=",acc.m_amount
    print "Main Thread End at: " ,dend , " time span " , dend-dstart;
Note that the mutex is not yet used to protect the critical section. The output is:
Main Thread Start At: 2009-01-13 00:17:55.296000
Str 0 2009-01-13 00:17:55.312000
Str 1 2009-01-13 00:17:55.453000
Str 2 2009-01-13 00:17:55.484000
Str 3 2009-01-13 00:17:55.531000
Str 4 2009-01-13 00:17:55.562000
Str 5 2009-01-13 00:17:55.609000
Str 6 2009-01-13 00:17:55.640000
Str 7 2009-01-13 00:17:55.687000
Str 8 2009-01-13 00:17:55.718000
Str 9 2009-01-13 00:17:55.781000
End 0 2009-01-13 00:18:06.250000
End 1 2009-01-13 00:18:07.500000
End 4 2009-01-13 00:18:07.531000
End 2 2009-01-13 00:18:07.562000
End 3 2009-01-13 00:18:07.593000
End 9 2009-01-13 00:18:07.609000
End 7 2009-01-13 00:18:07.640000
End 8 2009-01-13 00:18:07.671000
End 5 2009-01-13 00:18:07.687000
End 6 2009-01-13 00:18:07.718000
count= 3434612
Main Thread End at: 2009-01-13 00:18:07.718000 time span 0:00:12.422000
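The output shows that the program really does run on multiple threads. But because access to the Account object was not mutually exclusive, the result is wrong: only 3434612, far less than expected.
Uncomment the two lock lines in worker (g_mutex.acquire() and g_mutex.release()) and run again: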
Main Thread Start At: 2009-01-13 00:26:12.156000
Str 0 2009-01-13 00:26:12.156000
Str 1 2009-01-13 00:26:12.390000
Str 2 2009-01-13 00:26:12.437000
Str 3 2009-01-13 00:26:12.468000
Str 4 2009-01-13 00:26:12.515000
Str 5 2009-01-13 00:26:12.562000
Str 6 2009-01-13 00:26:12.593000
Str 7 2009-01-13 00:26:12.640000
Str 8 2009-01-13 00:26:12.671000
Str 9 2009-01-13 00:26:12.718000
End 0 2009-01-13 00:27:01.781000
End 1 2009-01-13 00:27:05.890000
End 5 2009-01-13 00:27:06.046000
End 7 2009-01-13 00:27:06.078000
End 4 2009-01-13 00:27:06.109000
End 2 2009-01-13 00:27:06.140000
End 6 2009-01-13 00:27:06.156000
End 8 2009-01-13 00:27:06.187000
End 3 2009-01-13 00:27:06.203000
End 9 2009-01-13 00:27:06.234000
count= 10000100
Main Thread End at: 2009-01-13 00:27:06.234000 time span 0:00:54.078000
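This time the result is correct. The run takes much longer than without the mutex, 54 seconds in all (my machine is old and I can't afford an upgrade), but that is the price of synchronization and cannot be helped.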
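The same experiment can be sketched in Python 3 with the lock used as a context manager, which guarantees the release even if the protected code raises. This is an adaptation under stated assumptions rather than the original program: the lock lives inside the Account class, the method is spelled deposit, and the swipe count is scaled down from 1,000,000 to 10,000 per reader so that it finishes quickly.

```python
import threading


class Account:
    def __init__(self, base):
        self.amount = base
        self._lock = threading.Lock()

    def deposit(self, value):
        # "with" acquires the lock and always releases it afterwards,
        # so the read-modify-write below cannot interleave between threads.
        with self._lock:
            self.amount += value


def swipe(account, times):
    """One card reader: deposit one yuan per swipe."""
    for _ in range(times):
        account.deposit(1)


READERS = 10
SWIPES = 10000   # scaled down from the 1,000,000 swipes used above

acc = Account(100)
threads = [threading.Thread(target=swipe, args=(acc, SWIPES)) for _ in range(READERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(acc.amount)
```

Keeping the lock inside the class means callers cannot forget to take it, which is one common design choice for shared objects.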

View File

@@ -0,0 +1,501 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T15:01:08+08:00
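====== How to fetch Internet resources with urllib2 in Python ======
Created Tuesday 05 July 2011
blog.csdn.net/b2b160/archive/2009/03/27/4030702.aspx
Related articles:
You may also find the following article on fetching web resources useful:
Basic Authentication, a tutorial on Basic Authentication with examples in Python
urllib2 is a Python module for fetching URLs (Uniform Resource Locators). It offers a very simple interface in the form of the urlopen function,
which can fetch URLs using a variety of protocols. It also offers a slightly more complex interface for handling common situations such as basic authentication, cookies, proxies and so on.
These are provided by objects called handlers and openers.
urllib2 supports fetching URLs for many URL schemes (the scheme is the string before the ":" in the URL; for example, "ftp" is the scheme of "ftp://python.org/") using their associated network protocols (e.g. FTP, HTTP).
This tutorial focuses on the most common case, HTTP.
For straightforward situations urlopen is very easy to use. But as soon as you run into errors or exceptions when opening HTTP URLs, you will need some understanding of the HyperText Transfer Protocol (HTTP).
The most authoritative reference to HTTP is of course RFC 2616 (http://rfc.net/rfc2616.html). It is a technical document and not easy to read. This HOWTO aims to show how to use urllib2,
with enough detail about HTTP to help you through. It is not a replacement for the urllib2 documentation but a supplement to it.
Fetching URLs
The simplest way to use urllib2 is as follows: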
import urllib2
response = urllib2.urlopen('http://python.org/')
html = response.read()
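Many uses of urllib2 are that simple (note that instead of an "http:" URL we could have used a URL starting with "ftp:", "file:", and so on). The purpose of this tutorial, however, is to explain the more complicated cases, concentrating on HTTP.
HTTP is based on requests and responses: the client makes requests and servers send responses. urllib2 mirrors this with a Request object that represents the HTTP request you are making. In its simplest form you create a Request object that specifies the
URL you want to fetch. Calling urlopen with this Request object returns a response object for the requested URL. The response is a file-like object, which means you can, for example, call .read() on it.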
import urllib2
req = urllib2.Request('http://www.voidspace.org.uk')
response = urllib2.urlopen(req)
the_page = response.read()
Note that urllib2 uses the same Request interface to handle all URL schemes. For example, you can make an FTP request like this:
req = urllib2.Request('ftp://example.com/')
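In the case of HTTP, a Request object allows you to do two extra things. First, you can pass data (form data) to be sent to the server. Second, you can pass extra information ("metadata") about the data or about the request itself to the server; this information is sent as HTTP "headers".
Let's look at each of these in turn.
Data
Sometimes you want to send data to a URL (often the URL refers to a CGI [Common Gateway Interface] script or another web application). With HTTP, this is usually done with what is known as a POST request. This is often what your browser does when you submit an HTML form.
Not all POSTs have to come from forms: you can use a POST to transmit arbitrary data to your own application. In the common case of HTML forms, the data needs to be encoded in a standard way and then passed to the Request object as the data argument. The encoding is done using a function from urllib, not from
urllib2.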
import urllib
import urllib2
url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name' : 'Michael Foord',
'location' : 'Northampton',
'language' : 'Python' }
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
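Note that other encodings are sometimes required (e.g. for file upload from HTML forms; see http://www.w3.org/TR/REC-html40/interact/forms.html#h-17.13 HTML Specification, Form Submission for details).
If you do not pass the data argument, urllib2 uses a GET request. One way in which GET and POST requests differ is that POST requests often have "side effects": they change the state of the system in some way (for example, by getting a pile of junk delivered to your door).
Though the HTTP standard makes it clear that POSTs are intended to always cause side effects and GET requests never to cause them, nothing prevents a GET request from having side effects, nor a POST request from having none. Data can also be passed in an HTTP GET request
by encoding it in the URL itself.
This is done as follows: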
>>> import urllib2
>>> import urllib
>>> data = {}
>>> data['name'] = 'Somebody Here'
>>> data['location'] = 'Northampton'
>>> data['language'] = 'Python'
>>> url_values = urllib.urlencode(data)
>>> print url_values
name=Somebody+Here&language=Python&location=Northampton
>>> url = 'http://www.example.com/example.cgi'
>>> full_url = url + '?' + url_values
>>> data = urllib2.urlopen(full_url)
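In Python 3 the urlencode function lives in urllib.parse. The following sketch, which assumes Python 3.7 or later (where dicts keep insertion order), builds the same kind of GET URL and then parses the query string back to confirm the round trip; the helper name build_get_url is made up for the example, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_get_url(base_url, params):
    """Append URL-encoded params to base_url as a query string."""
    return base_url + "?" + urlencode(params)


full_url = build_get_url(
    "http://www.example.com/example.cgi",
    {"name": "Somebody Here", "location": "Northampton", "language": "Python"},
)
print(full_url)

# Decoding the query string recovers the original values (each as a list).
query = parse_qs(urlsplit(full_url).query)
print(query["name"])
```

urlencode replaces the space in "Somebody Here" with "+", and parse_qs turns it back into a space.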
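Headers
We'll discuss one particular HTTP header here to illustrate how to add headers to your HTTP request.
Some websites dislike being browsed by programs, or send different versions of their content to different browsers. By default urllib2 identifies itself as "Python-urllib/x.y" (where x and y are the major and minor version numbers of the Python release, e.g. Python-urllib/2.5),
which may confuse the site or simply not work. The way a browser identifies itself is through the User-Agent header. When you create a Request object you can pass in a dictionary of headers. The following example makes the same request as above, but identifies itself
as a version of Internet Explorer.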
import urllib
import urllib2
url = 'http://www.someserver.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'name' : 'Michael Foord',
'location' : 'Northampton',
'language' : 'Python' }
headers = { 'User-Agent' : user_agent }
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()
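The response also has two useful methods. See the section on info and geturl below, which comes after we look at what happens when things go wrong.
Handling Exceptions
urlopen raises URLError when it cannot handle a response (though, as usual with Python APIs, built-in exceptions such as ValueError and TypeError may also be raised).
HTTPError is the subclass of URLError raised in the specific case of HTTP URLs.
URLError
Often, URLError is raised because there is no network connection (no route to the specified server), or because the specified server does not exist. In this case the exception has a "reason" attribute, which is a tuple containing an error code and a text error message.
For example:
>>> req = urllib2.Request('http://www.pretend_server.org')
>>> try: urllib2.urlopen(req)
... except urllib2.URLError, e:
...     print e.reason
...
(4, 'getaddrinfo failed')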
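HTTPError
Every HTTP response from the server contains a numeric "status code". Sometimes the status code indicates that the server is unable to fulfil the request. The default handlers deal with some of these responses for you (for example, if the response is a "redirection" that asks the client to fetch the document from a different URL,
urllib2 handles that for you). For those it cannot handle, urlopen raises an HTTPError. Typical errors include "404" (page not found), "403" (request forbidden), and "401" (authentication required).
See section 10 of RFC 2616 for a reference on all the HTTP error codes.
The HTTPError instance raised has an integer 'code' attribute, which corresponds to the error sent by the server.
Error Codes
Because the default handlers handle redirects (codes in the 300 range), and codes in the 100-299 range indicate success, you will usually only see error codes in the 400-599 range.
BaseHTTPServer.BaseHTTPRequestHandler.responses is a useful dictionary of response codes that shows all the response codes used by RFC 2616. (The original article reprints the dictionary here; the translator left it out.)
When an error is raised, the server responds by returning an HTTP error code and an error page. You can use the HTTPError instance as the response for the page returned. This means that as well as the code attribute, it also has the read, geturl, and info methods.
>>> req = urllib2.Request('http://www.python.org/fish.html')
>>> try:
...     urllib2.urlopen(req)
... except urllib2.URLError, e:
...     print e.code
...     print e.read()
...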
404
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<?xml-stylesheet href="./css/ht2html.css"
type="text/css"?>
<html><head><title>Error 404: File Not Found</title>
...... etc...
Wrapping it Up
So if you want to be prepared for HTTPError or URLError, there are two basic approaches. I prefer the second.
Number 1:
from urllib2 import Request, urlopen, URLError, HTTPError
req = Request(someurl)
try:
    response = urlopen(req)
except HTTPError, e:
    print 'The server couldn\'t fulfill the request.'
    print 'Error code: ', e.code
except URLError, e:
    print 'We failed to reach a server.'
    print 'Reason: ', e.reason
else:
    # everything is fine
    pass
Note: except HTTPError must come first, otherwise except URLError will also catch an HTTPError.
Number 2:
from urllib2 import Request, urlopen, URLError
req = Request(someurl)
try:
    response = urlopen(req)
except URLError, e:
    if hasattr(e, 'reason'):
        print 'We failed to reach a server.'
        print 'Reason: ', e.reason
    elif hasattr(e, 'code'):
        print 'The server couldn\'t fulfill the request.'
        print 'Error code: ', e.code
else:
    # everything is fine
    pass
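The ordering rule above follows from HTTPError being a subclass of URLError. The following Python 3 sketch (where both classes live in urllib.error) shows the check without touching the network; describe_failure is a made-up helper, and the two exception objects are constructed by hand purely for illustration.

```python
from urllib.error import HTTPError, URLError


def describe_failure(exc):
    """Turn a urllib exception into a short message."""
    # HTTPError must be tested first because it is a subclass of URLError.
    if isinstance(exc, HTTPError):
        return "server answered with HTTP %d" % exc.code
    if isinstance(exc, URLError):
        return "could not reach server: %s" % exc.reason
    raise TypeError("not a urllib error: %r" % (exc,))


# Hand-built exceptions standing in for what urlopen() would raise.
not_found = HTTPError("http://www.example.com/fish.html", 404, "Not Found", None, None)
unreachable = URLError("getaddrinfo failed")

print(describe_failure(not_found))
print(describe_failure(unreachable))
```

Swapping the two isinstance checks would make every HTTP error report as "could not reach server", which is the mistake the note about except ordering warns against.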
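info and geturl
The response returned by urlopen (or the HTTPError instance) has two useful methods, info() and geturl().
geturl -- returns the real URL of the page fetched. This is useful because urlopen (or the opener object used) may have followed
a redirect, so the URL of the page fetched may differ from the URL requested.
info -- returns a dictionary-like object that describes the page fetched, in particular the headers sent by the server. It is currently an httplib.HTTPMessage instance.
Typical headers include "Content-length", "Content-type", and so on. See the Quick Reference to HTTP Headers (http://www.cs.tut.fi/~jkorpela/http.html)
for a useful list of HTTP headers with brief explanations of their meaning.
Openers and Handlers
When you fetch a URL you use an opener (an instance of the perhaps confusingly named urllib2.OpenerDirector). Normally we
use the default opener, via urlopen, but you can create custom openers. Openers use handlers, and all the "heavy lifting" is done by the handlers. Each handler knows
how to open URLs for a particular protocol, or how to handle one aspect of URL opening, such as HTTP redirections or HTTP cookies.
You will want to create an opener if you want to fetch URLs with specific handlers installed, for example an opener that handles cookies, or an opener that does not follow redirections.
To create an opener, instantiate an OpenerDirector and then call .add_handler(some_handler_instance) repeatedly.
Alternatively, you can use build_opener, a convenience function for creating opener objects with a single function call.
build_opener adds several handlers by default, but provides a quick way to add more or to override the default handlers.
Other sorts of handlers you might want can handle proxies, authentication, and other common but slightly specialised situations.
install_opener can be used to make an opener object the (global) default opener. This means that calls to urlopen will use the opener you have installed.
Opener objects have an open method, which can be called directly to fetch URLs in the same way as the urlopen function; there is no need to call install_opener, except as a convenience.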
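Basic Authentication
To illustrate creating and installing a handler we will use the HTTPBasicAuthHandler. For a more detailed discussion of this subject, including an explanation of how Basic Authentication works,
see the Basic Authentication Tutorial (http://www.voidspace.org.uk/python/articles/authentication.shtml).
When authentication is required, the server sends a header (as well as the 401 error code) requesting authentication. This specifies the authentication scheme and a "realm". The header looks like: Www-authenticate: SCHEME realm="REALM".
For example:
Www-authenticate: Basic realm="cPanel Users"
The client should then retry the request with the appropriate name and password for the realm included as a header in the request. This is "basic authentication". To simplify this process we can create an instance of HTTPBasicAuthHandler and an opener that uses this
handler.
The HTTPBasicAuthHandler uses an object called a password manager to handle the mapping of URLs and realms to usernames and passwords. If you know what the realm is (from the authentication header sent by the server), you can use an HTTPPasswordMgr.
Frequently one doesn't care what the realm is. In that case it is convenient to use HTTPPasswordMgrWithDefaultRealm, which lets you specify a default username and password for a URL. This is supplied whenever you have not provided an alternative combination for a specific realm.
We indicate this by passing None as the realm argument to the add_password method.
The top-level URL is the first URL that requires authentication. URLs "deeper" than the URL you pass to .add_password() will also match.
# create a password manager
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
# add the username and password
# if we knew the realm, we could use it instead of ``None``.
top_level_url = "http://example.com/foo/"
password_mgr.add_password(None, top_level_url, username, password)
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
# create the "opener" (an OpenerDirector instance)
opener = urllib2.build_opener(handler)
# use the opener to fetch a URL
opener.open(a_url)
# install the opener.
# now all calls to urllib2.urlopen use our opener.
urllib2.install_opener(opener)
Note: in the example above we only supplied our HTTPBasicAuthHandler to build_opener. By default openers have the handlers for normal situations: ProxyHandler, UnknownHandler, HTTPHandler, HTTPDefaultErrorHandler, HTTPRedirectHandler, FTPHandler, FileHandler, HTTPErrorProcessor.
top_level_url is in fact either a full URL (including the "http:" scheme component, the hostname and optionally the port number), e.g. "http://example.com/", or an "authority" (i.e. the hostname, optionally
including the port number), e.g. "example.com" or "example.com:8080" (the latter includes a port number). The authority, if present, must not contain the "userinfo" component; for example,
"joe:password@example.com" is not correct.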
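How the password manager matches "deeper" URLs can be checked without a server. This is a minimal sketch assuming Python 3, where the same classes live in urllib.request; the credentials "joe" and "secret" are placeholders, and the realm string is the one from the header example above.

```python
import urllib.request

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# None as the realm makes this the default entry for the URL prefix.
password_mgr.add_password(None, "http://example.com/foo/", "joe", "secret")  # placeholder credentials

# A deeper URL under the registered prefix matches, whatever realm the server announces.
inside = password_mgr.find_user_password("cPanel Users", "http://example.com/foo/bar.html")
# A URL outside the prefix does not match.
outside = password_mgr.find_user_password("cPanel Users", "http://example.com/other/")
print(inside, outside)

# The manager is then handed to the handler, and the handler to an opener.
opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(password_mgr))
```

Nothing is fetched here; opener.open(url) would be the call that actually sends a request and answers the 401 challenge.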
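Proxies
urllib2 will auto-detect your proxy settings and use them. This is done through the ProxyHandler, which is part of the normal handler chain. Normally that works well, but there are occasions when it does not help.
One way to deal with this is to set up our own ProxyHandler with no proxies defined. This is done using steps similar to setting up a Basic Authentication handler.
>>> proxy_support = urllib2.ProxyHandler({})
>>> opener = urllib2.build_opener(proxy_support)
>>> urllib2.install_opener(opener)
Note:
Currently urllib2 does not support fetching https locations through a proxy. However, this can be enabled by extending urllib2.
Sockets and Layers
The Python support for fetching resources from the web is layered: urllib2 uses the httplib library, which in turn uses the socket library.
As of Python 2.3 you can specify how long a socket should wait for a response before timing out. This can be useful in applications that have to fetch web pages. By default the socket module has no timeout and can hang. Currently the socket timeout is not exposed
at the httplib or urllib2 levels. However, you can set the default timeout globally for all sockets.
import socket
import urllib2
# timeout in seconds
timeout = 10
socket.setdefaulttimeout(timeout)
# this call to urllib2.urlopen now uses the default timeout we have set in the socket module
req = urllib2.Request('http://www.voidspace.org.uk')
response = urllib2.urlopen(req)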

View File

@@ -0,0 +1,106 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T15:13:03+08:00
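====== Python string formatting symbols ======
Created Tuesday 05 July 2011
Python string formatting symbols
Format character    Conversion
%c Convert to a character (an ASCII code value, or a string of length one)
%r Convert to a string, preferring the repr() function
%s Convert to a string, preferring the str() function
%d / %i Convert to a signed decimal number
%u Convert to an unsigned decimal number
%o Convert to an unsigned octal number
%x/%X Convert to an unsigned hexadecimal number (x/X selects lowercase or uppercase hexadecimal digits)
%e/%E Convert to scientific notation (e/E selects the case of the exponent marker)
%f/%F Convert to a floating-point number (the fraction is rounded to the requested precision, six places by default)
%g/%G The shorter of %e and %f (or of %E and %F)
%% Output a literal %
Auxiliary directives for the format operator
Symbol    Effect
* Width or precision is taken from the argument list
- Left-justify
+ Show a plus sign ( + ) in front of positive numbers
<sp> Show a space in front of positive numbers
# Show a zero ('0') in front of octal numbers and '0x' or '0X' in front of hexadecimal numbers (depending on whether 'x' or 'X' is used)
0 Pad the displayed number with 0 instead of the default spaces
% '%%' outputs a single '%'
(var) Mapping key (dictionary argument)
m.n m is the minimum total width to display; n is the number of digits after the decimal point (where applicable)
Here are some examples of format strings:
Hexadecimal output: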
十六进制输出:
>>> "%x" % 108
'6c'
>>>
>>> "%X" % 108
'6C'
>>>
>>> "%#X" % 108
'0X6C'
>>>
>>> "%#x" % 108
'0x6c'
Floating-point and scientific-notation output:
>>>
>>> '%f' % 1234.567890
'1234.567890'
>>>
>>> '%.2f' % 1234.567890
'1234.57'
>>>
>>> '%E' % 1234.567890
'1.234568E+03'
>>>
>>> '%e' % 1234.567890
'1.234568e+03'
>>>
>>> '%g' % 1234.567890
'1234.57'
>>>
>>> '%G' % 1234.567890
'1234.57'
>>>
>>> "%e" % (1111111111111111111111L)
'1.111111e+21'
Integer and string output:
>>> "%+d" % 4
'+4'
>>>
>>> "%+d" % -4
'-4'
>>>
>>> "we are at %d%%" % 100
'we are at 100%'
>>>
>>> 'Your host is: %s' % 'earth'
'Your host is: earth'
>>>
>>> 'Host: %s\tPort: %d' % ('mars', 80)
'Host: mars\tPort: 80'
>>>
>>> num = 123
>>> 'dec: %d/oct: %#o/hex: %#X' % (num, num, num)
'dec: 123/oct: 0173/hex: 0X7B'
>>>
>>> "MM/DD/YY = %02d/%02d/%d" % (2, 15, 67)
'MM/DD/YY = 02/15/67'
>>>
>>> w, p = 'Web', 'page'
>>> 'http://xxx.yyy.zzz/%s/%s.html' % (w, p)
'http://xxx.yyy.zzz/Web/page.html'
>>> from string import Template
>>> s = Template('There are ${howmany} ${lang} Quotation Symbols')
>>>
>>> print s.substitute(lang='Python', howmany=3)
There are 3 Python Quotation Symbols
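The tables above list several directives that the examples do not exercise, namely the (var) mapping key, the * width, left-justification, zero padding and the space flag. A short sketch in Python 3 covers them; note that in Python 3 the alternate octal form is written 0o173 rather than the 0173 shown in the Python 2 session above. The record dictionary is made up for the example.

```python
record = {"host": "mars", "port": 80}   # sample data for the mapping-key form

by_name = "%(host)s:%(port)d" % record  # (var): look values up in a dictionary
star_width = "[%*d]" % (6, 42)          # *: the width (6) comes from the arguments
left = "[%-6d]" % 42                    # -: left-justify within the width
zero_padded = "[%06.2f]" % 3.14159      # 0 with m.n: width 6, 2 decimals, zero fill
spaced = "[% d]" % 42                   # <sp>: leave a space where the sign would go
alternate = "%#o %#x" % (123, 123)      # #: prefix octal and hexadecimal output

for text in (by_name, star_width, left, zero_padded, spaced, alternate):
    print(text)
```

With the mapping-key form the right-hand side must be a single dictionary rather than a tuple.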

View File

@@ -0,0 +1,226 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-07-05T15:04:48+08:00
====== An introduction to Python's exception handling system ======
Created Tuesday 05 July 2011
The hierarchy of Python's built-in exceptions
BaseException
 +-- SystemExit
 +-- KeyboardInterrupt
 +-- GeneratorExit
 +-- Exception
      +-- StopIteration
      +-- StandardError
      |    +-- BufferError
      |    +-- ArithmeticError
      |    |    +-- FloatingPointError
      |    |    +-- OverflowError
      |    |    +-- ZeroDivisionError
      |    +-- AssertionError
      |    +-- AttributeError
      |    +-- EnvironmentError
      |    |    +-- IOError
      |    |    +-- OSError
      |    |         +-- WindowsError (Windows)
      |    |         +-- VMSError (VMS)
      |    +-- EOFError
      |    +-- ImportError
      |    +-- LookupError
      |    |    +-- IndexError
      |    |    +-- KeyError
      |    +-- MemoryError
      |    +-- NameError
      |    |    +-- UnboundLocalError
      |    +-- ReferenceError
      |    +-- RuntimeError
      |    |    +-- NotImplementedError
      |    +-- SyntaxError
      |    |    +-- IndentationError
      |    |         +-- TabError
      |    +-- SystemError
      |    +-- TypeError
      |    +-- ValueError
      |         +-- UnicodeError
      |              +-- UnicodeDecodeError
      |              +-- UnicodeEncodeError
      |              +-- UnicodeTranslateError
      +-- Warning
           +-- DeprecationWarning
           +-- PendingDeprecationWarning
           +-- RuntimeWarning
           +-- SyntaxWarning
           +-- UserWarning
           +-- FutureWarning
           +-- ImportWarning
           +-- UnicodeWarning
           +-- BytesWarning
Ways to catch exceptions
Method 1: catch every exception
''' The first way to catch exceptions: catch all of them '''
try:
    a = b
    b = c
except Exception,data:
    print Exception,":",data
'''Output: <type 'exceptions.Exception'> : local variable 'b'
referenced before assignment '''
Method 2: inspect the exception with the traceback module (traceback must be imported)
''' The second way to catch exceptions: inspect them with traceback '''
try:
    a = b
    b = c
except:
    traceback.print_exc()
'''Output: Traceback (most recent call last):
  File "test.py", line 20, in main
    a = b
UnboundLocalError: local variable 'b' referenced before assignment
'''
Method 3: use the sys module to look back at the last exception (sys must be imported)
''' The third way to catch exceptions: use the sys module '''
try:
    a = b
    b = c
except:
    info = sys.exc_info()
    print info
    print info[0]
    print info[1]
'''Output:
(<type 'exceptions.UnboundLocalError'>, UnboundLocalError("local
variable 'b' referenced before assignment",),
<traceback object at 0x00D243F0>)
<type 'exceptions.UnboundLocalError'>
local variable 'b' referenced before assignment
'''
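Overview of Python's exception system
Python's exception handling can report errors to the user precisely. All built-in exceptions derive from BaseException, and almost all of them from its subclass Exception; user-defined exceptions should inherit from Exception. Python automatically places all built-in exceptions in the built-in namespace, so a program can use them without importing the exceptions module.
The following statement forms are available:
Form 1: use try and except to catch exceptions. Any number of except clauses may be used. If none of them catches the exception, it propagates to the function that called this one, and so on up to the program's top level.
One thing to watch with except clauses: when several of them catch exceptions whose classes are related by inheritance, the subclass must be written first. Otherwise the parent class's clause catches the subclass's exceptions directly, and the subclass's clause further down is never reached.
try:
    block
except [exception,[data...]]:
    block
except [exception,[data...]]:
    block
except [exception,[data...]]:
    block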
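Form 2: the else clause runs when no exception occurred
try:
    block
except [exception,[data...]]:
    block
else:
    block
Form 3: the finally clause runs whether or not an exception occurred
For example, when we open a file in Python to read or write it, the file must be closed in the end whether or not an exception occurs along the way.
try:
    block
finally:
    block
Form 4: try, except, finally
try:
    block
except:
    block
finally:
    block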
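The four forms can be combined in a single statement, and recording which clauses run makes their order visible. A minimal sketch in Python 3, with a made-up function name run:

```python
def run(divisor):
    """Return the order in which the clauses of one try statement ran."""
    steps = []
    try:
        steps.append("try")
        10 / divisor                 # raises ZeroDivisionError when divisor is 0
    except ZeroDivisionError:
        steps.append("except")       # runs only if the try block raised
    else:
        steps.append("else")         # runs only if the try block did not raise
    finally:
        steps.append("finally")      # runs in both cases
    return steps


print(run(2))
print(run(0))
```

The else clause is the place for code that should run only after success but should not itself be covered by the except clauses.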
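Raising exceptions
raise [exception[,data]]
In Python the simplest way to raise an exception is to write the keyword raise followed by the name of the exception to raise.
The exception name identifies a specific class; Python exceptions are objects of those classes. When a raise statement executes, Python creates an object of the specified exception class.
A raise statement can also specify arguments used to initialize the exception object. To do so, follow the name of the exception class with a comma and the argument (or a tuple of arguments).
Example:
try:
    raise MyError # raise an exception ourselves
except MyError:
    print 'a error'
raise ValueError,'invalid argument'
What is caught is:
type = ValueError
message = invalid argument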
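Other uses of exception handling
Exceptions have many other uses besides handling actual error conditions. A common use in the standard Python library is to try to import a module and then check whether it worked.
Importing a module that does not exist raises an ImportError exception. You can use this to define multiple levels of functionality based on which modules are available at run time, or to support multiple platforms (where platform-specific code is separated into different modules).
You can also define your own exceptions by creating a class that inherits from the built-in Exception class, and then raise your exceptions with the raise command. See the further reading section if you are interested in doing this.
The next example demonstrates how to use an exception to support platform-specific functionality. The code comes from the getpass module, a wrapper module for getting a password from the user. Getting a password is done differently on UNIX, Windows, and Mac OS platforms, but this code encapsulates all of those differences.
Example: supporting platform-specific functionality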
# Bind the name getpass to the appropriate function
try:
    import termios, TERMIOS
except ImportError:
    try:
        import msvcrt
    except ImportError:
        try:
            from EasyDialogs import AskPassword
        except ImportError:
            getpass = default_getpass
        else:
            getpass = AskPassword
    else:
        getpass = win_getpass
else:
    getpass = unix_getpass
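termios is a UNIX-specific module that provides low-level control over the input terminal.
If this module is not available (because it is not on your system, or your system does not support it), the import fails and Python raises an ImportError, which we catch.
OK, we didn't have termios, so let's try msvcrt, a Windows-specific module that provides an API to many useful functions in the Microsoft Visual C++ runtime services. If this import fails, Python raises an ImportError, which we catch.
If the first two didn't work, we try to import a function from EasyDialogs, a Mac OS-specific module that provides functions to pop up dialog boxes of various types. Once again, if this import fails, Python raises an ImportError, which we catch.
None of these platform-specific modules is available (which is possible, since Python has been ported to a lot of different platforms), so we need to fall back on a default password input function (which is defined elsewhere in the getpass module). Notice what we are doing here: assigning the function default_getpass to the variable getpass. If you read the official getpass documentation, it tells you that the getpass module defines a getpass function. It does this by binding getpass to the correct function for your platform. Then when you call the getpass function, you are really calling a platform-specific function that this code has set up for you. You don't need to know or care which platform your code is running on; just call getpass, and it will always do the right thing.
A try...except block can have an else clause, like an if statement. If no exception is raised during the try block, the else clause is executed afterwards. In this case, that means that the from EasyDialogs import AskPassword import worked, so we should bind getpass to the AskPassword function. Each of the other try...except blocks has a similar else clause to bind getpass to the appropriate function when we find an import that works.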
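The same "try each import, fall back on failure" idea can be written as a loop over candidate module names. The sketch below assumes Python 3 and uses importlib; first_available is a made-up helper, and "no_such_fast_json" is a deliberately nonexistent, hypothetical module name used to force the fallback.

```python
import importlib


def first_available(module_names):
    """Import and return the first module in module_names that exists."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue   # not available on this platform; try the next candidate
    return None        # nothing could be imported


# The first name does not exist, so the standard json module is chosen.
backend = first_available(["no_such_fast_json", "json"])
print(backend.__name__)
```

Callers then use backend without caring which candidate was actually imported, just as callers of getpass do not care which platform function it was bound to.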
Custom exception classes inherit from Exception or one of its subclasses
class MyError( ArithmeticError ):
    pass
class MyError2 ( Exception ):
    pass
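Because a custom exception is also an instance of its parent classes, the order of except clauses decides which one handles it. A small Python 3 sketch, with a made-up function name classify, demonstrates the subclass-first rule stated earlier:

```python
class MyError(ArithmeticError):
    """A custom exception derived from a built-in one."""
    pass


def classify(exc):
    """Raise exc and report which except clause caught it."""
    try:
        raise exc
    except MyError:              # subclass first
        return "MyError"
    except ArithmeticError:      # then its parent
        return "ArithmeticError"
    except Exception:            # then the general fallback
        return "Exception"


print(classify(MyError("custom")))
print(classify(ZeroDivisionError("built-in")))
print(classify(ValueError("unrelated")))
```

If the ArithmeticError clause were listed before the MyError clause, MyError instances would be reported as "ArithmeticError" and the MyError clause would never run.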

View File

@@ -0,0 +1,48 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-23T20:45:40+08:00
====== Python odds and ends: generating unique IDs with UUID ======
Created Sunday 23 October 2011
In C# it is easy to generate a unique ID; the most common way is the NewGuid() method of the Guid struct. Running Guid.NewGuid(); in C# yields a number said to be unique in the world, such as 887687be-00cf-4dca-8fe4-7c4fc19b9ecc. Recently I looked at the corresponding Python modules and found one called uuid. Its classes and functions are no less capable than C#'s. I won't go through the details one by one; interested readers can search for uuid or consult the official documentation. What follows simply reproduces the typical examples from the official documentation with brief explanations:
>>> import uuid
# make a UUID based on the host ID and the current time
>>> uuid.uuid1()
Output:
UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')
# make a UUID using an MD5 hash of a namespace UUID and a name
>>> uuid.uuid3(uuid.NAMESPACE_DNS, 'python.org')
Output:
UUID('6fa459ea-ee8a-3ca4-894e-db77e160355e')
# make a random UUID
>>> uuid.uuid4()
Output:
UUID('16fd2706-8baf-433b-82eb-8c7fada847da')
# make a UUID using a SHA-1 hash of a namespace UUID and a name
>>> uuid.uuid5(uuid.NAMESPACE_DNS, 'python.org')
Output:
UUID('886313e1-3b8a-5372-9b90-0c9aee199e5d')
# make a UUID from a string of hex digits (braces and hyphens ignored)
>>> x = uuid.UUID('{00010203-0405-0607-0809-0a0b0c0d0e0f}')
# convert a UUID to a string of hex digits in standard form
>>> str(x)
Output:
'00010203-0405-0607-0809-0a0b0c0d0e0f'
# get the raw 16 bytes of the UUID
>>> x.bytes
Output:
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f'
# make a UUID from a 16-byte string
>>> uuid.UUID(bytes=x.bytes)
Output:
UUID('00010203-0405-0607-0809-0a0b0c0d0e0f')
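The practical difference between the name-based and the random variants is that the former are reproducible. A short sketch in Python 3 (where x.bytes is a bytes object rather than a str) checks this, along with the round trip through the raw bytes:

```python
import uuid

# Name-based UUIDs are deterministic: same namespace and name, same result.
name_based = uuid.uuid5(uuid.NAMESPACE_DNS, "python.org")
again = uuid.uuid5(uuid.NAMESPACE_DNS, "python.org")

# Random UUIDs differ on every call.
random_based = uuid.uuid4()

print(name_based == again)
print(name_based.version, random_based.version)
# The 16 raw bytes are enough to rebuild an equal UUID object.
print(uuid.UUID(bytes=random_based.bytes) == random_based)
print(len(random_based.bytes), len(random_based.hex))
```

This makes uuid5 (or uuid3) a reasonable choice when the same input must always map to the same identifier, and uuid4 when the identifier only has to be unique.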

View File

@@ -0,0 +1,38 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-19T16:37:55+08:00
====== RSS ======
Created Wednesday 19 October 2011
The two classes that 代码发芽网 (fayaa.com) uses to build its RSS feeds. Django makes RSS really convenient.
http://fayaa.com/code/view/149/
#coding=utf-8
# Reference: http://www.djangoproject.com/documentation/syndication_feeds/
from django.contrib.syndication.feeds import Feed
from fayaa.coding.models import Codee, CodeeComment

class LatestCodees(Feed):
    title = "代码发芽网最新代码"  # "Latest code on fayaa.com"
    link = "/code/feeds/codees/"  # URL of this feed
    description = "来自代码发芽网( http://www.fayaa.com/code/ )的最新代码"  # "The latest code from fayaa.com"
    def items(self):
        return Codee.objects.order_by('-create_time')[:15]
    def item_link(self, item):  # URL of each entry
        return "http://www.fayaa.com/code/view/%d/" % item.id

class LatestComments(Feed):
    title = "代码发芽网最新评论"  # "Latest comments on fayaa.com"
    link = "/code/feeds/comments/"
    description = "来自代码发芽网( http://www.fayaa.com/code/ )的最新评论"  # "The latest comments from fayaa.com"
    def items(self):
        return CodeeComment.objects.order_by('-create_time')[:15]
    def item_link(self, item):
        return "http://www.fayaa.com/code/view/%d/" % item.codee.id


@@ -0,0 +1,553 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-21T21:07:01+08:00
====== Request and response objects ======
Created Friday 21 October 2011
===== Quick overview =====
Django uses request and response objects to pass state through the system.
When a page is requested, Django creates an **HttpRequest** object that contains //metadata// about the request. Then Django loads the appropriate **view**, passing the **HttpRequest** as the first argument to the view function. Each view is responsible for returning an **HttpResponse** object.
This document explains the APIs for HttpRequest and HttpResponse objects.
===== HttpRequest objects =====
==== class ====
HttpRequest
==== Attributes ====
All attributes except **session** should be considered //read-only.//
**HttpRequest.path**
A string representing the //full path// to the //requested page//, not including the domain.
Example: "/music/bands/the_beatles/"
**HttpRequest.path_info**
Under some Web server configurations, the portion of the URL after the host name is split up into a **script prefix portion and a path info portion** (this happens, for example, when using the **django.root** option with the modpython handler from Apache). The path_info attribute always contains the path info portion of the path, no matter what Web server is being used. Using this instead of HttpRequest.path can make your code much easier to move between test and deployment servers.
<Location "/mysite/">
    SetHandler python-program
    PythonHandler django.core.handlers.modpython
    SetEnv DJANGO_SETTINGS_MODULE mysite.settings
    PythonOption django.root /mysite
    PythonDebug On
</Location>
For example, if the django.root for your application is set to "/minfo", then path might be "/minfo/music/bands/the_beatles/" and path_info would be "/music/bands/the_beatles/".
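The relationship between path and path_info can be sketched in plain Python. `split_path` is a hypothetical helper written for this note; it only mimics the split described above and is not how Django computes these attributes.

```python
def split_path(path, script_prefix):
    """Split a request path into (script prefix, path_info).

    Hypothetical helper: it only mimics the relationship between
    path and path_info described above.
    """
    prefix = script_prefix.rstrip("/")
    if prefix and path.startswith(prefix + "/"):
        return prefix, path[len(prefix):]
    # No prefix configured (or it does not match): path_info is the full path.
    return "", path


prefix, path_info = split_path("/minfo/music/bands/the_beatles/", "/minfo")
```

Code that relies on path_info keeps working when the application is mounted under a different prefix, which is the portability benefit mentioned above.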
HttpRequest.method
A string representing the HTTP method used in the request. This is guaranteed to be uppercase. Example:
if request.method == 'GET':
    do_something()
elif request.method == 'POST':
    do_something_else()
HttpRequest.encoding
A string representing the current encoding used to decode form submission data (or None, which means the DEFAULT_CHARSET setting is used). You can write to this attribute to change the encoding used when accessing the form data. Any subsequent attribute accesses (such as reading from GET or POST) will use the new encoding value. Useful if you know the form data is not in the DEFAULT_CHARSET encoding.
HttpRequest.GET
A dictionary-like object containing all given HTTP GET parameters. See the QueryDict documentation below.
HttpRequest.POST
A dictionary-like object containing all given HTTP POST parameters. See the QueryDict documentation below.
It's possible that a request can come in via POST with an empty POST dictionary -- if, say, a form is requested via the POST HTTP method but does not include form data. Therefore, you shouldn't use if request.POST to check for use of the POST method; instead, use if request.method == "POST" (see above).
Note: POST does not include file-upload information. See FILES.
HttpRequest.REQUEST
For convenience, a dictionary-like object that searches POST first, then GET. Inspired by PHP's $_REQUEST.
For example, if GET = {"name": "john"} and POST = {"age": '34'}, REQUEST["name"] would be "john", and REQUEST["age"] would be "34".
It's strongly suggested that you use GET and POST instead of REQUEST, because the former are more explicit.
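The "POST first, then GET" lookup order can be illustrated with the standard library alone. This is a sketch of the behaviour, not Django's implementation; the extra "age" entry in GET is an assumption added to show which value wins.

```python
from collections import ChainMap

GET = {"name": "john", "age": "99"}  # "age" added here only to show precedence
POST = {"age": "34"}

# ChainMap searches its mappings left to right, so POST wins over GET.
REQUEST = ChainMap(POST, GET)

name = REQUEST["name"]  # found only in GET
age = REQUEST["age"]    # found in both; the POST value is returned
```

The silent precedence shown here is one reason the explicit GET and POST attributes are easier to reason about.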
HttpRequest.COOKIES
A standard Python dictionary containing all cookies. Keys and values are strings.
HttpRequest.FILES
A dictionary-like object containing all uploaded files. Each key in FILES is the name from the <input type="file" name="" />. Each value in FILES is an UploadedFile as described below.
See Managing files for more information.
Note that FILES will only contain data if the request method was POST and the <form> that posted to the request had enctype="multipart/form-data". Otherwise, FILES will be a blank dictionary-like object.
HttpRequest.META
A standard Python dictionary containing all available HTTP headers. Available headers depend on the client and server, but here are some examples:
CONTENT_LENGTH -- the length of the request body (as a string).
CONTENT_TYPE -- the MIME type of the request body.
HTTP_ACCEPT_ENCODING -- Acceptable encodings for the response.
HTTP_ACCEPT_LANGUAGE -- Acceptable languages for the response.
HTTP_HOST -- The HTTP Host header sent by the client.
HTTP_REFERER -- The referring page, if any.
HTTP_USER_AGENT -- The client's user-agent string.
QUERY_STRING -- The query string, as a single (unparsed) string.
REMOTE_ADDR -- The IP address of the client.
REMOTE_HOST -- The hostname of the client.
REMOTE_USER -- The user authenticated by the Web server, if any.
REQUEST_METHOD -- A string such as "GET" or "POST".
SERVER_NAME -- The hostname of the server.
SERVER_PORT -- The port of the server (as a string).
With the exception of CONTENT_LENGTH and CONTENT_TYPE, as given above, any HTTP headers in the request are converted to META keys by converting all characters to uppercase, replacing any hyphens with underscores and adding an HTTP_ prefix to the name. So, for example, a header called X-Bender would be mapped to the META key HTTP_X_BENDER.
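The naming rule above is mechanical enough to write down. `header_to_meta_key` is a hypothetical helper for illustration, not a Django function.

```python
def header_to_meta_key(header_name):
    """Convert an HTTP header name to its META key, per the rule above."""
    key = header_name.upper().replace("-", "_")
    # CONTENT_LENGTH and CONTENT_TYPE are the two documented exceptions.
    if key in ("CONTENT_LENGTH", "CONTENT_TYPE"):
        return key
    return "HTTP_" + key


bender_key = header_to_meta_key("X-Bender")
```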
HttpRequest.user
A django.contrib.auth.models.User object representing the currently logged-in user. If the user isn't currently logged in, user will be set to an instance of django.contrib.auth.models.AnonymousUser. You can tell them apart with is_authenticated(), like so:
if request.user.is_authenticated():
    # Do something for logged-in users.
    pass
else:
    # Do something for anonymous users.
    pass
user is only available if your Django installation has the AuthenticationMiddleware activated. For more, see User authentication in Django.
HttpRequest.session
A readable-and-writable, dictionary-like object that represents the current session. This is only available if your Django installation has session support activated. See the session documentation for full details.
HttpRequest.raw_post_data
The raw HTTP POST data as a byte string. This is useful for processing data in formats other than conventional HTML forms: binary images, XML payloads, etc. For processing form data, use HttpRequest.POST.
New in Django 1.3: Please, see the release notes
You can also read from an HttpRequest using file-like interface. See HttpRequest.read().
HttpRequest.urlconf
Not defined by Django itself, but will be read if other code (e.g., a custom middleware class) sets it. When present, this will be used as the root URLconf for the current request, overriding the ROOT_URLCONF setting. See How Django processes a request for details.
==== Methods ====
HttpRequest.get_host()
Returns the originating host of the request using information from the HTTP_X_FORWARDED_HOST (if enabled in the settings) and HTTP_HOST headers (in that order). If they don't provide a value, the method uses a combination of SERVER_NAME and SERVER_PORT as detailed in PEP 3333.
Example: "127.0.0.1:8000"
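The lookup order described for get_host() can be approximated as below. This is a simplified sketch, not Django's source: `sketch_get_host` and its `use_x_forwarded_host` flag are hypothetical, and treating both 80 and 443 as default ports is a simplification made here.

```python
def sketch_get_host(meta, use_x_forwarded_host=False):
    """Approximate the header lookup order described above."""
    if use_x_forwarded_host and "HTTP_X_FORWARDED_HOST" in meta:
        return meta["HTTP_X_FORWARDED_HOST"]
    if "HTTP_HOST" in meta:
        return meta["HTTP_HOST"]
    # Fall back to SERVER_NAME, appending SERVER_PORT when it is non-default.
    host = meta["SERVER_NAME"]
    port = str(meta.get("SERVER_PORT", "80"))
    if port not in ("80", "443"):
        host = "%s:%s" % (host, port)
    return host


host = sketch_get_host({"SERVER_NAME": "127.0.0.1", "SERVER_PORT": "8000"})
```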
Note
The get_host() method fails when the host is behind multiple proxies. One solution is to use middleware to rewrite the proxy headers, as in the following example:
class MultipleProxyMiddleware(object):
    FORWARDED_FOR_FIELDS = [
        'HTTP_X_FORWARDED_FOR',
        'HTTP_X_FORWARDED_HOST',
        'HTTP_X_FORWARDED_SERVER',
    ]

    def process_request(self, request):
        """
        Rewrites the proxy headers so that only the most
        recent proxy is used.
        """
        for field in self.FORWARDED_FOR_FIELDS:
            if field in request.META:
                if ',' in request.META[field]:
                    parts = request.META[field].split(',')
                    request.META[field] = parts[-1].strip()
HttpRequest.get_full_path()
Returns the path, plus an appended query string, if applicable.
Example: "/music/bands/the_beatles/?print=true"
HttpRequest.build_absolute_uri(location)
Returns the absolute URI form of location. If no location is provided, the location will be set to request.get_full_path().
If the location is already an absolute URI, it will not be altered. Otherwise the absolute URI is built using the server variables available in this request.
Example: "http://example.com/music/bands/the_beatles/?print=true"
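The rule "absolute locations are left alone, everything else is resolved against the current request" matches what urljoin from the standard library does. The function below is a hypothetical sketch under that assumption, with the scheme, host and full path passed in explicitly instead of read from a request.

```python
from urllib.parse import urljoin


def sketch_build_absolute_uri(scheme, host, full_path, location=None):
    """Resolve location against the current request's URL (illustrative only)."""
    if location is None:
        location = full_path
    # urljoin returns an already-absolute location unchanged.
    return urljoin("%s://%s%s" % (scheme, host, full_path), location)


current = sketch_build_absolute_uri(
    "http", "example.com", "/music/bands/the_beatles/?print=true")
```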
HttpRequest.get_signed_cookie(key, default=RAISE_ERROR, salt='', max_age=None)
New in Django Development version.
Returns a cookie value for a signed cookie, or raises a BadSignature exception if the signature is no longer valid. If you provide the default argument the exception will be suppressed and that default value will be returned instead.
The optional salt argument can be used to provide extra protection against brute force attacks on your secret key. If supplied, the max_age argument will be checked against the signed timestamp attached to the cookie value to ensure the cookie is not older than max_age seconds.
For example:
>>> request.get_signed_cookie('name')
'Tony'
>>> request.get_signed_cookie('name', salt='name-salt')
'Tony' # assuming cookie was set using the same salt
>>> request.get_signed_cookie('non-existing-cookie')
...
KeyError: 'non-existing-cookie'
>>> request.get_signed_cookie('non-existing-cookie', False)
False
>>> request.get_signed_cookie('cookie-that-was-tampered-with')
...
BadSignature: ...
>>> request.get_signed_cookie('name', max_age=60)
...
SignatureExpired: Signature age 1677.3839159 > 60 seconds
>>> request.get_signed_cookie('name', False, max_age=60)
False
See cryptographic signing for more information.
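To make the roles of the salt, the timestamp and max_age concrete, here is a minimal signing sketch built on the standard hmac module. It is not Django's signing implementation or cookie format: `SECRET_KEY`, `sign`, `unsign` and both exception classes are stand-ins defined here, and the "value:timestamp:signature" layout is an assumption for illustration.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"not-a-real-secret"  # assumption: stands in for a secret setting


class BadSignature(Exception):
    """Raised when a value's signature does not match (hypothetical)."""


class SignatureExpired(BadSignature):
    """Raised when a valid signature is older than max_age (hypothetical)."""


def _signature(payload, salt):
    # Mixing the salt into the key gives each use of the secret its own key.
    key = hashlib.sha256(salt.encode() + SECRET_KEY).digest()
    return hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()


def sign(value, salt="", now=None):
    """Return 'value:timestamp:signature'."""
    timestamp = str(int(time.time() if now is None else now))
    payload = "%s:%s" % (value, timestamp)
    return "%s:%s" % (payload, _signature(payload, salt))


def unsign(signed, salt="", max_age=None, now=None):
    """Return the original value, or raise if tampered with or too old."""
    payload, _, signature = signed.rpartition(":")
    if not hmac.compare_digest(signature, _signature(payload, salt)):
        raise BadSignature("signature does not match")
    value, _, timestamp = payload.rpartition(":")
    age = (time.time() if now is None else now) - int(timestamp)
    if max_age is not None and age > max_age:
        raise SignatureExpired("signature age %s > %s seconds" % (age, max_age))
    return value


cookie = sign("Tony", salt="name-salt", now=1000)
```

Reading the value back requires the same salt, and a max_age check fails once the embedded timestamp is too old, which mirrors the behaviour shown in the session above.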
HttpRequest.is_secure()
Returns True if the request is secure; that is, if it was made with HTTPS.
HttpRequest.is_ajax()
Returns True if the request was made via an XMLHttpRequest, by checking the HTTP_X_REQUESTED_WITH header for the string 'XMLHttpRequest'. Most modern JavaScript libraries send this header. If you write your own XMLHttpRequest call (on the browser side), you'll have to set this header manually if you want is_ajax() to work.
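The check described above amounts to a single header comparison. `sketch_is_ajax` is a hypothetical stand-in that takes a META-style dictionary, not the Django method itself.

```python
def sketch_is_ajax(meta):
    """Return True when the X-Requested-With header marks an XMLHttpRequest."""
    return meta.get("HTTP_X_REQUESTED_WITH") == "XMLHttpRequest"


from_library = sketch_is_ajax({"HTTP_X_REQUESTED_WITH": "XMLHttpRequest"})
hand_written = sketch_is_ajax({})  # a hand-written call that sets no header
```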
HttpRequest.read(size=None)
HttpRequest.readline()
HttpRequest.readlines()
HttpRequest.xreadlines()
HttpRequest.__iter__()
New in Django 1.3: Please, see the release notes
Methods implementing a file-like interface for reading from an HttpRequest instance. This makes it possible to consume an incoming request in a streaming fashion. A common use-case would be to process a big XML payload with iterative parser without constructing a whole XML tree in memory.
Given this standard interface, an HttpRequest instance can be passed directly to an XML parser such as ElementTree:
import xml.etree.ElementTree as ET
for event, element in ET.iterparse(request):
    process(element)
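A runnable variant of this streaming pattern is sketched below, with io.BytesIO standing in for the request, since both expose a file-like read(). Note that ET.iterparse yields (event, element) pairs, so the loop unpacks two values. The sample XML is made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# io.BytesIO stands in for the request: both expose a file-like read().
fake_request = io.BytesIO(
    b"<bands><band>Beatles</band><band>Kinks</band></bands>")

names = []
for event, element in ET.iterparse(fake_request):  # default event is "end"
    if element.tag == "band":
        names.append(element.text)
        element.clear()  # release memory for elements already handled
```

Clearing each element after use is what keeps memory flat on large payloads; without it the whole tree still accumulates.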
===== UploadedFile objects =====
class UploadedFile
==== Attributes ====
UploadedFile.name
The name of the uploaded file.
UploadedFile.size
The size, in bytes, of the uploaded file.
==== Methods ====
UploadedFile.chunks(chunk_size=None)
Returns a generator that yields sequential chunks of data.
UploadedFile.read(num_bytes=None)
Read a number of bytes from the file.
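The idea behind chunks() is to avoid loading a whole upload into memory. The generator below is a hypothetical stand-in that reads from any file-like object; the default chunk size is an assumption, not Django's value.

```python
import io


def sketch_chunks(fileobj, chunk_size=64 * 1024):
    """Yield successive chunks (hypothetical stand-in for UploadedFile.chunks)."""
    while True:
        data = fileobj.read(chunk_size)
        if not data:
            break
        yield data


upload = io.BytesIO(b"x" * 150)  # pretend this is a 150-byte upload
sizes = [len(chunk) for chunk in sketch_chunks(upload, chunk_size=64)]
```

Writing each chunk to disk as it arrives keeps peak memory at roughly one chunk, whatever the file size.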
===== QueryDict objects =====
class QueryDict
In an HttpRequest object, the GET and POST attributes are instances of django.http.QueryDict. QueryDict is a dictionary-like class customized to deal with multiple values for the same key. This is necessary because some HTML form elements, notably <select multiple="multiple">, pass multiple values for the same key.
QueryDict instances are immutable, unless you create a copy() of them. That means you can't change attributes of request.POST and request.GET directly.
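The multiple-values-per-key problem is easy to see with the standard library's parse_qs, used here as a rough analogue of QueryDict and not as its implementation. The query string is made up for the example.

```python
from urllib.parse import parse_qs

# parse_qs keeps every value for a repeated key, much as QueryDict does.
values = parse_qs("a=1&a=2&a=3&b=x")

all_a = values["a"]       # comparable to QueryDict.getlist('a')
last_a = values["a"][-1]  # comparable to QueryDict.__getitem__('a')
```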
==== Methods ====
QueryDict implements all the standard dictionary methods, because it's a subclass of dictionary. Exceptions are outlined here:
QueryDict.__getitem__(key)
Returns the value for the given key. If the key has more than one value, __getitem__() returns the last value. Raises django.utils.datastructures.MultiValueDictKeyError if the key does not exist. (This is a subclass of Python's standard KeyError, so you can stick to catching KeyError.)
QueryDict.__setitem__(key, value)
Sets the given key to [value] (a Python list whose single element is value). Note that this, as other dictionary functions that have side effects, can only be called on a mutable QueryDict (one that was created via copy()).
QueryDict.__contains__(key)
Returns True if the given key is set. This lets you do, e.g., if "foo" in request.GET.
QueryDict.get(key, default)
Uses the same logic as __getitem__() above, with a hook for returning a default value if the key doesn't exist.
QueryDict.setdefault(key, default)
Just like the standard dictionary setdefault() method, except it uses __setitem__() internally.
QueryDict.update(other_dict)
Takes either a QueryDict or standard dictionary. Just like the standard dictionary update() method, except it appends to the current dictionary items rather than replacing them. For example:
>>> q = QueryDict('a=1')
>>> q = q.copy() # to make it mutable
>>> q.update({'a': '2'})
>>> q.getlist('a')
[u'1', u'2']
>>> q['a'] # returns the last
u'2'
QueryDict.items()
Just like the standard dictionary items() method, except this uses the same last-value logic as __getitem__(). For example:
>>> q = QueryDict('a=1&a=2&a=3')
>>> q.items()
[(u'a', u'3')]
QueryDict.iteritems()
Just like the standard dictionary iteritems() method. Like QueryDict.items() this uses the same last-value logic as QueryDict.__getitem__().
QueryDict.iterlists()
Like QueryDict.iteritems() except it includes all values, as a list, for each member of the dictionary.
QueryDict.values()
Just like the standard dictionary values() method, except this uses the same last-value logic as __getitem__(). For example:
>>> q = QueryDict('a=1&a=2&a=3')
>>> q.values()
[u'3']
QueryDict.itervalues()
Just like QueryDict.values(), except an iterator.
In addition, QueryDict has the following methods:
QueryDict.copy()
Returns a copy of the object, using copy.deepcopy() from the Python standard library. The copy will be mutable -- that is, you can change its values.
QueryDict.getlist(key, default)
Returns the data with the requested key, as a Python list. Returns an empty list if the key doesn't exist and no default value was provided. It is guaranteed to return a list of some sort unless the default value provided is not a list.
Changed in Django Development version: The default parameter was added.
QueryDict.setlist(key, list_)
Sets the given key to list_ (unlike __setitem__()).
QueryDict.appendlist(key, item)
Appends an item to the internal list associated with key.
QueryDict.setlistdefault(key, default_list)
Just like setdefault, except it takes a list of values instead of a single value.
QueryDict.lists()
Like items(), except it includes all values, as a list, for each member of the dictionary. For example:
>>> q = QueryDict('a=1&a=2&a=3')
>>> q.lists()
[(u'a', [u'1', u'2', u'3'])]
QueryDict.dict()
New in Django Development version.
Returns dict representation of QueryDict. For every (key, list) pair in QueryDict, dict will have (key, item), where item is one element of the list, using same logic as QueryDict.__getitem__():
>>> q = QueryDict('a=1&a=3&a=5')
>>> q.dict()
{u'a': u'5'}
QueryDict.urlencode([safe])
Returns a string of the data in query-string format. Example:
>>> q = QueryDict('a=2&b=3&b=5')
>>> q.urlencode()
'a=2&b=3&b=5'
Changed in Django 1.3: The safe parameter was added.
Optionally, urlencode can be passed characters which do not require encoding. For example:
>>> q = QueryDict('', mutable=True)
>>> q['next'] = '/a&b/'
>>> q.urlencode(safe='/')
'next=/a%26b/'
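The effect of the safe argument can be reproduced with urllib.parse.urlencode from the Python 3 standard library, which accepts a parameter of the same name. This is an analogue for experimentation, not the QueryDict method.

```python
from urllib.parse import urlencode

# Without safe, the slashes are percent-encoded along with the ampersand.
encoded_default = urlencode({"next": "/a&b/"})

# With safe="/", slashes pass through; "&" must still be escaped.
encoded_safe = urlencode({"next": "/a&b/"}, safe="/")
```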
===== HttpResponse objects =====
class HttpResponse
In contrast to HttpRequest objects, which are created automatically by Django, HttpResponse objects are your responsibility. Each view you write is responsible for instantiating, populating and returning an HttpResponse.
The HttpResponse class lives in the django.http module.
==== Usage ====
=== Passing strings ===
Typical usage is to pass the contents of the page, as a string, to the HttpResponse constructor:
>>> response = HttpResponse("Here's the text of the Web page.")
>>> response = HttpResponse("Text only, please.", mimetype="text/plain")
But if you want to add content incrementally, you can use response as a file-like object:
>>> response = HttpResponse()
>>> response.write("<p>Here's the text of the Web page.</p>")
>>> response.write("<p>Here's another paragraph.</p>")
=== Passing iterators ===
Finally, you can pass HttpResponse an iterator rather than passing it hard-coded strings. If you use this technique, follow these guidelines:
* The iterator should return strings.
* If an HttpResponse has been initialized with an iterator as its content, you can't use the HttpResponse instance as a file-like object. Doing so will raise Exception.
=== Setting headers ===
To set or remove a header in your response, treat it like a dictionary:
>>> response = HttpResponse()
>>> response['Cache-Control'] = 'no-cache'
>>> del response['Cache-Control']
Note that unlike a dictionary, del doesn't raise KeyError if the header doesn't exist.
HTTP headers cannot contain newlines. An attempt to set a header containing a newline character (CR or LF) will raise BadHeaderError.
=== Telling the browser to treat the response as a file attachment ===
To tell the browser to treat the response as a file attachment, use the mimetype argument and set the Content-Disposition header. For example, this is how you might return a Microsoft Excel spreadsheet:
>>> response = HttpResponse(my_data, mimetype='application/vnd.ms-excel')
>>> response['Content-Disposition'] = 'attachment; filename=foo.xls'
There's nothing Django-specific about the Content-Disposition header, but it's easy to forget the syntax, so we've included it here.
==== Attributes ====
HttpResponse.content
A string representing the content, encoded from a Unicode object if necessary.
HttpResponse.status_code
The HTTP Status code for the response.
==== Methods ====
HttpResponse.__init__(content='', mimetype=None, status=200, content_type=DEFAULT_CONTENT_TYPE)
Instantiates an HttpResponse object with the given page content (a string) and MIME type. The DEFAULT_CONTENT_TYPE is 'text/html'.
content should be an iterator or a string. If it's an iterator, it should return strings, and those strings will be joined together to form the content of the response. If it is not an iterator or a string, it will be converted to a string when accessed.
status is the HTTP Status code for the response.
content_type is an alias for mimetype. Historically, this parameter was only called mimetype, but since this is actually the value included in the HTTP Content-Type header, it can also include the character set encoding, which makes it more than just a MIME type specification. If mimetype is specified (not None), that value is used. Otherwise, content_type is used. If neither is given, the DEFAULT_CONTENT_TYPE setting is used.
HttpResponse.__setitem__(header, value)
Sets the given header name to the given value. Both header and value should be strings.
HttpResponse.__delitem__(header)
Deletes the header with the given name. Fails silently if the header doesn't exist. Case-insensitive.
HttpResponse.__getitem__(header)
Returns the value for the given header name. Case-insensitive.
HttpResponse.has_header(header)
Returns True or False based on a case-insensitive check for a header with the given name.
HttpResponse.set_cookie(key, value='', max_age=None, expires=None, path='/', domain=None, secure=None, httponly=False)
Changed in Django 1.3: Please, see the release notes
The possibility of specifying a datetime.datetime object in expires, and the auto-calculation of max_age in such case was added. The httponly argument was also added.
Sets a cookie. The parameters are the same as in the Cookie.Morsel object in the Python standard library.
max_age should be a number of seconds, or None (default) if the cookie should last only as long as the client's browser session. If expires is not specified, it will be calculated.
expires should either be a string in the format "Wdy, DD-Mon-YY HH:MM:SS GMT" or a datetime.datetime object in UTC. If expires is a datetime object, the max_age will be calculated.
Use domain if you want to set a cross-domain cookie. For example, domain=".lawrence.com" will set a cookie that is readable by the domains www.lawrence.com, blogs.lawrence.com and calendars.lawrence.com. Otherwise, a cookie will only be readable by the domain that set it.
Use httponly=True if you want to prevent client-side JavaScript from having access to the cookie.
HTTPOnly is a flag included in a Set-Cookie HTTP response header. It is not part of the RFC 2109 standard for cookies, and it isn't honored consistently by all browsers. However, when it is honored, it can be a useful way to mitigate the risk of client side script accessing the protected cookie data.
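Because the parameters mirror the standard library's Morsel, the resulting Set-Cookie header can be previewed without Django. The sketch uses Python 3's http.cookies (the module named Cookie in Python 2); the cookie name and value are made up.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["sessionid"] = "abc123"  # hypothetical cookie name and value

morsel = cookie["sessionid"]
morsel["max-age"] = 3600             # seconds
morsel["path"] = "/"
morsel["domain"] = ".lawrence.com"   # readable by subdomains
morsel["httponly"] = True            # hide the cookie from client-side scripts

header_line = cookie.output(header="Set-Cookie:")
print(header_line)
```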
HttpResponse.set_signed_cookie(key, value='', salt='', max_age=None, expires=None, path='/', domain=None, secure=None, httponly=False)
New in Django Development version.
Like set_cookie(), but cryptographically signs the cookie before setting it. Use in conjunction with HttpRequest.get_signed_cookie(). You can use the optional salt argument for added key strength, but you will need to remember to pass it to the corresponding HttpRequest.get_signed_cookie() call.
HttpResponse.delete_cookie(key, path='/', domain=None)
Deletes the cookie with the given key. Fails silently if the key doesn't exist.
Due to the way cookies work, path and domain should be the same values you used in set_cookie() -- otherwise the cookie may not be deleted.
HttpResponse.write(content)
This method makes an HttpResponse instance a file-like object.
HttpResponse.flush()
This method makes an HttpResponse instance a file-like object.
HttpResponse.tell()
This method makes an HttpResponse instance a file-like object.
===== HttpResponse subclasses =====
Django includes a number of HttpResponse subclasses that handle different types of HTTP responses. Like HttpResponse, these subclasses live in django.http.
class HttpResponseRedirect
The constructor takes a single argument -- the path to redirect to. This can be a fully qualified URL (e.g. 'http://www.yahoo.com/search/') or an absolute path with no domain (e.g. '/search/'). Note that this returns an HTTP status code 302.
class HttpResponsePermanentRedirect
Like HttpResponseRedirect, but it returns a permanent redirect (HTTP status code 301) instead of a "found" redirect (status code 302).
class HttpResponseNotModified
The constructor doesn't take any arguments. Use this to designate that a page hasn't been modified since the user's last request (status code 304).
class HttpResponseBadRequest
Acts just like HttpResponse but uses a 400 status code.
class HttpResponseNotFound
Acts just like HttpResponse but uses a 404 status code.
class HttpResponseForbidden
Acts just like HttpResponse but uses a 403 status code.
class HttpResponseNotAllowed
Like HttpResponse, but uses a 405 status code. Takes a single, required argument: a list of permitted methods (e.g. ['GET', 'POST']).
class HttpResponseGone
Acts just like HttpResponse but uses a 410 status code.
class HttpResponseServerError
Acts just like HttpResponse but uses a 500 status code.
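For quick reference, the status codes listed above can be paired with their standard reason phrases using http.HTTPStatus from the Python 3 standard library. The dictionary is only a summary of this section, not a Django data structure.

```python
from http import HTTPStatus

# Subclass name -> status code, as listed in this section.
SUBCLASS_STATUS = {
    "HttpResponseRedirect": 302,
    "HttpResponsePermanentRedirect": 301,
    "HttpResponseNotModified": 304,
    "HttpResponseBadRequest": 400,
    "HttpResponseNotFound": 404,
    "HttpResponseForbidden": 403,
    "HttpResponseNotAllowed": 405,
    "HttpResponseGone": 410,
    "HttpResponseServerError": 500,
}

# Look up the standard reason phrase for each code.
phrases = {name: HTTPStatus(code).phrase
           for name, code in SUBCLASS_STATUS.items()}
```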
Note
If a custom subclass of HttpResponse implements a render method, Django will treat it as emulating a SimpleTemplateResponse, and the render method must itself return a valid response object.


@@ -0,0 +1,145 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-02T19:02:35+08:00
====== The Python Tutorial ======
Created Monday 02 January 2012
http://docs.python.org/tutorial/index.html
Release: 2.7
Date: January 02, 2012
Python is an easy to learn, powerful programming language. It has efficient **high-level data structures** and a simple but effective approach to **object-oriented programming**. Python's elegant syntax and __dynamic typing__, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms.
The Python interpreter and the extensive standard library are freely available in source or binary form for all major platforms from the Python Web site, http://www.python.org/, and may be freely distributed. The same site also contains distributions of and pointers to many free third party Python modules, programs and tools, and additional documentation.
The Python interpreter is **easily extended with new functions and data types** implemented in C or C++ (or other languages callable from C). Python is also suitable as an extension language for customizable applications.
This tutorial introduces the reader informally to the basic concepts and features of the Python language and system. It helps to have a Python interpreter handy for hands-on experience, but all examples are self-contained, so the tutorial can be read off-line as well.
For a description of standard objects and modules, see **The Python Standard Library**. **The Python Language Reference** gives a more formal definition of the language. To write extensions in C or C++, read Extending and Embedding the Python Interpreter and **Python/C API Reference Manual**. There are also several books covering Python in depth.
This tutorial does not attempt to be comprehensive and cover every single feature, or even every commonly used feature. Instead, it introduces many of Python's most noteworthy features, and will give you a good idea of the language's flavor and style. After reading it, you will be able to read and write Python modules and programs, and you will be ready to learn more about the various Python library modules described in The Python Standard Library.
The Glossary is also worth going through.
1. Whetting Your Appetite
2. Using the Python Interpreter
2.1. Invoking the Interpreter
2.1.1. Argument Passing
2.1.2. Interactive Mode
2.2. The Interpreter and Its Environment
2.2.1. Error Handling
2.2.2. Executable Python Scripts
2.2.3. Source Code Encoding
2.2.4. The Interactive Startup File
2.2.5. The Customization Modules
3. An Informal Introduction to Python
3.1. Using Python as a Calculator
3.1.1. Numbers
3.1.2. Strings
3.1.3. Unicode Strings
3.1.4. Lists
3.2. First Steps Towards Programming
4. More Control Flow Tools
4.1. if Statements
4.2. for Statements
4.3. The range() Function
4.4. break and continue Statements, and else Clauses on Loops
4.5. pass Statements
4.6. Defining Functions
4.7. More on Defining Functions
4.7.1. Default Argument Values
4.7.2. Keyword Arguments
4.7.3. Arbitrary Argument Lists
4.7.4. Unpacking Argument Lists
4.7.5. Lambda Forms
4.7.6. Documentation Strings
4.8. Intermezzo: Coding Style
5. Data Structures
5.1. More on Lists
5.1.1. Using Lists as Stacks
5.1.2. Using Lists as Queues
5.1.3. Functional Programming Tools
5.1.4. List Comprehensions
5.1.4.1. Nested List Comprehensions
5.2. The del statement
5.3. Tuples and Sequences
5.4. Sets
5.5. Dictionaries
5.6. Looping Techniques
5.7. More on Conditions
5.8. Comparing Sequences and Other Types
6. Modules
6.1. More on Modules
6.1.1. Executing modules as scripts
6.1.2. The Module Search Path
6.1.3. “Compiled” Python files
6.2. Standard Modules
6.3. The dir() Function
6.4. Packages
6.4.1. Importing * From a Package
6.4.2. Intra-package References
6.4.3. Packages in Multiple Directories
7. Input and Output
7.1. Fancier Output Formatting
7.1.1. Old string formatting
7.2. Reading and Writing Files
7.2.1. Methods of File Objects
7.2.2. The pickle Module
8. Errors and Exceptions
8.1. Syntax Errors
8.2. Exceptions
8.3. Handling Exceptions
8.4. Raising Exceptions
8.5. User-defined Exceptions
8.6. Defining Clean-up Actions
8.7. Predefined Clean-up Actions
9. Classes
9.1. A Word About Names and Objects
9.2. Python Scopes and Namespaces
9.3. A First Look at Classes
9.3.1. Class Definition Syntax
9.3.2. Class Objects
9.3.3. Instance Objects
9.3.4. Method Objects
9.4. Random Remarks
9.5. Inheritance
9.5.1. Multiple Inheritance
9.6. Private Variables
9.7. Odds and Ends
9.8. Exceptions Are Classes Too
9.9. Iterators
9.10. Generators
9.11. Generator Expressions
10. Brief Tour of the Standard Library
10.1. Operating System Interface
10.2. File Wildcards
10.3. Command Line Arguments
10.4. Error Output Redirection and Program Termination
10.5. String Pattern Matching
10.6. Mathematics
10.7. Internet Access
10.8. Dates and Times
10.9. Data Compression
10.10. Performance Measurement
10.11. Quality Control
10.12. Batteries Included
11. Brief Tour of the Standard Library Part II
11.1. Output Formatting
11.2. Templating
11.3. Working with Binary Data Record Layouts
11.4. Multi-threading
11.5. Logging
11.6. Weak References
11.7. Tools for Working with Lists
11.8. Decimal Floating Point Arithmetic
12. What Now?
13. Interactive Input Editing and History Substitution
13.1. Line Editing
13.2. History Substitution
13.3. Key Bindings
13.4. Alternatives to the Interactive Interpreter
14. Floating Point Arithmetic: Issues and Limitations
14.1. Representation Error


@@ -0,0 +1,36 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-02T19:05:00+08:00
====== 1. Whetting Your Appetite ======
Created Monday 02 January 2012
If you do much work on computers, eventually you find that there's **some task you'd like to automate**. For example, you may wish to perform a search-and-replace over a large number of text files, or rename and rearrange a bunch of photo files in a complicated way. Perhaps you'd like to write a small custom database, or a specialized GUI application, or a simple game.
If you're a professional software developer, you may have to work with several C/C++/Java libraries but find the usual write/compile/test/re-compile cycle is too slow. Perhaps you're writing a test suite for such a library and find writing the testing code a tedious task. Or maybe you've written a program that could use an extension language, and you don't want to design and implement a whole new language for your application.
Python is just the language for you.
You could write a __Unix shell script__ or Windows batch files for some of these tasks, but shell scripts are best at moving around files and changing text data, not well-suited for GUI applications or games. You could write a C/C++/Java program, but it can take a lot of development time to get even a first-draft program. Python is simpler to use, available on Windows, Mac OS X, and Unix operating systems, and will help you __get the job done more quickly__.
Python is simple to use, but it is a real programming language, offering __much more structure__ and support for large programs than shell scripts or batch files can offer. On the other hand, Python also offers much __more error checking__ than C, and, being a very-high-level language, it has __high-level data types__ built in, such as flexible arrays and dictionaries. Because of its more general data types Python is applicable to a much larger problem domain than Awk or even Perl, yet many things are at least as easy in Python as in those languages.
Python allows you to split your program into __modules__ that can be reused in other Python programs. It comes with a large collection of **standard modules** that you can use as the basis of your programs — or as examples to start learning to program in Python. Some of these modules provide things like file I/O, system calls, sockets, and even interfaces to graphical user interface toolkits like Tk.
Python is __an interpreted language__, which can save you considerable time during program development because no compilation and linking is necessary. The interpreter can be __used interactively__, which makes it easy to experiment with features of the language, to write throw-away programs, or to test functions during bottom-up program development. It is also a handy desk calculator.
Python enables programs to be written __compactly and readably__. Programs written in Python are typically much shorter than equivalent C, C++, or Java programs, for several reasons:
* the high-level __data types__ allow you to express complex operations in a single statement;
* statement grouping is done by indentation instead of beginning and ending brackets;
* no variable or argument declarations are necessary.
Python is extensible: if you know how to program in C it is easy to add a new built-in function or module to the interpreter, either to perform critical operations at maximum speed, or to link Python programs to libraries that may only be available in binary form (such as a vendor-specific graphics library). Once you are really hooked, you can link the Python interpreter into an application written in C and use it as an extension or command language for that application.
By the way, the language is named after the BBC show “Monty Python's Flying Circus” and has nothing to do with reptiles. Making references to Monty Python skits in documentation is not only allowed, it is encouraged!
Now that you are all excited about Python, you'll want to examine it in some more detail. Since __the best way to learn a language is to use it__, the tutorial invites you to play with the Python interpreter as you read.
In the next chapter, the mechanics of using the interpreter are explained. This is rather mundane information, but essential for trying out the examples shown later.
The rest of the tutorial introduces various features of the Python language and system through examples, beginning with simple expressions, statements and data types, through functions and modules, and finally touching upon advanced concepts like exceptions and user-defined classes.


@@ -0,0 +1,125 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-02T19:08:57+08:00
====== 2.1. Invoking the Interpreter ======
Created Monday 02 January 2012
The Python interpreter is usually installed as /usr/local/bin/python on those machines where it is available; putting /usr/local/bin in your Unix shell's search path makes it possible to start it by typing the command python to the shell. Since the choice of the directory where the interpreter lives is an installation option, other places are possible; check with your local Python guru or system administrator. (E.g., /usr/local/python is a popular alternative location.)
On Windows machines, the Python installation is usually placed in C:\Python27, though you can change this when you're running the installer. To add this directory to your path, you can type the following command into the command prompt in a DOS box:
set path=%path%;C:\python27
Typing an __end-of-file__ character (Control-D on Unix, Control-Z on Windows) at the **primary prompt** causes the interpreter to exit with a __zero__ exit status. If that doesn't work, you can exit the interpreter by typing one of the following commands: __quit()__, __exit()__, or __raise SystemExit()__.
The interpreter's __line-editing__ features usually aren't very sophisticated. On Unix, whoever installed the interpreter may have enabled support for the **GNU readline library**, which adds more elaborate interactive editing and history features. Perhaps the quickest check to see whether command line editing is supported is typing Control-P to the first Python prompt you get. If it beeps, you have command line editing; see Appendix Interactive Input Editing and History Substitution for an introduction to the keys. If nothing appears to happen, or if ^P is echoed, command line editing isn't available; you'll only be able to use backspace to remove characters from the current line.
The interpreter operates somewhat__ like the Unix shell__: when called with standard input connected to a tty device, it reads and executes commands interactively; when called with a file name argument or with a file as standard input, it reads and executes a script from that file.
When the interpreter is connected to standard input, it enters a **read-execute loop** (read a line, execute it, read the next line) and exits when it reaches EOF.
A second way of starting the interpreter is __python -c command [arg] ...__, which executes the statement(s) in command, analogous to the **shell's -c option**. Since Python statements often contain spaces or other characters that are special to the shell, it is usually advised to quote command in its entirety with single quotes.
The argument after -c is executed as a command. The command is normally quoted as a string, and it __may contain line breaks__ (so a whole statement block can be entered).
Some Python **modules are also useful as scripts (mainly for testing)**. These can be invoked using __python -m module [arg] ...__, which executes the source file for module as if you had spelled out its **full name** on the command line (only the module name needs to be given; Python finds it on the module search path automatically).
In general, -c and -m are not used together.
When a script file is used, it is sometimes useful to be able to run the script and **enter interactive mode afterwards**. This can be done by passing -i __before__ the script.
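Put the __-i option__ before the script argument. The interpreter then enters __interactive__ mode automatically __after__ it has finished executing the statements in the script, and the environment at that point is the one __left by running the script__.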
===== 2.1.1. Argument Passing =====
When known to the interpreter, the script name and additional arguments thereafter are turned into **a list of strings** and assigned to the __argv__ variable in the __sys__ module. You can access this list by executing **import sys**. The length of the list is __at least one__; when no script and no arguments are given, sys.argv[0] is an empty string. When the script name is given as __'-'__ (meaning standard input), sys.argv[0] is set to '-'. When -c command is used, sys.argv[0] is set to '-c'. When -m module is used, sys.argv[0] is set to the **full name** of the located module. Options found after -c command or -m module __are not consumed__ by the Python interpreter's option processing but left in __sys.argv__ for the command or module to handle.
* The interpreter passes the __script name and the additional arguments__ to the script: the script name goes in sys.argv[0], and the additional arguments start at sys.argv[1].
* The script name may be empty, '-', '-c', or the module's full name; __all options or arguments after the script name__ end up in sys.argv[1:]. So when invoking the interpreter, arguments cannot be given **in arbitrary order** the way they can with typical GNU programs.
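The effect of -c on sys.argv can be checked from Python itself. The following is a minimal sketch, assuming the interpreter running it can start a child copy of itself through sys.executable; the names child_code and argv_line are made up for the illustration, and the code sticks to syntax that Python 2.7 and Python 3 both accept:

```python
import subprocess
import sys

# Statement handed to the child interpreter through -c.
child_code = "import sys; print(sys.argv)"

# Everything after the -c command is left in sys.argv for the command.
output = subprocess.check_output(
    [sys.executable, "-c", child_code, "one", "two"])
argv_line = output.decode("ascii").strip()

print(argv_line)
```

The child sees '-c' in sys.argv[0] and the two extra arguments after it, which matches the rule stated above.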
===== 2.1.2. Interactive Mode =====
When commands are** read from a tty**, the interpreter is said to be__ in interactive mode__. In this mode it prompts for the next command with **the primary prompt**, usually three greater-than signs (>>>); for continuation lines it prompts with** the secondary prompt**, by default three dots (...). The interpreter prints a welcome message stating its version number and a copyright notice before printing the first prompt:
//python//
Python 2.7 (#1, Feb 28 2010, 00:02:06)
Type "help", "copyright", "credits" or "license" for more information.
>>>
Continuation lines are needed when entering a __multi-line construct__. As an example, take a look at this if statement:
>>>
>>> the_world_is_flat = 1
>>> if the_world_is_flat__:__
...     print "Be careful not to fall off!"
...
Be careful not to fall off!
__In interactive mode, a blank line marks the end of an indented multi-line block.__
===== 2.2. The Interpreter and Its Environment =====
==== 2.2.1. Error Handling ====
When an error occurs, the interpreter prints __an error message and a stack trace__. In interactive mode, it then returns to the primary prompt; when input came from a file, it exits with a nonzero exit status after printing the stack trace. (Exceptions handled by an except clause in a try statement are not errors in this context.) Some errors are unconditionally fatal and cause an exit with a nonzero exit status; this applies to internal inconsistencies and some cases of running out of memory. All error messages are written to the **standard error** stream; normal output from executed commands is written to **standard output**.
Typing the interrupt character (usually Control-C or DEL) to the primary or secondary prompt __cancels the input__ and returns to the primary prompt. [1] Typing an interrupt while a command is executing raises the__ KeyboardInterrupt__ exception, which may be handled by a try statement.
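The split between the two streams and the nonzero exit status can be observed by running a failing command in a child interpreter. This is only a sketch: child_code, out_text and err_text are illustrative names, and the exact status value is deliberately not asserted beyond being nonzero.

```python
import subprocess
import sys

# The child prints one normal line, then dies with an unhandled exception.
child_code = "print('ok'); raise ValueError('boom')"

proc = subprocess.Popen([sys.executable, "-c", child_code],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()
out_text = out.decode("ascii", "replace")
err_text = err.decode("ascii", "replace")

print("exit status is nonzero: %s" % (proc.returncode != 0))
print("stdout: %r" % out_text.strip())
print("last stderr line: %r" % err_text.strip().splitlines()[-1])
```

The normal output arrives on standard output, while the stack trace and the error message arrive on standard error.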
==== 2.2.2. Executable Python Scripts ====
On BSD'ish Unix systems, Python scripts can be made **directly executable**, like shell scripts, by putting the line
#! /usr/bin/__env__ python
(assuming that the interpreter is on the user's PATH) at the beginning of the script and giving the file an executable mode. The #! must be the __first two__ characters of the file. On some platforms, this first line must end with a Unix-style line ending ('\n'), not a Windows ('\r\n') line ending. Note that the hash, or pound, character, '#', is used to start a comment in Python.
The script can be given an executable mode, or permission, using the chmod command:
$ chmod +x myscript.py
On Windows systems, there is no notion of an “executable mode”. The Python installer automatically __associates .py files with python.exe__ so that a double-click on a Python file will run it as a script. The extension can also be .pyw; in that case, the console window that normally appears is suppressed.
==== 2.2.3. Source Code Encoding ====
**This is especially useful for source files produced by editors that save UTF-8 but do not write a** __byte order mark__**.**
It is possible to use encodings different than ASCII in Python source files. The best way to do it is to put one more special comment line __right after__ the #! line to define the source file encoding:
__# -*- coding: encoding -*-__
With that declaration, all characters in the source file will be treated as having the //encoding// encoding, and it will be possible to directly write__ Unicode string literals__ in the selected encoding. The list of possible encodings can be found in the Python Library Reference, in the section on codecs.
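At present Python __only supports__ Unicode text in __string literals and comments__; all keywords and identifiers __must be__ ASCII. When naming the encoding, **UTF8, UTF-8, utf8, utf-8** are equivalent.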
For example, to write **Unicode literals** including the Euro currency symbol, the ISO-8859-15 encoding can be used, with the Euro symbol having the __ordinal value__ 164. This script will print the value 8364 (the** Unicode codepoint **corresponding to the Euro symbol) and then exit:
# -*- coding: iso-8859-15 -*-
currency = u"€"
print ord(currency)
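The two numbers in this example (164 and 8364) can be reproduced without relying on the source file encoding at all, by decoding the single byte explicitly. A small sketch, with euro_byte, euro_char and latin1_char as illustrative names:

```python
# Byte 0xA4 (decimal 164) is where ISO-8859-15 places the Euro sign.
euro_byte = b"\xa4"

# Decoding with ISO-8859-15 yields the Unicode character U+20AC.
euro_char = euro_byte.decode("iso-8859-15")

# The same byte decoded as ISO-8859-1 is a different character (U+00A4).
latin1_char = euro_byte.decode("iso-8859-1")

print(ord(euro_byte))    # ordinal value of the byte in the source encoding
print(ord(euro_char))    # Unicode code point of the Euro sign
print(ord(latin1_char))  # code point when the wrong encoding is assumed
```

This is why the declared encoding matters: the same byte in the file maps to different Unicode characters depending on the encoding the interpreter is told to use.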
If your editor supports saving files as UTF-8 with a__ UTF-8 byte order mark__ (aka **BOM**), you can use that instead of an encoding declaration. IDLE supports this capability if Options/General/Default Source Encoding/UTF-8 is set. Notice that this signature is not understood in older Python releases (2.2 and earlier), and also not understood by the operating system for script files with #! lines (only used on Unix systems).
By using UTF-8 (either through the signature or an encoding declaration), characters of most languages in the world can be used simultaneously in** string literals and comments**. Using non-ASCII characters in __identifiers__ is not supported. To display all these characters properly, your editor must recognize that the file is UTF-8, and it must use a __font__ that supports all the characters in the file.
==== 2.2.4. The Interactive Startup File ====
When you use Python interactively, it is frequently handy to have some **standard commands **executed every time the interpreter is started. You can do this by setting an environment variable named __PYTHONSTARTUP__ to the **name of a file** containing your start-up commands. This is similar to the .profile feature of the Unix shells.
Non-interactive scripts __do not read__ the statements in the file named by this environment variable at startup, but they can __run them explicitly__ with the code shown at the end of this section. Because the interpreter executes the startup file's statements one by one, the file may need to **import the required modules** at its beginning.
This file is __only read__ in interactive sessions, not when Python reads commands from a script, and not when /dev/tty is given as the explicit source of commands (which otherwise behaves like an interactive session). It is executed in __the same namespace__ where interactive commands are executed, so that objects that it defines or imports can be used without qualification in the interactive session. You can also change the prompts __sys.ps1__ and __sys.ps2__ in this file (the sys module must be imported at the top of the file first).
If you want to read an additional start-up file from the current directory, you can program this in the global start-up file using code like if os.path.isfile('.pythonrc.py'): __execfile__('.pythonrc.py').
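If other initialization files also need to be read, name them in the PYTHONSTARTUP file and run them with the built-in execfile.
If you want to use the startup file in a script (that is, use the startup file explicitly **inside the script file**; this __applies to non-interactive scripts__), you must do this explicitly in the script:
import os
filename = os.environ.get('PYTHONSTARTUP')
if filename and os.path.isfile(filename):
    execfile(filename)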
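For reference, a startup file of the kind described in this section might look like the sketch below. The prompt strings are arbitrary choices made for the illustration, not defaults of any kind:

```python
# Hypothetical contents of the file named by PYTHONSTARTUP.
# Imports come first, because the statements run one after another
# in the interactive namespace.
import os
import sys

# Replace the primary and secondary prompts for interactive sessions.
sys.ps1 = "py> "
sys.ps2 = "... "
```

Because the file runs in the interactive namespace, os and sys are then available at the prompt without a further import.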
==== 2.2.5. The Customization Modules ====
Python provides __two hooks__ to let you customize it: sitecustomize and usercustomize. To see how it works, you need first to find the location of your **user site-packages directory**. Start Python and run this code:
>>>
>>> import site
>>> site.getusersitepackages()
'/home/user/.local/lib/python2.7/site-packages'
Now you can create a file named **usercustomize.py** in that directory and put anything you want in it. It will affect every invocation of Python, unless it is started with the __-s__ option to disable the automatic import.
sitecustomize works in the same way (its directory is reported by site.getsitepackages()), but is typically created by an administrator of the computer in the __global site-packages__ directory, and is imported before usercustomize. See the documentation of the site module for more details.
Footnotes
[1] A problem with the GNU Readline package may prevent this.

View File

@@ -0,0 +1,223 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-04T20:11:44+08:00
====== 10. Brief Tour of the Standard Library ======
Created Wednesday 04 January 2012
===== 10.1. Operating System Interface =====
The__ os module__ provides dozens of functions for interacting with the operating system:
>>>
>>> import os
>>> os.getcwd() # Return the current working directory
'C:\\Python26'
>>> os.chdir('/server/accesslogs') # Change current working directory
>>> __os.system__('mkdir today') # Run the command mkdir in the system shell
0
Be sure to use the** import os **style instead of from os import *. This will keep os.open() from shadowing the __built-in open()__ function which operates much differently.
The __built-in dir() and help()__ functions are useful as interactive aids for working with large modules like os:
>>>
>>> import os
>>> dir(os)
<returns a list of **all module functions**>
>>> help(os)
<returns an extensive** manual page** created from the __module's docstrings__>
For daily file and directory management tasks, the __shutil __module provides a higher level interface that is easier to use:
>>>
>>> import **shutil #shell utility**
>>> shutil.copyfile('data.db', 'archive.db')
>>> shutil.move('/build/executables', 'installdir')
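The same calls can be tried safely inside a throw-away directory. The sketch below assumes nothing beyond the standard library; the file names are borrowed from the example above, and workdir, listing and moved are illustrative names:

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()          # scratch directory, removed at the end
try:
    source = os.path.join(workdir, "data.db")
    with open(source, "w") as handle:
        handle.write("example data\n")

    # Copy the file, then move the copy into a subdirectory.
    backup = os.path.join(workdir, "archive.db")
    shutil.copyfile(source, backup)
    archive_dir = os.path.join(workdir, "archive")
    os.mkdir(archive_dir)
    shutil.move(backup, archive_dir)

    listing = sorted(os.listdir(workdir))
    moved = os.listdir(archive_dir)
finally:
    shutil.rmtree(workdir)            # clean up even if a step fails

print(listing)
print(moved)
```

When the destination of shutil.move() is an existing directory, the file is moved into it under its original name.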
===== 10.2. File Wildcards =====
The __glob__ module provides a function for making file lists from directory **wildcard** searches:
>>>
>>> import glob
>>> glob.glob('*.py')
['primes.py', 'random.py', 'quote.py']
The glob module uses the same shell-style wildcards as bash (*, ? and [...]).
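A few of these wildcard forms can be tried against a scratch directory. In this sketch the file names are invented for the purpose, and the results are sorted because glob.glob() does not promise any particular order:

```python
import glob
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
try:
    for name in ("primes.py", "quote.py", "a1.py", "notes.txt"):
        open(os.path.join(workdir, name), "w").close()

    def matches(pattern):
        # Return the sorted base names matching pattern inside workdir.
        paths = glob.glob(os.path.join(workdir, pattern))
        return sorted(os.path.basename(path) for path in paths)

    all_py = matches("*.py")          # * matches any run of characters
    p_or_q = matches("[pq]*.py")      # [...] matches one listed character
    two_chars = matches("??.py")      # ? matches exactly one character
finally:
    shutil.rmtree(workdir)

print(all_py)
print(p_or_q)
print(two_chars)
```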
===== 10.3. Command Line Arguments =====
Common utility scripts often need to **process command line arguments**. These arguments are stored in the __sys module's argv__ attribute as a list. For instance the following output results from running python demo.py one two three at the command line:
>>>
>>> import sys
>>> print sys.argv
['demo.py', 'one', 'two', 'three']
The__ getopt__ module processes sys.argv using the conventions of the Unix getopt() function. More powerful and flexible command line processing is provided by the __argparse__ module.
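As a rough illustration of argparse, the sketch below parses an explicit argument list instead of the real sys.argv, so it can be run anywhere. The option names (-v, --count) and the program name are invented for the example:

```python
import argparse

parser = argparse.ArgumentParser(prog="demo.py",
                                 description="Hypothetical demo script.")
parser.add_argument("-v", "--verbose", action="store_true",
                    help="print extra information")
parser.add_argument("-n", "--count", type=int, default=1,
                    help="how many times to repeat")
parser.add_argument("names", nargs="*", help="positional arguments")

# Passing a list here stands in for sys.argv[1:].
args = parser.parse_args(["-v", "--count", "3", "one", "two"])

print(args.verbose)
print(args.count)
print(args.names)
```

Calling parser.parse_args() with no argument would read sys.argv[1:] instead.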
===== 10.4. Error Output Redirection and Program Termination =====
The sys module also has attributes for __stdin, stdout, and stderr__. The latter is useful for emitting warnings and error messages to make them visible even when stdout has been redirected:
>>>
>>> **sys.stderr.write**('Warning, log file not found starting a new one\n')
Warning, log file not found starting a new one
The most direct way to terminate a script is to use __sys.exit()__.
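sys.exit() works by raising the SystemExit exception, so its effect can be watched without actually ending the process. In this sketch, main() and the usage message are made up for the illustration:

```python
import sys

def main(argv):
    # Exit with status 2 when the expected argument is missing.
    if len(argv) < 2:
        sys.stderr.write("usage: demo.py FILE\n")
        sys.exit(2)
    return argv[1]

try:
    main(["demo.py"])
except SystemExit as exc:
    # The value passed to sys.exit() is available as exc.code.
    exit_code = exc.code

print(exit_code)
```

In a real script the exception is normally left uncaught, and the interpreter exits with the given status.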
===== 10.5. String Pattern Matching =====
The __re __module provides regular expression tools for advanced string processing. For **complex matching and manipulation**, regular expressions offer succinct, optimized solutions:
>>>
>>> import re
>>> re.findall(__r__'\bf[a-z]*', 'which foot or hand fell fastest')
['foot', 'fell', 'fastest']
>>> re.sub(r'(\b[a-z]+) \1', r'\1', 'cat in the the hat')
'cat in the hat'
When only simple capabilities are needed, string methods are preferred because they are easier to read and debug:
>>>
>>> 'tea for too'.**replace**('too', 'two')
'tea for two'
===== 10.6. Mathematics =====
The __math__ module gives access to the underlying **C library functions** for floating point math:
>>>
>>> import math
>>> math.cos(math.pi / 4.0)
0.70710678118654757
>>> math.log(1024, 2)
10.0
The __random__ module provides tools for making random selections:
>>>
>>> import random
>>> random.choice(['apple', 'pear', 'banana'])
'apple'
>>> random.sample(xrange(100), 10) # sampling without replacement
[30, 83, 16, 4, 8, 81, 41, 50, 18, 33]
>>> random.random() # random float
0.17970987693706186
>>> random.randrange(6) # random integer chosen from range(6)
4
===== 10.7. Internet Access =====
There are a number of modules for accessing the internet and processing internet protocols. Two of the simplest are __urllib2__ for retrieving data from urls and __smtplib __for sending mail:
>>>
>>> import urllib2
>>> for line in urllib2.**urlopen**('http://tycho.usno.navy.mil/cgi-bin/timer.pl'):
...     if 'EST' in line or 'EDT' in line:  # look for Eastern Time
...         print line
<BR>Nov. 25, 09:43:32 PM EST
>>> import smtplib
>>> __server__ = smtplib.**SMTP**('localhost')
>>> server.sendmail('soothsayer@example.org', 'jcaesar@example.org',
... """To: jcaesar@example.org
... From: soothsayer@example.org
...
... Beware the Ides of March.
... """)
>>> server.quit()
(Note that the second example needs a mailserver running on localhost.)
===== 10.8. Dates and Times =====
The __datetime __module supplies classes for manipulating dates and times in both simple and complex ways. While date and time arithmetic is supported, the focus of the implementation is on efficient member extraction for output formatting and manipulation. The module also supports objects that are **timezone **aware.
>>>
>>> # dates are easily constructed and formatted
>>> from datetime import date
>>> now = date.today()
>>> now
datetime.date(2003, 12, 2)
>>> now.__strftime__("%m-%d-%y. %d %b %Y is a %A on the %d day of %B.")
'12-02-03. 02 Dec 2003 is a Tuesday on the 02 day of December.'
>>> # dates support __calendar arithmetic__
>>> birthday = date(1964, 7, 31)
>>> age = now - birthday
>>> age.days
14368
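Subtracting two dates yields a timedelta, and a timedelta can be added back to a date. A short sketch reusing the dates from the session above (start, later and age are illustrative names):

```python
from datetime import date, timedelta

start = date(2003, 12, 2)
birthday = date(1964, 7, 31)

# date - date gives a timedelta; its days attribute is the difference.
age = start - birthday

# date + timedelta gives a new date, rolling over month and year ends.
later = start + timedelta(days=30)

print(age.days)
print(later.isoformat())
print(start.weekday())   # Monday is 0, so Tuesday is 1
```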
===== 10.9. Data Compression =====
Common data archiving and compression formats are directly supported by modules including: __zlib, gzip, bz2, zipfile and tarfile__.
>>>
>>> import zlib
>>> s = 'witch which has which witches wrist watch'
>>> len(s)
41
>>> t = zlib.**compress(s)**
>>> len(t)
37
>>> zlib.**decompress(t)**
'witch which has which witches wrist watch'
>>> zlib.**crc32(s)**
226805979
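A round trip through zlib can be checked in a few lines. This sketch uses a bytes literal so that it also runs on Python 3, repeats the sample text to make the saving more visible, and masks the checksum to an unsigned 32-bit value; the variable names are illustrative:

```python
import zlib

data = b"witch which has which witches wrist watch" * 10

# Level 9 asks for the best compression zlib offers.
packed = zlib.compress(data, 9)
restored = zlib.decompress(packed)

# Masking keeps the checksum non-negative on every Python version.
checksum = zlib.crc32(data) & 0xffffffff

print(len(data))
print(len(packed) < len(data))
print(restored == data)
```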
===== 10.10. Performance Measurement =====
Some Python users develop a deep interest in knowing the relative performance of __different approaches to the same problem__. Python provides a **measurement tool** that answers those questions immediately.
For example, it may be tempting to use the tuple packing and unpacking feature instead of the traditional approach to swapping arguments. The __timeit__ module quickly demonstrates a modest performance advantage:
>>>
>>> from __timeit__ import Timer
>>> Timer('t=a; a=b; b=t', 'a=1; b=2').timeit()
0.57535828626024577
>>> Timer('a,b = b,a', 'a=1; b=2').timeit()
0.54962537085770791
In contrast to timeit's fine level of granularity, the __profile__ and __pstats__ modules provide tools for identifying time critical sections in larger blocks of code.
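Single timings are noisy, so a common habit is to repeat the measurement and keep the smallest value. The sketch below does that with timeit.repeat(); the repeat and number values are arbitrary, and no particular winner is assumed, since the result depends on the machine and the Python version:

```python
import timeit

setup = "a=1; b=2"

# Each call runs the statement 100000 times, three times over,
# and returns the three total times in seconds.
temp_swap = min(timeit.repeat("t=a; a=b; b=t", setup,
                              repeat=3, number=100000))
tuple_swap = min(timeit.repeat("a,b = b,a", setup,
                               repeat=3, number=100000))

print("temporary variable: %.4f s" % temp_swap)
print("tuple unpacking:    %.4f s" % tuple_swap)
```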
===== 10.11. Quality Control =====
One approach for developing high quality software is to **write tests for each function** as it is developed and to__ run those tests frequently __during the development process.
The __doctest__ module provides a tool for scanning a module and **validating tests embedded in a program's docstrings**. Test construction is as simple as __cutting-and-pasting__ a typical call along with its results into the docstring. This improves the documentation by **providing the user with an example** and it allows the doctest module to make sure the code remains true to the documentation:
def average(values):
    """Computes the arithmetic mean of a list of numbers.

    >>> print average([20, 30, 70])
    40.0
    """
    return sum(values, 0.0) / len(values)

import __doctest__
doctest.testmod()   # automatically validate the **embedded tests**
The __unittest __module is not as effortless as the doctest module, but it allows a more comprehensive set of tests to be maintained **in a separate file**:
import unittest

class TestStatisticalFunctions(unittest.TestCase):

    def test_average(self):
        self.assertEqual(average([20, 30, 70]), 40.0)
        self.assertEqual(round(average([1, 5, 7]), 1), 4.3)
        self.assertRaises(ZeroDivisionError, average, [])
        self.assertRaises(TypeError, average, 20, 30, 70)

unittest.main() # Calling from the command line invokes all tests
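unittest.main() ends the process when the tests finish, which is awkward inside a larger program or an interactive session. One alternative, sketched here with the same average() function, is to build the suite and run it explicitly; TestAverage, suite and result are illustrative names:

```python
import unittest

def average(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values, 0.0) / len(values)

class TestAverage(unittest.TestCase):

    def test_values(self):
        self.assertEqual(average([20, 30, 70]), 40.0)
        self.assertEqual(round(average([1, 5, 7]), 1), 4.3)

    def test_errors(self):
        self.assertRaises(ZeroDivisionError, average, [])
        self.assertRaises(TypeError, average, 20, 30, 70)

# Collect the test methods of one class and run them without exiting.
suite = unittest.TestLoader().loadTestsFromTestCase(TestAverage)
result = unittest.TextTestRunner(verbosity=0).run(suite)

print(result.testsRun)
print(result.wasSuccessful())
```

The returned result object records failures and errors, so the caller decides what to do next.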
===== 10.12. Batteries Included =====
Python has a “batteries included” philosophy. This is best seen through the sophisticated and robust capabilities of its larger packages. For example:
* The __xmlrpclib__ and __SimpleXMLRPCServer__ modules make implementing remote procedure calls into an almost trivial task. Despite the modules' names, no direct knowledge or handling of XML is needed.
* The __email__ package is a library for managing email messages, including MIME and other RFC 2822-based message documents. Unlike smtplib and poplib which actually send and receive messages, the email package has a complete toolset for __building or decoding complex message structures __(including attachments) and for implementing** internet encoding** and header protocols.
* The__ xml.dom__ and __xml.sax__ packages provide robust support for parsing this popular data interchange format. Likewise, the__ csv __module supports direct reads and writes in a common database format. Together, these modules and packages greatly simplify data interchange between Python applications and other tools.
* Internationalization is supported by a number of modules including **gettext, locale**, and the **codecs** package.

View File

@@ -0,0 +1,302 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-05T10:56:47+08:00
====== 11. Brief Tour of the Standard Library Part II ======
Created Thursday 05 January 2012
This second tour covers more advanced modules that support __professional programming__ needs. These modules **rarely occur **in small scripts.
===== 11.1. Output Formatting =====
The __repr __module provides a version of repr() customized for abbreviated displays of large or deeply nested containers:
>>>
>>> import repr
>>> repr.repr(set('supercalifragilisticexpialidocious'))
"set(['a', 'c', 'd', 'e', 'f', 'g', ...])"
The __pprint__ module offers more sophisticated control over printing both built-in and user defined objects in a way that is** readable by the interpreter.** When the result is longer than one line, the “pretty printer” adds line breaks and indentation to more clearly reveal data structure:
>>>
>>> import pprint
>>> t = [[[['black', 'cyan'], 'white', ['green', 'red']], [['magenta',
... 'yellow'], 'blue']]]
...
>>> pprint.pprint(t, width=30)
[[[['black', 'cyan'],
'white',
['green', 'red']],
[['magenta', 'yellow'],
'blue']]]
The __textwrap__ module formats paragraphs of text to fit **a given screen width**:
>>>
>>> import textwrap
>>> doc = """The wrap() method is just like fill() except that it returns
... a list of strings instead of one big string with newlines to separate
... the wrapped lines."""
...
>>> print textwrap.fill(doc, width=40)
The wrap() method is just like fill()
except that it returns a list of strings
instead of one big string with newlines
to separate the wrapped lines.
The __locale__ module accesses a database of __culture specific data formats__. The grouping attribute of locale's format function provides a direct way of formatting numbers with group separators:
>>>
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'English_United States.1252')
'English_United States.1252'
>>> conv = locale.localeconv() # get a mapping of conventions
>>> x = 1234567.8
>>> locale.format("%d", x, grouping=True)
'1,234,567'
>>> locale.format_string("%s%.*f", (conv['currency_symbol'],
... conv['frac_digits'], x), grouping=True)
'$1,234,567.80'
===== 11.2. Templating =====
The __string __module includes a versatile **Template class** with a simplified syntax suitable for editing by end-users. This allows users to customize their applications without having to alter the application.
The format uses placeholder names formed by $ with valid Python identifiers (alphanumeric characters and underscores). Surrounding the__ placeholder __with braces allows it to be followed by more alphanumeric letters with no intervening spaces. Writing $$ creates a single escaped $:
>>>
>>> from string import __Template__
>>> t = Template('${village}folk send $$10 to $cause.')
>>> t.__substitute__(village='Nottingham', cause='the ditch fund')
'Nottinghamfolk send $10 to the ditch fund.'
The substitute() method raises a **KeyError** when a placeholder is not supplied in **a dictionary or a keyword** argument. For mail-merge style applications, user supplied data may be incomplete and the __safe_substitute()__ method may be more appropriate — it will** leave placeholders unchanged** if data is missing:
>>> t = Template('Return the $item to $owner.')
>>> d = dict(item='unladen swallow')
>>> t.substitute(d)
Traceback (most recent call last):
. . .
**KeyError: 'owner'**
>>> t.__safe_substitute__(d)
'Return the unladen swallow to $owner.'
**Template **subclasses can specify __a custom delimiter__. For example, a batch renaming utility for a photo browser may elect to use percent signs for placeholders such as the current date, image sequence number, or file format:
>>>
>>> import** time**, **os.path**
>>> photofiles = ['img_1074.jpg', 'img_1076.jpg', 'img_1077.jpg']
>>> class BatchRename(Template):
...     __delimiter = '%'__
>>> fmt = raw_input('Enter rename style (%d-date %n-seqnum %f-format): ')
Enter rename style (%d-date %n-seqnum %f-format): Ashley_%n%f
>>> t = BatchRename(fmt)
>>> date = time.strftime('%d%b%y')
>>> for i, filename in enumerate(photofiles):
...     base, ext = __os.path.splitext__(filename)
...     newname = t.substitute(d=date, n=i, f=ext)
...     print '{0} --> {1}'.format(filename, newname)
img_1074.jpg --> Ashley_0.jpg
img_1076.jpg --> Ashley_1.jpg
img_1077.jpg --> Ashley_2.jpg
Another application for templating is separating program logic from the details of multiple output formats. This makes it possible to substitute custom templates for XML files, plain text reports, and HTML web reports.
===== 11.3. Working with Binary Data Record Layouts =====
The __struct __module provides pack() and unpack() functions for working with __variable length binary record formats__. The following example shows how to loop through header information in a ZIP file without using the zipfile module. Pack codes "H" and "I" represent** two and four byte** unsigned numbers respectively. The "<" indicates that they are** standard size** and in** little-endian** byte order:
import struct

data = open('myfile.zip', 'rb').read()
start = 0
for i in range(3):                      # show the first 3 file headers
    start += 14
    fields = struct.unpack('<IIIHH', data[start:start+16])
    crc32, comp_size, uncomp_size, filenamesize, extra_size = fields

    start += 16
    filename = data[start:start+filenamesize]
    start += filenamesize
    extra = data[start:start+extra_size]
    print filename, hex(crc32), comp_size, uncomp_size

    start += extra_size + comp_size     # skip to the next header
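The format string used above can be explored on its own, without a ZIP file at hand. This sketch packs five invented values with the same '<IIIHH' layout and unpacks them again; the values carry no meaning beyond the illustration:

```python
import struct

header_format = "<IIIHH"   # three 4-byte and two 2-byte unsigned integers

# With "<" there is no padding, so the size is 4 + 4 + 4 + 2 + 2 bytes.
header_size = struct.calcsize(header_format)

packed = struct.pack(header_format, 0xDEADBEEF, 120, 300, 8, 0)
fields = struct.unpack(header_format, packed)

print(header_size)
print(fields)
# Little-endian order stores the least significant byte first.
print(packed[:4] == b"\xef\xbe\xad\xde")
```

This also explains the 16-byte slice in the loop above: it is exactly the size of the five packed fields.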
===== 11.4. Multi-threading =====
Threading is a technique for __decoupling tasks __which are __not sequentially dependent__. Threads can be used to improve the responsiveness of applications that accept user input while other tasks run in the** background**. A related use case is running I/O in parallel with computations in another thread.
The following code shows how the high level __threading__ module can run tasks in background while the main program continues to run:
import__ threading, zipfile__
class AsyncZip(threading.Thread):
    def __init__(self, infile, outfile):
        threading.Thread.__init__(self)
        self.infile = infile
        self.outfile = outfile
    def run(self):
        f = zipfile.ZipFile(self.outfile, 'w', zipfile.ZIP_DEFLATED)
        f.write(self.infile)
        f.close()
        print 'Finished background zip of: ', self.infile

__background__ = AsyncZip('mydata.txt', 'myarchive.zip')
__background.start()__
print 'The main program continues to run in foreground.'

__background.join()__    # Wait for the background task to finish
print 'Main program waited until background was done.'
The principal challenge of multi-threaded applications is__ coordinating threads that share data or other resources__. To that end, the threading module provides a number of__ synchronization primitives__ including locks, events, condition variables, and semaphores.
While those tools are powerful, minor design errors can result in problems that are difficult to reproduce. So, __the preferred approach to task coordination is to concentrate all access to a resource in a single thread and then use the Queue module to feed that thread with requests from other threads__. Applications using Queue.Queue objects for inter-thread communication and coordination are easier to design, more readable, and more reliable.
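The single-owner pattern recommended above might be sketched as follows. One worker thread is the only code that touches results, and other threads communicate with it through a queue; the None sentinel used to stop the worker is a convention chosen for this example, and the import handles the module's different names in Python 2 and Python 3:

```python
import threading

try:
    import queue             # Python 3 name
except ImportError:
    import Queue as queue    # Python 2 name

requests = queue.Queue()
results = []                 # touched only by the worker thread

def worker():
    while True:
        item = requests.get()
        if item is None:     # sentinel: no more work
            break
        results.append(item * item)

thread = threading.Thread(target=worker)
thread.start()

for number in range(5):
    requests.put(number)     # safe to call from any thread
requests.put(None)
thread.join()                # wait until the worker has drained the queue

print(results)
```

No lock is needed here, because the queue does the synchronization and the shared list has a single owner.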
===== 11.5. Logging =====
The __logging__ module offers a full featured and flexible logging system. At its simplest, log messages are sent to a file or to sys.stderr:
import logging
logging.debug('Debugging information')
logging.info('Informational message')
logging.warning('Warning:config file %s not found', 'server.conf')
logging.error('Error occurred')
logging.critical('Critical error -- shutting down')
This produces the following output:
WARNING:root:Warning:config file server.conf not found
ERROR:root:Error occurred
CRITICAL:root:Critical error -- shutting down
By default, **informational and debugging messages are suppressed** and the output is sent to __standard error__. Other output options include routing messages through email, datagrams, sockets, or to an HTTP Server. **New filters** can select different routing based on message priority: DEBUG, INFO, WARNING, ERROR, and CRITICAL.
The logging system can be configured directly from Python or can be loaded from a user editable configuration file for customized logging without altering the application.
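Configuring logging directly from Python can be as small as attaching one handler to one logger. The sketch below uses a tiny custom handler that collects formatted messages in a list, so the result is easy to inspect; ListHandler and the logger name "demo" are invented for the illustration:

```python
import logging

class ListHandler(logging.Handler):
    """Collect formatted log messages in a list instead of a stream."""
    def __init__(self):
        logging.Handler.__init__(self)
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))

logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)      # let INFO and above through
logger.propagate = False           # keep messages away from the root logger

handler = ListHandler()
handler.setFormatter(
    logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
logger.addHandler(handler)

logger.debug("not recorded at this level")
logger.info("service started")
logger.warning("config file %s not found", "server.conf")

print(handler.lines)
```

Lowering the level to logging.DEBUG would let the first message through as well.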
===== 11.6. Weak References =====
Python does __automatic memory management__ (reference counting for most objects and garbage collection to eliminate cycles). The memory is freed shortly after the__ last reference __to it has been eliminated.
This approach works fine for most applications but occasionally there is a need to** track objects** only as long as they are being used by something else. Unfortunately, just tracking them creates a reference that makes them__ permanent__. __The weakref module provides tools for tracking objects without creating a reference__. When the object is no longer needed, it is automatically removed from a weakref table and a callback is triggered for weakref objects. Typical applications include caching objects that are expensive to create:
>>>
>>> import __weakref, gc__
>>> class A:
...     def __init__(self, value):
...         self.value = value
...     def __repr__(self):
...         return str(self.value)
...
>>> a = A(10) # create **a reference**
>>> d =__ weakref.WeakValueDictionary()__
>>> d['primary'] = a # __does not create a reference__
>>> d['primary'] # fetch the object if it is still alive
10
>>> del a # remove the one reference
>>>__ gc.collect()__ # run garbage collection right away
0
>>> d['primary'] # entry was automatically removed
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
d['primary'] # entry was automatically removed
File "C:/python26/lib/weakref.py", line 46, in __getitem__
o = self.data[key]()
KeyError: 'primary'
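The caching use mentioned above can be sketched roughly as follows. The Image class and the load_image helper are hypothetical stand-ins for an object that is expensive to create; the immediate removal of the entry relies on CPython's reference counting.

```python
import gc
import weakref


class Image(object):
    """Stand-in for an object that is expensive to create (hypothetical)."""
    def __init__(self, name):
        self.name = name


_cache = weakref.WeakValueDictionary()


def load_image(name):
    """Return a cached Image if one is still alive, else build a new one."""
    img = _cache.get(name)
    if img is None:
        img = Image(name)          # pretend this is slow
        _cache[name] = img         # weak entry: does not keep img alive
    return img


first = load_image('logo')
second = load_image('logo')
same_object = first is second      # both names refer to the cached object

del first, second                  # drop the only strong references
gc.collect()                       # collect right away for the demo
still_cached = 'logo' in _cache
```

While some other part of the program holds the object, repeated lookups are cheap; once nobody uses it, the cache does not keep it in memory.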
===== 11.7. Tools for Working with Lists =====
Many data structure needs can be met with the built-in list type. However, sometimes there is a need for** alternative implementations** with different performance trade-offs.
The __array__ module provides an array() object that is like a list that stores only homogeneous data and stores it more compactly. The following example shows an array of numbers stored as **two byte unsigned binary numbers** (typecode "H") rather than the usual** 16 bytes per entry** for regular lists of Python int objects:
>>>
>>> from array import array
>>> a = array('H', [4000, 10, 700, 22222])
>>> sum(a)
26932
>>> a[1:3]
array('H', [10, 700])
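A small sketch of the compact storage described above, in Python 3 syntax (tobytes() is the Python 3 name of the method). The overflow check assumes the usual two-byte unsigned short.

```python
from array import array

values = [4000, 10, 700, 22222]
a = array('H', values)             # 'H': unsigned short

# itemsize is the size in bytes of one stored element.
packed_size = a.itemsize * len(a)
raw = a.tobytes()                  # the underlying bytes

# Only values that fit the typecode are accepted.
try:
    array('H', [70000])            # too large for a 2-byte unsigned short
    overflowed = False
except OverflowError:
    overflowed = True
```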
The __collections __module provides a__ deque() __object that is like a list with faster appends and pops from the left side but slower lookups in the middle. These objects are well suited for implementing queues and __breadth first tree searches__:
>>>
>>> from collections import deque
>>> d = deque(["task1", "task2", "task3"])
>>> d.append("task4")
>>> print "Handling", d.popleft()
Handling task1
unsearched = deque([starting_node])
def breadth_first_search(unsearched):
    node = unsearched.popleft()
    for m in __gen_moves__(node):
        if is_goal(m):
            return m
        unsearched.append(m)
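The fragment above leaves gen_moves and is_goal undefined. A self-contained sketch of the same idea, using a hypothetical adjacency dictionary GRAPH in place of those helpers and looping until the queue is empty, might look like this:

```python
from collections import deque

# Hypothetical graph used only for this sketch: node -> list of neighbours.
GRAPH = {
    'A': ['B', 'C'],
    'B': ['D'],
    'C': ['D', 'E'],
    'D': ['F'],
    'E': ['F'],
    'F': [],
}


def breadth_first_search(start, goal):
    """Return a shortest path from start to goal as a list, or None."""
    unsearched = deque([[start]])      # queue of paths, oldest first
    seen = {start}
    while unsearched:
        path = unsearched.popleft()    # cheap pop from the left end
        node = path[-1]
        if node == goal:
            return path
        for neighbour in GRAPH[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                unsearched.append(path + [neighbour])
    return None


path = breadth_first_search('A', 'F')
```

Popping from the left keeps the oldest (shortest) paths first, which is what makes the search breadth-first.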
In addition to alternative list implementations, the library also offers other tools such as the __bisect __module with functions for manipulating sorted lists:
>>>
>>> import bisect
>>> scores = [(100, 'perl'), (200, 'tcl'), (400, 'lua'), (500, 'python')]
>>> bisect.insort(scores, (300, 'ruby'))
>>> scores
[(100, 'perl'), (200, 'tcl'), (300, 'ruby'), (400, 'lua'), (500, 'python')]
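Besides insort(), the bisect module can be used for table lookups in a sorted list. The grade boundaries below are made up for illustration.

```python
import bisect

# Hypothetical grade boundaries for this sketch; must be sorted ascending.
BREAKPOINTS = [60, 70, 80, 90]
GRADES = 'FDCBA'


def grade(score):
    """Map a numeric score to a letter using binary search."""
    i = bisect.bisect(BREAKPOINTS, score)   # count of breakpoints <= score
    return GRADES[i]


letters = [grade(s) for s in [33, 60, 77, 89, 90, 100]]
```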
The __heapq__ module provides functions for implementing heaps based on regular lists. __The lowest valued entry is always kept at position zero__. This is useful for applications which repeatedly access the smallest element but do not want to run a full list sort:
>>>
>>> from heapq import **heapify, heappop, heappush**
>>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
>>> heapify(data) # rearrange the list into heap order
>>> heappush(data, -5) # add a new entry
>>> [heappop(data) for i in range(3)] # fetch the three smallest entries
[-5, 0, 1]
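Two further uses of heapq, sketched with made-up data: picking the few smallest or largest items without a full sort, and using a heap of (priority, task) pairs as a simple priority queue.

```python
import heapq

readings = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]

smallest = heapq.nsmallest(3, readings)   # does not modify readings
largest = heapq.nlargest(2, readings)

# A heap of (priority, task) pairs pops the lowest priority number first.
tasks = []
heapq.heappush(tasks, (2, 'write tests'))
heapq.heappush(tasks, (1, 'fix bug'))
heapq.heappush(tasks, (3, 'update docs'))
order = [heapq.heappop(tasks)[1] for _ in range(3)]
```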
===== 11.8. Decimal Floating Point Arithmetic =====
The __decimal__ module offers a Decimal datatype for **decimal floating point** arithmetic. Compared to the built-in float implementation of **binary floating point**, the class is especially helpful for
* financial applications and other uses which require__ exact decimal representation__,
* control over precision,
* control over rounding to meet legal or regulatory requirements,
* tracking of significant decimal places, or
* applications where the user expects the results to match calculations done by hand.
For example, calculating a 5% tax on a 70 cent phone charge gives different results in decimal floating point and binary floating point. The difference becomes significant if the results are rounded to the nearest cent:
>>>
>>> from decimal import *
>>> x = Decimal('0.70') * Decimal('1.05')
>>> x
Decimal('0.7350')
>>> x.quantize(Decimal('0.01')) # round to nearest cent
Decimal('0.74')
>>> round(.70 * 1.05, 2) # same calculation with floats
0.73
The Decimal result keeps a trailing zero, automatically inferring four place significance from multiplicands with two place significance. Decimal reproduces mathematics as done by hand and avoids issues that can arise when binary floating point cannot exactly represent decimal quantities.
Exact representation enables the Decimal class to perform modulo calculations and equality tests that are unsuitable for binary floating point:
>>>
>>> Decimal('1.00') % Decimal('.10')
Decimal('0.00')
>>> 1.00 % 0.10
0.09999999999999995
>>> sum([Decimal('0.1')]*10) == Decimal('1.0')
True
>>> sum([0.1]*10) == 1.0
False
The decimal module provides arithmetic with as much precision as needed:
>>>
>>> getcontext().prec = 36
>>> Decimal(1) / Decimal(7)
Decimal('0.142857142857142857142857142857142857')
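The control over rounding and precision listed above can be sketched as follows. The tax figures are the ones from the earlier example; the choice of rounding modes is only illustrative.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP, localcontext

amount = Decimal('0.70') * Decimal('1.05')        # Decimal('0.7350')
cent = Decimal('0.01')

# The rounding rule is chosen explicitly per call.
rounded_half_up = amount.quantize(cent, rounding=ROUND_HALF_UP)
rounded_down = amount.quantize(cent, rounding=ROUND_DOWN)

# localcontext() changes the precision only inside the with block.
with localcontext() as ctx:
    ctx.prec = 6
    short = Decimal(1) / Decimal(7)
```

Using localcontext() rather than assigning to getcontext().prec keeps the change from leaking into unrelated calculations.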


@@ -0,0 +1,27 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-05T11:20:11+08:00
====== 12. What Now ======
Created Thursday 05 January 2012
Reading this tutorial has probably reinforced your interest in using Python — you should be eager to apply Python to solving your real-world problems. Where should you go to learn more?
This tutorial is part of Python's documentation set. Some other documents in the set are:
* The Python Standard Library:
You should browse through this manual, which gives complete (though terse) reference material about__ types, functions, and the modules__ in the standard library. The standard Python distribution includes a lot of additional code. There are modules to read Unix mailboxes, retrieve documents via HTTP, generate random numbers, parse command-line options, write CGI programs, compress data, and many other tasks. Skimming through the Library Reference will give you an idea of what's available.
* Installing Python Modules explains how to install external modules written by other Python users.
* The Python Language Reference: A detailed explanation of Python's__ syntax and semantics__. It's heavy reading, but is useful as a complete guide to the language itself.
More Python resources:
http://www.python.org: The major Python Web site. It contains code, documentation, and pointers to Python-related pages around the Web. This Web site is mirrored in various places around the world, such as Europe, Japan, and Australia; a mirror may be faster than the main site, depending on your geographical location.
http://docs.python.org: Fast access to Python's documentation.
http://pypi.python.org: The__ Python Package Index__, previously also nicknamed the Cheese Shop, is **an index of user-created Python modules **that are available for download. Once you begin releasing code, you can register it here so that others can find it.
http://aspn.activestate.com/ASPN/Python/Cookbook/: The Python Cookbook is a sizable collection of code examples, larger modules, and useful scripts. Particularly notable contributions are collected in a book also titled Python Cookbook (O'Reilly & Associates, ISBN 0-596-00797-3).
For Python-related questions and problem reports, you can post to the newsgroup comp.lang.python, or send them to the mailing list at python-list@python.org. The newsgroup and mailing list are gatewayed, so messages posted to one will automatically be forwarded to the other. There are around 120 postings a day (with peaks up to several hundred), asking (and answering) questions, suggesting new features, and announcing new modules. Before posting, be sure to check the list of Frequently Asked Questions (also called the FAQ), or look for it in the Misc/ directory of the Python source distribution. Mailing list archives are available at http://mail.python.org/pipermail/. The FAQ answers many of the questions that come up again and again, and may already contain the solution for your problem.


@@ -0,0 +1,546 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-02T20:55:26+08:00
====== 3. An Informal Introduction to Python ======
Created Monday 02 January 2012
In the following examples, input and output are distinguished by the presence or absence of prompts (>>> and ...): to repeat the example, you must type everything after the prompt, when the prompt appears; lines that do not begin with a prompt are output from the interpreter. Note that a secondary prompt on a line by itself in an example means you __must type a blank line__; this is used to **end a multi-line command**.
In interactive mode, a blank line ends a multi-line indented block of statements.
Many of the examples in this manual, even those entered at the interactive prompt, include comments. Comments in Python start with the hash character, #, and extend to the end of the physical line. A comment may appear at the start of a line or following whitespace or code, but not within a string literal. A hash character within a string literal is just a hash character. Since __comments are to clarify code__ and are not interpreted by Python, they may be omitted when typing in examples.
Some examples:
# this is the first comment
SPAM = 1 # and this is the second comment
# ... and now a third!
STRING = "# This is not a comment."
===== 3.1. Using Python as a Calculator =====
Let's try some simple Python commands. Start the interpreter and wait for the primary prompt, >>>. (It shouldn't take long.)
==== 3.1.1. Numbers ====
The interpreter acts as a simple calculator: you can type an expression at it and it will write the value. Expression syntax is straightforward: the operators +, -, * and / work just like in most other languages (for example, Pascal or C); parentheses can be used for grouping. For example:
>>>
>>> 2+2
4
>>> # This is a comment
... 2+2
4
>>> 2+2 # and a comment on the same line as code
4
>>> (50-5*6)/4
5
>>> #__ Integer division returns the floor__:
... 7/3
2
>>> 7/-3
-3
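The flooring shown above for / applies to integer operands in Python 2; in Python 3 the / operator returns a float. The // operator floors in both versions, as this short sketch shows:

```python
# // is floor division in both Python 2 and Python 3.
q_pos = 7 // 3          # 2
q_neg = 7 // -3         # -3: rounds toward negative infinity, not toward zero

# divmod returns (quotient, remainder) such that a == q * b + r.
q, r = divmod(7, -3)
identity_holds = (q * -3 + r == 7)
```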
The equal sign ('=') is used to assign a value to a variable. Afterwards, no result is displayed before the next interactive prompt:
>>>
>>> width = 20
>>> height = 5*9
>>> width * height
900
__A value can be assigned to several variables simultaneously__:
>>>
>>> x = y = z = 0 # Zero x, y and z
>>> x
0
>>> y
0
>>> z
0
__Variables must be “defined” (assigned a value) before they can be used__, or an error will occur:
>>>
>>> # try to access an undefined variable
... n
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
__NameError__: name 'n' is not defined
There is** full support** for floating point; operators with mixed type operands convert the integer operand to floating point:
>>>
>>> 3 * 3.75 / 1.5
7.5
>>> 7.0 / 2
3.5
__Complex numbers __are also supported; imaginary numbers are written with a suffix of __j or J__. Complex numbers with a nonzero real component are written as__ (real+imagj)__, or can be created with the **complex(real, imag)** function.
>>>
>>> 1j * 1J
(-1+0j)
>>> 1j * complex(0,1)
(-1+0j)
>>> 3+1j*3
(3+3j)
>>> (3+1j)*3
(9+3j)
>>> (1+2j)/(1+1j)
(1.5+0.5j)
Complex numbers are always represented as two__ floating point__ numbers, the real and imaginary part. To extract these parts from a complex number z, use z.real and z.imag.
>>>
>>> a=1.5+0.5j
>>>__ a.real__
1.5
>>> __a.imag__
0.5
The** conversion functions **to floating point and integer __(float(), int() and long())__ don't work for complex numbers, since there is no one correct way to convert a complex number to a real number. Use __abs(z)__ to get its magnitude (as a float) or z.real to get its real part.
>>>
>>> a=3.0+4.0j
>>> float(a)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: can't convert complex to float; use abs(z)
>>> a.real
3.0
>>> a.imag
4.0
>>>__ abs__(a) # __sqrt__(a.real**2 + a.imag**2)
5.0
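For conversions beyond abs(), the standard cmath module offers polar and rectangular forms. A brief sketch using the same value as above:

```python
import cmath
import math

z = 3.0 + 4.0j

magnitude = abs(z)                                  # 5.0
same_magnitude = math.sqrt(z.real ** 2 + z.imag ** 2)

# cmath.polar returns (modulus, phase in radians).
modulus, phase = cmath.polar(z)
back = cmath.rect(modulus, phase)   # convert the polar form back to complex
```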
In interactive mode, the** last printed expression** is assigned to the variable _. This means that when you are using Python as a desk calculator, it is somewhat easier to continue calculations, for example:
>>>
>>> tax = 12.5 / 100
>>> price = 100.50
>>> price * tax
12.5625
>>> price + _
113.0625
>>> __round__(_, 2)
113.06
This variable should be treated as** read-only** by the user. Don't explicitly assign a value to it: __you would create an independent local variable with the same name masking the built-in variable with its magic behavior__.
==== 3.1.2. Strings ====
Escape characters __take effect__ inside both single and double quotes.
Besides numbers, Python can also manipulate strings, which can be expressed in several ways. They can be enclosed in single quotes or double quotes:
>>>
>>> 'spam eggs'
'spam eggs'
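>>> 'doesn__\'__t'
"doesn't" # note the outer quotes are double, because the string contains a single quote and no double quotes
>>> "doesn't"
"doesn't" # same as above
>>> '"Yes," he said.'
'"Yes," he said.' # note the outer quotes are single
>>> "\"Yes,\" he said."
'"Yes," he said.' # same as above
>>>__ '"Isn\'t," she said.' # the escape character takes effect here.__
'"Isn\'t," she said.' # the backslash appears in the result because a string containing a double quote is displayed in single quotes, so the single quote inside it must be escaped.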
The interpreter prints the result of string operations in the same way as they are typed for input: inside quotes, and __with quotes__ and other funny characters __escaped by backslashes__, to show the precise value. The string is enclosed in __double quotes __if the string contains a single quote and no double quotes, else it's enclosed in single quotes. The** print** statement produces a more readable output for such input strings.
String literals can__ span multiple lines __in several ways. Continuation lines can be used, with a backslash as the last character on the line indicating that the next line is a **logical continuation** of the line:
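Note: as in C, a Python string literal __cannot contain a literal line break__; use the escape sequence \n instead. To **continue a literal across lines**, __end the current line with a backslash__. (In bash, by contrast, a string may contain a literal line break.)
The special __triple-quoted form__, however, may contain line breaks.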
hello = "This is a rather long string containing\n\
several lines of text just as you would do in C.\n\
Note that whitespace at the beginning of the line is\
significant."
print hello
Note that newlines still need to be embedded in the string using \n; **the newline following the trailing backslash is discarded**. This example would print the following:
This is a rather long string containing
several lines of text just as you would do in C.
Note that whitespace at the beginning of the line is significant.
Or, strings can be surrounded in a pair of matching __triple-quotes: """ or '''__. **End of lines do not need to be escaped when using triple-quotes**, but they will be included in the string.
print """
Usage: thingy [OPTIONS]
-h Display this usage message
-H hostname Hostname to connect to
"""
produces the following output:
Usage: thingy [OPTIONS]
-h Display this usage message
-H hostname Hostname to connect to
If we make the string literal __a “raw” string__, \n sequences are not converted to newlines, but the backslash at the end of the line, and the newline character in the source, are both included in the string as data. Thus, the example:
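A raw string literal does not interpret the escape sequences inside it. A line break still needs a trailing backslash, and both the backslash and the newline are kept in the string as data.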
hello =__ r"__This is a rather long string containing\n\
several lines of text much as you would do in C."
print hello
would print:
This is a rather long string containing\n\
several lines of text much as you would do in C.
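For example:
>>> hello = r'dfasf\tkjdk\n__\ __ # the trailing backslash is required.
... dfdjf dlfjdl\
... dfdsjfl'
>>> print hello # since this is a raw string literal, Python does __not interpret__ the escape sequences in it.
dfasf\tkjdk\n\
dfdjf dlfjdl\
dfdsjfl
>>> hello
__'dfasf\\tkjdk\\n\\\ndfdjf dlfjdl\\\ndfdsjfl' # note that each character is escaped appropriately in this display__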
>>>
Strings can be concatenated (glued together) with the **+ operator**, and repeated with *:
>>>
>>> word =__ 'Help' + 'A'__
>>> word
'HelpA'
>>> '<' + __word*5 __+ '>'
'<HelpAHelpAHelpAHelpAHelpA>'
Two string literals__ next to each other are automatically concatenated__; the first line above could also have been written word = 'Help' 'A'; this__ only works__ with two literals, not with arbitrary string expressions:
>>>
>>> 'str' 'ing' # <- This is ok
'string'
>>> 'str'.strip() + 'ing' # <- This is ok
'string'
>>> **'str'.strip()** 'ing' # <- This is invalid: concatenation by juxtaposition only applies to two adjacent __string literals__, not to __string expressions__.
File "<stdin>", line 1, in ?
'str'.strip() 'ing'
^
SyntaxError: invalid syntax
Strings can be **subscripted (indexed)**; like in C, the first character of a string has subscript (index) 0. There is __no separate character type__; a character is simply a string of size one. Like in Icon, substrings can be specified with the** slice notation**: two indices separated by a colon.
Python has __no character (char) type__, only the string type; a character is just a string of length one.
>>>
>>> word[4] # the result of indexing or slicing a string is still a string.
'A'
>>> word[0:2]
'He'
>>> word[2:4]
'lp'
Slice indices have useful defaults; an omitted first index defaults to zero, an omitted second index defaults to **the size of the string **being sliced.
>>>
>>> word[:2] # The first two characters
'He'
>>> word[2:] # Everything except the first two characters
'lpA'
Unlike a C string, __Python strings cannot be changed__. Assigning to an indexed position in the string results in an error:
>>>
>>> word[0] = 'x'
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: object does not support** item assignment**
>>> word[:1] = 'Splat'
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: object does not support **slice assignment**
However, **creating a new string **with the combined content is easy and efficient:
>>>
>>> 'x' + word[1:]
'xelpA'
>>> 'Splat' + word[4]
'SplatA'
Here's a useful invariant of slice operations: s[:i] + s[i:] equals s.
>>>
>>> word[:2] + word[2:]
'HelpA'
>>> word[:3] + word[3:]
'HelpA'
__Degenerate slice indices__ are handled gracefully: an index that is too large is replaced by the string size, an upper bound smaller than the lower bound returns an empty string.
>>>
>>> word[1:100]
'elpA'
>>> word[10:]
''
>>> word[2:1]
''
Indices may be__ negative numbers, to start counting from the right__. For example:
>>>
>>> word[-1] # The last character
'A'
>>> word[-2] # The last-but-one character
'p'
>>> word[-2:] # The last two characters
'pA'
>>> word[:-2] # Everything** except** the last two characters
'Hel'
But note that__ -0 is really the same as 0__, so it does not count from the right!
>>>
>>> word[-0] # (since -0 equals 0)
'H'
__Out-of-range negative slice indices are truncated, but don't try this for single-element (non-slice) indices:__
>>>
>>> word[-100:]
'HelpA'
>>> word[-10] # error
Traceback (most recent call last):
File "<stdin>", line 1, in ?
IndexError: string index out of range
One way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of n characters has index n, for example:
 +---+---+---+---+---+
 | H | e | l | p | A |
 +---+---+---+---+---+
 0   1   2   3   4   5
-5  -4  -3  -2  -1
The first row of numbers gives the position of the //indices 0...5// in the string; the second row gives the corresponding negative indices. The slice from i to j consists of all characters between the edges labeled i and j, respectively.
For non-negative indices, the length of a slice is the difference of the indices, if both are within bounds. For example, **the length of word[1:3] is 2**.
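The slicing rules described so far can be checked with a few lines. This is only a sketch using the same word as the examples above:

```python
word = 'HelpA'

# s[:i] + s[i:] == s holds for every integer i, even out-of-range ones.
invariant_holds = all(word[:i] + word[i:] == word for i in range(-10, 11))

# Out-of-range slice bounds are clamped; out-of-range indices raise.
clamped = word[1:100]
empty = word[2:1]                  # upper bound below lower bound

try:
    word[10]
    index_error = False
except IndexError:
    index_error = True

slice_length = len(word[1:3])      # difference of the indices: 3 - 1
```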
The built-in function__ len() __returns the length of a string:
>>>
>>> s = 'supercalifragilisticexpialidocious'
>>> len(s)
34
===== See also =====
__Sequence Types __— **str, unicode, list, tuple, bytearray, buffer, xrange**
Strings, and the Unicode strings described in the next section, are examples of sequence types, and support the **common operations** supported by such types.
String Methods
Both strings and Unicode strings support a large number of methods for basic** transformations and searching**.
__String Formatting__
Information about string formatting with __str.format()__ is described here.
String Formatting Operations
The old formatting operations invoked when strings and Unicode strings are the left operand of the __% __operator are described in more detail here.
==== 3.1.3. Unicode Strings ====
Starting with Python 2.0 a new data type for** storing text data **is available to the programmer: the Unicode object. It can be used to store and manipulate Unicode data (see http://www.unicode.org/) and integrates well with the existing string objects, providing **auto-conversions** where necessary.
Unicode has the advantage of providing __one ordinal for every character __in every script used in modern and ancient texts. Previously, there were only 256 possible ordinals for script characters. Texts were typically bound to** a code page** which mapped the ordinals to script characters. This led to a great deal of confusion, especially with respect to internationalization (usually written as i18n: 'i' + 18 characters + 'n') of software.__ Unicode solves these problems by defining one code page for all scripts.__
Creating Unicode strings in Python is just as simple as creating normal strings:
>>>
>>> __u__'Hello World !'
u'Hello World !'
The small 'u' in front of the quote indicates that a Unicode string is supposed to be created. If you want to include special characters in the string, you can do so by using the __Python Unicode-Escape encoding__. The following example shows how:
>>>
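>>> u'Hello\u0020World !' # each character of this string is __stored as its ordinal__ in the Unicode character set. Note: __a character's Unicode ordinal is unique, but its byte value may differ between Unicode encodings.__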
u'Hello World !'
The **escape sequence** \u0020 indicates to insert the **Unicode character** with the ordinal value 0x0020 (the space character) at the given position.
Other characters are interpreted by using their respective ordinal values directly as Unicode ordinals. If you have literal strings in the standard Latin-1 encoding that is used in many Western countries, you will find it convenient that__ the lower 256 characters of Unicode are the same as the 256 characters__ of Latin-1.
For experts, there is also a raw mode just like the one for normal strings. You have to prefix the opening quote with ur to have Python use the__ Raw-Unicode-Escape encoding__. It will only apply the above \uXXXX conversion if there is __an uneven (odd) number__ of backslashes in front of the small u.
>>>
>>> ur'Hello\u0020World !' # \uXXXX is treated as a Unicode escape only if the u is preceded by an odd number of backslashes; otherwise it is left as is.
u'Hello World !'
>>> ur'Hello\\u0020World !'
u'Hello\\\\u0020World !'
The raw mode is most useful when you have to enter lots of backslashes, as can be necessary in__ regular expressions__.
Apart from these** standard encodings**, Python provides a whole set of other ways of creating Unicode strings on the basis of a __known encoding__.
The built-in function** unicode()** provides access to all__ registered Unicode codecs__ (COders and DECoders; note that these are __encoding forms of the Unicode character set__). Some of the more well known encodings which these codecs can convert are **Latin-1, ASCII, UTF-8, and UTF-16**. The latter two are__ variable-length __encodings that store each Unicode character in one or more bytes.
The default encoding is normally set to ASCII, which passes through characters in the range 0 to 127 and **rejects any other characters with an error**.
__ When a Unicode string is printed, written to a file, or converted with str(), conversion takes place using this default encoding.__
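This is because a Unicode string is stored __unencoded__ inside the interpreter, so it must be encoded whenever it is printed or saved. Ordinary strings are always stored in encoded form.
>>>
>>> u"abc" # an unencoded Unicode string
__u__'abc'
>>>__ str(u"abc") # convert to an ordinary string in the default encoding__
'abc' # note there is no u prefix
>>> u"äöü"
u'\xe4\xf6\xfc' # these are the three characters' __ordinals in the Unicode character set; note that they are not ASCII codes.__
>>> str(u"äöü") # try to convert the Unicode string to a string in the default encoding (ASCII)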
Traceback (most recent call last):
File "<stdin>", line 1, in ?
__UnicodeEncodeError__: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
To convert __a Unicode string into an 8-bit string using a specific encoding__, Unicode objects provide an __encode()__ method that takes one argument, the name of the encoding. Lowercase names for encodings are preferred.
>>>
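>>> u"äöü".encode(__'utf-8'__) # these three characters are not in the ASCII character set, so the default ASCII encoding cannot be used; here the __encoding is given as utf-8__.
'\xc3\xa4\xc3\xb6\xc3\xbc' # the ordinary string obtained by encoding the Unicode string as UTF-8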
If you have data in a specific encoding and want to produce a corresponding Unicode string from it, you can use the __unicode()__ function with the encoding name as the second argument.
>>>
>>> unicode(**'\xc3\xa4\xc3\xb6\xc3\xbc'**, 'utf-8') # __the first argument is a string encoded with the encoding named by the second argument; the result is a Unicode string.__
u'\xe4\xf6\xfc'
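The encode and decode steps above can be put together in a short round-trip sketch. It uses the decode() method instead of unicode() so that it also runs on Python 3, where the unicode() built-in does not exist; the 'replace' error handler is shown as one possible way to avoid the ASCII error.

```python
# a-umlaut, o-umlaut, u-umlaut written with escapes to keep the source ASCII
text = u'\xe4\xf6\xfc'

utf8 = text.encode('utf-8')          # two bytes per character here
latin1 = text.encode('latin-1')      # one byte per character

round_trip = utf8.decode('utf-8') == text

# ASCII cannot represent these characters, so encoding fails ...
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# ... unless an error handler is given; 'replace' substitutes '?'.
lossy = text.encode('ascii', 'replace')
```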
==== 3.1.4. Lists ====
Python knows a number of compound data types, used to **group together** other values. The most versatile is the list, which can be written as a list of comma-separated values (items) between square brackets. List items need not all have the same type.
>>>
>>> a = ['spam', 'eggs', 100, 1234]
>>> a
['spam', 'eggs', 100, 1234]
Like string indices, list indices start at 0, and lists can be sliced, concatenated and so on:
>>>
>>> a[0]
'spam'
>>> a[3]
1234
>>> a[-2]
100
>>> a[1:-1] # the end index is excluded, so the last item is not returned.
['eggs', 100]
>>>** a[:2] + ['bacon', 2*2]**
['spam', 'eggs', 'bacon', 4]
>>> __3*a[:3]__ + ['Boo!']
['spam', 'eggs', 100, 'spam', 'eggs', 100, 'spam', 'eggs', 100, 'Boo!']
__All slice operations return a new list __containing the requested elements. This means that the following slice returns **a shallow copy **of the list a:
>>>
>>>__ a[:]__
['spam', 'eggs', 100, 1234]
Unlike strings, which are immutable, it is possible to change individual elements of a list:
>>>
>>> a
['spam', 'eggs', 100, 1234]
>>> a[2] = a[2] + 23
>>> a
['spam', 'eggs', 123, 1234]
Assignment to slices is also possible, and this can even change the size of the list or clear it entirely:
>>>
>>> # Replace some items:
... a[0:2] = [1, 12]
>>> a
[1, 12, 123, 1234]
>>> # Remove some:
... __a[0:2] = []__
>>> a
[123, 1234]
>>> # Insert some:
... __a[1:1] = ['bletch', 'xyzzy'] __
>>> a
[123, 'bletch', 'xyzzy', 1234]
>>> # Insert (a copy of) itself at the beginning
>>> a[:0] = a # __a[0:0]__ works as well
>>> a
[123, 'bletch', 'xyzzy', 1234, 123, 'bletch', 'xyzzy', 1234]
>>> **# Clear the list:** replace all items with an empty list
>>> a[:] = []
>>> a
[]
The built-in function __len() __also applies to lists:
>>>
>>> a = ['a', 'b', 'c', 'd']
>>> len(a)
4
It is possible to **nest lists** (create lists containing other lists), for example:
>>>
>>> q = [2, 3]
>>> p = [1, q, 4]
>>> len(p)
3
>>> p[1]
[2, 3]
>>> p[1][0]
2
>>>** p[1].append('xtra')** # See section 5.1
>>> p
[1, [2, 3, 'xtra'], 4]
>>> q
[2, 3, 'xtra']
Note that in the last example, p[1] and q really refer to the same object! We'll come back to object semantics later.
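The sharing between p[1] and q also affects slice copies, since a slice is only a shallow copy. A sketch contrasting it with copy.deepcopy() from the standard library:

```python
import copy

q = [2, 3]
p = [1, q, 4]

shallow = p[:]                 # new outer list, same inner list object
deep = copy.deepcopy(p)        # the inner list is copied too

q.append('xtra')               # mutate the shared inner list

alias_sees_change = p[1] == [2, 3, 'xtra']
shallow_shares_inner = shallow[1] is q
deep_unchanged = deep[1] == [2, 3]
```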
===== 3.2. First Steps Towards Programming =====
Of course, we can use Python for more complicated tasks than adding two and two together. For instance, we can write an initial sub-sequence of the Fibonacci series as follows:
>>>
>>> # Fibonacci series:
... # the sum of two elements defines the next
... __a, b = 0, 1__
>>> while b < 10:
...     print b
...     a, b = b, a+b
...
1
1
2
3
5
8
This example introduces several new features.
* The first line contains a **multiple assignment**: the variables a and b simultaneously get the new values 0 and 1. On the last line this is used again, demonstrating that the expressions on the right-hand side are all evaluated first before any of the assignments take place. The right-hand side expressions are evaluated from the left to the right.
* The **while loop **executes as long as the condition (here: b < 10) remains true. In Python, like in C, any non-zero integer value is true; zero is false. The condition may also be a string or list value, in fact any sequence; anything with a non-zero length is true, empty sequences are false. The test used in the example is a simple comparison. The** standard comparison operators** are written the same as in C: < (less than), > (greater than), == (equal to), <= (less than or equal to), >= (greater than or equal to) and != (not equal to).
* The body of the loop is indented:__ indentation is Python's way of grouping statements__. At the interactive prompt, you have to type a tab or space(s) for each indented line. In practice you will prepare more complicated input for Python with a text editor; all decent text editors have an auto-indent facility. When a compound statement is entered interactively, it must be__ followed by a blank line__ to indicate completion (since the parser cannot guess when you have typed the last line). Note that each line within a basic block must be indented by **the same amount**.
* The** print statement **writes the value of the expression(s) it is given. It differs from just writing the expression you want to write (as we did earlier in the calculator examples) in the way it handles multiple expressions and strings. Strings are printed __without quotes, and a space__ is inserted between items, so you can format things nicely, like this:
>>>
>>> i = 256*256
>>> print 'The value of i is', i
The value of i is 65536
__ A trailing comma__ avoids the newline after the output:
>>>
>>> a, b = 0, 1
>>> while b < 1000:
...     print __b,__
...     a, b = b, a+b
...
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
Note that the interpreter inserts a newline before it prints the next prompt if the last line was not completed.
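The trailing comma belongs to the Python 2 print statement. In Python 3, where print is a function, the end keyword argument plays the same role. A sketch in Python 3 syntax, writing to an in-memory stream (an assumption made here only so that the result can be inspected):

```python
import io

out = io.StringIO()
a, b = 0, 1
while b < 1000:
    # end=' ' replaces the default newline, like the trailing comma
    print(b, end=' ', file=out)
    a, b = b, a + b

line = out.getvalue()
numbers = line.split()
```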


@@ -0,0 +1,435 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-03T10:52:20+08:00
====== 4. More Control Flow Tools ======
Created Tuesday 03 January 2012
Besides the **while** statement just introduced, Python knows the usual control flow statements known from other languages, with some twists.
===== 4.1. if Statements =====
Perhaps the most well-known statement type is the if statement. For example:
>>>
>>> x = int(__raw_input__("Please enter an integer: "))
Please enter an integer: 42
>>> if x < 0:
...     x = 0
...     print 'Negative changed to zero'
... elif x == 0:
...     print 'Zero'
... elif x == 1:
...     print 'Single'
... else:
...     print 'More'
...
More
There can be zero or more elif parts, and the __else__ part is optional. The keyword __elif__ is short for else if, and is useful to avoid excessive indentation. An if ... elif ... elif ... sequence is a substitute for the switch or case statements found in other languages.
Python has no switch ... case statement.
===== 4.2. for Statements =====
The for statement in Python differs a bit from what you may be used to in C or Pascal. Rather than always iterating over an arithmetic progression of numbers (like in Pascal), or giving the user the ability to define both the iteration step and halting condition (as C), __Python's for statement iterates over the items of any sequence __(a list or a string), in the order that they appear in the sequence. For example (no pun intended):
>>>
>>> # Measure some strings:
... a = ['cat', 'window', 'defenestrate']
>>> for x in a:
...     print x, len(x)
...
cat 3
window 6
defenestrate 12
It is not safe to modify the sequence being iterated over in the loop (this can only happen for mutable sequence types, such as lists). If you need to modify the list you are iterating over (for example, to duplicate selected items) you must __iterate over a copy__. The slice notation makes this particularly convenient:
>>>
>>> for x in __a[:]: # make a slice copy of the entire list__
...     if len(x) > 6: a.insert(0, x)
...
>>> a
['defenestrate', 'cat', 'window', 'defenestrate']
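The copy-then-modify pattern above can be compared with building a new list instead, which avoids touching the sequence being iterated over at all. The list comprehension is a swapped-in alternative, not something the text above prescribes:

```python
words = ['cat', 'window', 'defenestrate']

# Iterate over a slice copy so inserting into the original is safe.
for w in words[:]:
    if len(w) > 6:
        words.insert(0, w)

# Alternative: leave the input alone and build a new list.
original = ['cat', 'window', 'defenestrate']
long_words = [w for w in original if len(w) > 6]
combined = long_words + original
```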
===== 4.3. The range() Function =====
If you do need to iterate over a sequence of numbers, the built-in function __range() __comes in handy. It generates lists containing arithmetic progressions:
>>>
>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
The given end point is never part of the generated list; range(10) generates a list of 10 values, the legal indices for items of a sequence of length 10. It is possible to let the range start at another number, or to specify a different increment (even negative; sometimes this is called the step):
>>>
>>> range(5, 10) # the second argument is not included in the resulting list.
[5, 6, 7, 8, 9]
>>> range(0, 10, 3)
[0, 3, 6, 9]
>>> range(-10, -100, -30)
[-10, -40, -70]
To iterate over the indices of a sequence, you can combine range() and len() as follows:
>>>
>>> a = ['Mary', 'had', 'a', 'little', 'lamb']
>>> for i in range(len(a)):
...     print i, a[i]
...
0 Mary
1 had
2 a
3 little
4 lamb
In most such cases, however, it is convenient to use the __enumerate()__ function, see Looping Techniques.
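A short sketch of the enumerate() form of the loop above, using the same list. The optional second argument sets the starting index.

```python
a = ['Mary', 'had', 'a', 'little', 'lamb']

pairs = list(enumerate(a))             # (index, item) tuples

for i, word in enumerate(a):
    print('%d %s' % (i, word))

# Start numbering at 1 instead of 0.
numbered = ['%d. %s' % (n, word) for n, word in enumerate(a, 1)]
```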
===== 4.4. break and continue Statements, and else Clauses on Loops =====
* The break statement, like in C, breaks out of the** smallest enclosing** for or while loop.
* The continue statement, also borrowed from C, continues with the next iteration of the loop.
Loop statements may have an __else clause__; it is executed when the loop terminates through__ exhaustion of the list__ (with for) or when the condition becomes false (with while), but **not when the loop is terminated by a break statement**. This is exemplified by the following loop, which searches for prime numbers:
>>>
>>> for n in range(2, 10):
...     for x in range(2, n):
...         if n % x == 0:
...             print n, 'equals', x, '*', n/x
...             break
...     else:  # note: a Python loop can carry an else clause; this else belongs to the for loop, not to the if above, which is **shown by the indentation**.
...         # loop fell through without finding a factor
...         print n, 'is a prime number'
...
2 is a prime number
3 is a prime number
4 equals 2 * 2
5 is a prime number
6 equals 2 * 3
7 is a prime number
8 equals 2 * 4
9 equals 3 * 3
(Yes, this is the correct code. Look closely: the else clause belongs to the for loop, not the if statement.)
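The same for/else pattern can be wrapped in a small function. This is only a sketch: first_factor is a hypothetical helper name, not something the tutorial defines.

```python
def first_factor(n):
    """Return the smallest factor of n in 2..n-1, or None if there is none."""
    for x in range(2, n):
        if n % x == 0:
            result = x
            break               # leaving via break skips the else clause
    else:
        # reached only when the loop ran to exhaustion without a break
        result = None
    return result
```

With this helper, first_factor(9) finds 3, while first_factor(7) runs the loop to exhaustion and returns None.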
===== 4.5. pass Statements =====
The pass statement does nothing. It can be used when a statement is __required syntactically__ but the program requires no action. For example:
>>>
>>> while **True:**
...     pass  # Busy-wait for keyboard interrupt (Ctrl+C)
...
This is commonly used for creating minimal classes:
>>>
>>> class MyEmptyClass:
...     pass
...
Another place pass can be used is as a__ place-holder for a function or conditional body__ when you are working on new code, allowing you to keep thinking at a more abstract level. The pass is silently ignored:
>>>
>>> def initlog(*args):
...     pass   # Remember to implement this!
...
===== 4.6. Defining Functions =====
We can create a function that writes the Fibonacci series to an arbitrary boundary:
>>>
>>> def fib(n):    # write Fibonacci series up to n
...     """Print a Fibonacci series up to n."""
...     a, b = 0, 1
...     while a < n:
...         print __a,__
...         a, b = b, a+b    # the right-hand expressions are evaluated **from left to right** and only then assigned to the names on the left.
...
>>> # Now call the function we just defined:
... fib(2000)
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597
The keyword** def** introduces a function definition. It must be followed by the function name and the **parenthesized list of formal parameters**. The statements that form the body of the function start at the next line, and must be indented.
The first statement of the function body can optionally be **a string literal**; this string literal is the function's __documentation string__, or__ docstring__. (More about docstrings can be found in the section Documentation Strings.) There are tools which use docstrings to automatically produce** online or printed documentation**, or to let the user interactively browse through code; it's good practice to include docstrings in code that you write, so make a habit of it.
__The execution of a function introduces a new symbol table used for the local variables of the function. More precisely, all variable assignments in a function store the value in the local symbol table; whereas variable references first look in the local symbol table, then in the local symbol tables of enclosing functions, then in the global symbol table, and finally in the table of built-in names. Thus, global variables cannot be directly assigned a value within a function (unless named in a global statement), although they may be referenced.__
Functions in Python can be defined nested inside one another; each level of function has **its own symbol table**.
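The lookup and assignment rules above can be sketched with three small functions. All names here (counter, read_counter, shadow_counter, bump_counter) are made up for the illustration.

```python
counter = 0                     # a global name

def read_counter():
    return counter + 1          # a reference: not found locally, so the global is used

def shadow_counter():
    counter = 10                # an assignment: creates a new local name
    return counter

def bump_counter():
    global counter              # explicitly route the assignment to the global name
    counter = counter + 1

local_result = shadow_counter()
after_shadow = counter          # the global is untouched by shadow_counter()
bump_counter()
after_bump = counter            # the global was changed by bump_counter()
```

Only bump_counter(), which names counter in a global statement, changes the module-level value.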
The actual parameters (arguments) to a function call are introduced in the **local symbol table **of the called function when it is called; thus, __arguments are passed using call by value__ (where the value is __always an object reference__, not the value of the object). [1] When a function calls another function, a new local symbol table is created for that call.
Function arguments are always passed by value, and that value is always an __object reference__.
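A short sketch of what passing an object reference by value means in practice. The function names are hypothetical; the point is the difference between mutating the passed object and rebinding the local name.

```python
def append_item(items):
    items.append(99)            # mutates the object the caller also refers to

def rebind(items):
    items = [0]                 # rebinds only the local name; the caller's list is untouched
    return items

data = [1, 2]
append_item(data)
other = rebind(data)
print(data)
```

The caller sees the appended item, but not the rebinding, because only the reference was copied into the callee's local symbol table.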
__A function definition introduces the function name in the current symbol table__. The value of the function name has a type that is recognized by the interpreter as a user-defined function. This value can be assigned to another name which can then also be used as a function. __This serves as a general renaming mechanism__:
In Python every object has a type, and every type is itself an object.
>>>
>>> fib
<function fib at 10042ed0>    # fib refers to a function object whose memory address is 10042ed0; this value is the object's identity (identifier)
>>> f = fib
>>> f(100)
0 1 1 2 3 5 8 13 21 34 55 89
Coming from other languages, you might object that **fib is not a function** but a procedure since it doesn't return a value. In fact, even functions without a return statement do return a value, albeit a rather boring one. This value is called __None__ (it's a built-in name). Writing the value None is normally suppressed by the interpreter if it would be the only value written. You can see it if you really want to using print:
A function that does not return a value with a return statement returns None.
>>>
>>> fib(0)
>>> print fib(0)
None
It is simple to write a function that returns a list of the numbers of the Fibonacci series, instead of printing it:
>>>
>>> def fib2(n): # return Fibonacci series up to n
...     """Return a list containing the Fibonacci series up to n."""
...     result = []
...     a, b = 0, 1
...     while a < n:
...         result.append(a)    # see below
...         a, b = b, a+b
...     **return result**
...
>>> **f100** = fib2(100) # call it
>>> f100 # write the result
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
This example, as usual, demonstrates some new Python features:
* The return statement returns with a value from a function. __return without an expression argument returns None__. Falling off the end of a function also returns __None__.
* The statement result.append(a) calls a method of the list object result. __A method__ is a function that belongs to an object and is named obj.methodname, where obj is some object (this may be an expression), and methodname is the name of a method that is defined by the object's type. __Different types define different methods__. Methods of different types may have the same name without causing ambiguity. (It is possible to define your own object types and methods, using classes, see Classes.) The method append() shown in the example is defined for list objects; it adds a new element at the end of the list. In this example it is equivalent to result = result + [a], but more efficient.
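The difference between append() and result = result + [a] can be made visible with a second name bound to the same list. This is a small sketch; alias is a name introduced only for the illustration.

```python
result = [0, 1]
alias = result                  # a second name for the same list object

result.append(1)                # in place: the existing list grows
same_object = alias is result

result = result + [2]           # builds a new list and rebinds the name result
still_same = alias is result
```

The concatenation copies every element into a new list, which is why append() is the more efficient choice inside a loop.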
===== 4.7. More on Defining Functions =====
It is also possible to define functions with **a variable number** of arguments. There are three forms, which can be combined.
==== 4.7.1. Default Argument Values ====
The most useful form is to specify __a default value __for one or more arguments. This creates a function that can be called with fewer arguments than it is defined to allow. For example:
def ask_ok(prompt, retries=4, complaint='Yes or no, please!'):
    while __True__:
        ok = __raw_input__(prompt)
        __if ok in ('y', 'ye', 'yes'):__
            return __True__
        if ok in ('n', 'no', 'nop', 'nope'):
            return False
        retries = retries - 1
        if retries < 0:
            __raise IOError('refusenik user')__
        print complaint
This function can be called in several ways:
* giving only the__ mandatory__ argument: ask_ok('Do you really want to quit?')
* giving one of the optional arguments: ask_ok('OK to overwrite the file?', 2)
* or even giving all arguments: ask_ok('OK to overwrite the file?', 2, 'Come on, only yes or no!')
This example also introduces the__ in__ keyword. This tests whether or not a sequence contains a certain value.
__The default values are evaluated at the point of function definition in the defining scope__, so that
i = 5
def f(arg=i):    # a parameter's __default value is fixed when the function is defined, and the interpreter keeps that value from then on__.
    print arg
i = 6
f()
will print 5.
Important warning: __The default value is evaluated only once__. This makes a difference when the default is** a mutable object** such as a list, dictionary, or instances of most classes. For example, the following function** accumulates** the arguments passed to it on subsequent calls:
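def f(a, L=[]):    # L's default is fixed at definition time; here it is a mutable object (an empty list), and Python keeps that same object.
    L.append(a)
    return L
print f(1)    # the default is evaluated only once, when the function is defined, and __the same object is passed on every later call__.
print f(2)
print f(3)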
This will print
[1]
[1, 2]
[1, 2, 3]
If you don't want** the default to be shared between subsequent calls**, you can write the function like this instead:
def f(a, __L=None__):    # None here marks that no value was supplied for L.
    if L is None:
        L = []
    L.append(a)
    return L
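To see where the shared default lives, the two variants can be compared side by side. This is a sketch with hypothetical names (shared, fresh); it inspects the function attribute __defaults__, which holds the default values created at definition time.

```python
def shared(a, L=[]):
    L.append(a)
    return L

def fresh(a, L=None):
    if L is None:
        L = []                  # a new list on every call that omits L
    L.append(a)
    return L

shared(1)
shared(2)
stored_default = shared.__defaults__[0]   # the single list object created at definition time
first = fresh(1)
second = fresh(2)
```

The list stored on shared keeps growing across calls, whereas fresh stores only None and builds a new list each time.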
==== 4.7.2. Keyword Arguments ====
Functions can also be called using keyword arguments of the form** kwarg=value**. For instance, the following function:
def parrot(voltage, state='a stiff', action='voom', type='Norwegian Blue'):
    print "-- This parrot wouldn't", action,
    print "if you put", voltage, "volts through it."
    print "-- Lovely plumage, the", type
    print "-- It's", state, "!"
accepts** one required argument **(voltage) and **three optional arguments** (state, action, and type). This function can be called in any of the following ways:
parrot(1000) # 1 positional argument
parrot(voltage=1000) # 1 keyword argument
parrot(voltage=1000000, action='VOOOOOM') # 2 keyword arguments
parrot(action='VOOOOOM', voltage=1000000) # 2 keyword arguments
parrot('a million', 'bereft of life', 'jump') # 3 positional arguments
parrot('a thousand', state='pushing up the daisies') # 1 positional, 1 keyword
but all the following calls would be invalid:
parrot() # required argument missing
parrot(voltage=5.0, 'dead') # __non-keyword argument after a keyword argument__
parrot(110, voltage=220) # **duplicate** value for the same argument
parrot(actor='John Cleese') # unknown keyword argument
**Keyword arguments must be placed after positional arguments.**
In a function call, __keyword arguments must follow positional arguments__. All the keyword arguments passed must match one of the arguments accepted by the function (e.g. actor is not a valid argument for the parrot function), and **their order is not important**. This also includes non-optional arguments (e.g. parrot(voltage=1000) is valid too). No argument may receive a value more than once. Here's an example that fails due to this restriction:
>>>
>>> def function(a):
...     pass
...
>>> function(0, a=0)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
**TypeError**: function() got multiple values for keyword argument 'a'
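The failure above is an ordinary exception and can be caught like any other. A minimal sketch, assuming nothing beyond the function defined in the example; the exact wording of the message differs between Python versions, so the code only stores it.

```python
def function(a):
    pass

try:
    function(0, a=0)            # a receives a value positionally and again by keyword
    message = None
except TypeError as exc:
    message = str(exc)          # wording varies by Python version

print(message)
```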
When a__ final __formal parameter of the form__ **name__ is present, it receives **a dictionary** (see Mapping Types — dict) containing all keyword arguments except for those corresponding to a formal parameter. This may be combined with a formal parameter of the form__ *name__ (described in the next subsection) which receives** a tuple **containing the positional arguments beyond the formal parameter list. (*name __must occur before__ **name.) For example, if we define a function like this:
def cheeseshop(kind, *arguments, **keywords):
    print "-- Do you have any", kind, "?"
    print "-- I'm sorry, we're all out of", kind
    **for arg in arguments**:
        print arg
    print "-" * 40
    keys =** sorted**(__keywords.keys()__)
    for kw in keys:
        print kw, ":", keywords[kw]
It could be called like this:
cheeseshop("Limburger", "It's very runny, sir.",
           "It's really very, VERY runny, sir.",
           shopkeeper='Michael Palin',
           client="John Cleese",
           sketch="Cheese Shop Sketch")
and of course it would print:
-- Do you have any Limburger ?
-- I'm sorry, we're all out of Limburger
It's very runny, sir.
It's really very, VERY runny, sir.
----------------------------------------
client : John Cleese
shopkeeper : Michael Palin
sketch : Cheese Shop Sketch
Note that the list of keyword argument names is created by sorting the result of the keywords dictionary's keys() method before printing its contents; if this is not done, the order in which the arguments are printed is undefined.
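What the *name and **name parameters actually receive can be checked by returning them instead of printing. This is a sketch; describe is a hypothetical function, not part of the tutorial.

```python
def describe(kind, *arguments, **keywords):
    # arguments is a tuple of the extra positional arguments,
    # keywords is a dict of the keyword arguments not matching a formal parameter
    return kind, arguments, sorted(keywords.items())

kind, extra, named = describe("Limburger", "It's very runny, sir.",
                              shopkeeper='Michael Palin',
                              client="John Cleese")
```

Sorting the items gives a stable order regardless of how the dictionary happens to store its keys.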
==== 4.7.3. Arbitrary Argument Lists ====
Finally, the least frequently used option is to specify that a function can be called with an arbitrary number of arguments. These arguments will be wrapped up** in a tuple** (see Tuples and Sequences). Before the variable number of arguments, zero or more normal arguments may occur.
def write_multiple_items(file, separator, *args):
    file.write(__separator.join(args)__)
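A possible way to call write_multiple_items, sketched with an in-memory text buffer instead of a real file. The use of io.StringIO and the u'' literals are choices made for this example so it behaves the same on Python 2 and Python 3.

```python
import io

def write_multiple_items(file, separator, *args):
    # args arrives as a tuple of all arguments after separator
    file.write(separator.join(args))

buf = io.StringIO()             # stands in for an open text file
write_multiple_items(buf, u", ", u"spam", u"eggs", u"ham")
written = buf.getvalue()
print(written)
```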
==== 4.7.4. Unpacking Argument Lists ====
The reverse situation occurs when the arguments are **already in a list or tuple but need to be unpacked for a function call** requiring separate positional arguments. For instance, the built-in range() function expects separate start and stop arguments. If they are not available separately, write the function call with the__ *-operator to unpack__ the arguments out of a list or tuple:
>>>
>>> range(3, 6) # normal call with separate arguments
[3, 4, 5]
>>> args = [3, 6]
>>> range(*args) # call with arguments **unpacked from a list**
[3, 4, 5]
In the same fashion, **dictionaries can deliver keyword arguments with the **-operator**:
>>>
>>> def parrot(voltage, state='a stiff', action='voom'):
...     print "-- This parrot wouldn't", action,
...     print "if you put", voltage, "volts through it.",
...     print "E's", state, "!"
...
>>> d = {"voltage": "four million", "state": "bleedin' demised", "action": "VOOM"}
>>> parrot(__**d__)
-- This parrot wouldn't VOOM if you put four million volts through it. E's bleedin' demised !
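Both unpacking operators can appear in the same call. The sketch below uses a variant of parrot that returns a string rather than printing, purely so the result is easy to inspect; args and opts are illustrative names.

```python
def parrot(voltage, state='a stiff', action='voom'):
    return "%s / %s / %s" % (voltage, state, action)

args = ("four million",)        # unpacked into positional arguments
opts = {"action": "VOOM"}       # unpacked into keyword arguments
line = parrot(*args, **opts)
print(line)
```

Parameters not supplied by either container, here state, fall back to their defaults.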
==== 4.7.5. Lambda Forms ====
By popular demand, a few features commonly found in__ functional programming__ languages like Lisp have been added to Python. With the lambda keyword, **small anonymous functions** can be created. Here's a function that returns the sum of its two arguments:** lambda a, b: a+b**. Lambda forms can be used wherever **function objects are required**. They are syntactically restricted to __a single expression__. Semantically, they are just syntactic sugar for a normal function definition. Like nested function definitions, lambda forms can reference variables from the containing scope:
>>>
>>> def make_incrementor(n):
...     return lambda x: x + n
...
>>> f = make_incrementor(42)
>>> f(0)
42
>>> f(1)
43
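Another common place where a function object is required is the key argument of sorted(). The following sketch, with made-up data, passes a lambda there.

```python
pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]

# sort by the second element of each tuple (the name), not by the number
by_name = sorted(pairs, key=lambda pair: pair[1])
print(by_name)
```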
==== 4.7.6. Documentation Strings ====
There are emerging conventions about the** content and formatting **of documentation strings.
The first line __should always be a short, concise summary of the object's purpose__. For brevity, it should not explicitly state the object's name or type, since these are available by other means (except if the name happens to be a verb describing a function's operation). This line should begin with a capital letter and end with a period.
If there are more lines in the documentation string,__ the second line should be blank__, visually separating the __summary__ from the rest of the __description__. The following lines should be one or more paragraphs describing the object's** calling conventions**, its side effects, etc.
The Python parser does** not strip** indentation from multi-line string literals in Python, so tools that process documentation have to strip indentation if desired. This is done using the following convention. The **first non-blank line after the first line** of the string determines the amount of indentation for the entire documentation string. (We can't use the first line since it is generally adjacent to the string's opening quotes so its indentation is not apparent in the string literal.) Whitespace “equivalent” to this indentation is then stripped from the start of all lines of the string. **Lines that are indented less should not occur**, but if they occur all their leading whitespace should be stripped. Equivalence of whitespace should be tested after expansion of tabs (to 8 spaces, normally).
Here is an example of a multi-line docstring:
>>>
>>> def my_function():
...     """Do nothing, but document it.
...
...     No, really, it doesn't do anything.
...     """
...     pass
...
>>> print my_function.__doc__
Do nothing, but document it.

    No, really, it doesn't do anything.
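The indentation-stripping convention described above is available in the standard library as inspect.cleandoc(). A minimal sketch of applying it to the example docstring:

```python
import inspect

def my_function():
    """Do nothing, but document it.

    No, really, it doesn't do anything.
    """
    pass

# cleandoc() removes the common leading indentation and surrounding blank lines
cleaned = inspect.cleandoc(my_function.__doc__)
print(cleaned)
```

After cleaning, the summary line, the blank separator line and the description all start at column zero.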
===== 4.8. Intermezzo: Coding Style =====
Now that you are about to write longer, more complex pieces of Python, it is a good time to talk about coding style. Most languages can be written (or more concise, formatted) in different styles; some are more readable than others. Making it easy for others to read your code is always a good idea, and adopting a nice coding style helps tremendously for that.
For Python, PEP 8 has emerged as the style guide that most projects adhere to; it promotes a very readable and eye-pleasing coding style. Every Python developer should read it at some point; here are the most important points extracted for you:
* Use 4-space indentation, and no tabs.
4 spaces are a good compromise between small indentation (allows greater nesting depth) and large indentation (easier to read). Tabs introduce confusion, and are best left out.
* Wrap lines so that they don't exceed__ 79__ characters.
This helps users with small displays and makes it possible to have several code files side-by-side on larger displays.
* Use blank lines to__ separate __functions and classes, and larger blocks of code inside functions.
* When possible, put comments on a line of __their own__.
* Use__ docstrings__.
* Use spaces around operators and after commas, but **not directly inside bracketing constructs**: a = f(1, 2) + g(3, 4).
* Name your classes and functions consistently; the convention is to use__ CamelCase for classes and lower_case_with_underscores for functions and methods.__ Always use self as the name for the first method argument (see A First Look at Classes for more on classes and methods).
* Don't use fancy encodings if your code is meant to be used in international environments. Plain ASCII works best in any case.
Footnotes
[1] Actually, call by object reference would be a better description, since if a mutable object is passed, the caller will see any changes the callee makes to it (items inserted into a list).

View File

@@ -0,0 +1,523 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-03T21:14:26+08:00
====== se5. Data Structures ======
Created Tuesday 03 January 2012
http://docs.python.org/tutorial/datastructures.html
This chapter describes some things you've learned about already in more detail, and adds some new things as well.
===== 5.1. More on Lists =====
The list data type has some more methods. Here are all of the methods of list objects:
list.append(x)
Add an item to the end of the list; equivalent to a[len(a):] = [x].
list.extend(L)
Extend the list by appending all the items in the given list; equivalent to a[len(a):] = L.
list.insert(i, x)
Insert an item at a given position. The first argument is the index of the element before which to insert, so a.insert(0, x) inserts at the front of the list, and a.insert(len(a), x) is equivalent to a.append(x).
**list.remove(x)**
Remove the __first item__ from the list whose value is x. It is **an error** if there is no such item.
list.pop([i])
**Remove** the item at the given position in the list, and return it. If no index is specified, a.pop() removes and returns the// last item// in the list. (The square brackets around the i in the method signature denote that the parameter is optional, not that you should type square brackets at that position. You will see this notation frequently in the Python Library Reference.)
list.index(x)
Return the index in the list of the first item whose value is x. It is __an error__ if there is no such item.
list.count(x)
Return the number of times x appears in the list.
list.sort()
Sort the items of the list,__ in place.__
list.reverse()
Reverse the elements of the list, in place.
An example that uses most of the list methods:
>>>
>>> a = [66.25, 333, 333, 1, 1234.5]
>>> print a.count(333), a.count(66.25), a.count('x')
2 1 0
>>> a.insert(2, -1)
>>> a.append(333)
>>> a
[66.25, 333, -1, 333, 1, 1234.5, 333]
>>> a.index(333)
1
__>>> a.remove(333)__
>>> a
[66.25, -1, 333, 1, 1234.5, 333]
>>> a.reverse()
>>> a
[333, 1234.5, 1, 333, -1, 66.25]
>>> a.sort()
>>> a
[-1, 1, 66.25, 333, 333, 1234.5]
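The error raised by remove() and index() for a missing value is a ValueError, which can be handled explicitly. This is a sketch; safe_index is a hypothetical helper and the -1 return value is just a convention chosen for the example.

```python
a = [66.25, 333, 333, 1, 1234.5]

def safe_index(seq, value):
    """Return the index of value in seq, or -1 if it is absent."""
    try:
        return seq.index(value)
    except ValueError:
        return -1

try:
    a.remove('x')               # 'x' is not in the list
    removed = True
except ValueError:
    removed = False
```

Checking a.count(x) or x in a beforehand is an alternative when an exception is not wanted.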
==== 5.1.1. Using Lists as Stacks ====
The list methods make it very easy to use a list as a stack, where the last element added is the first element retrieved (“last-in, first-out”). To add an item to the top of the stack, use append(). To retrieve an item from the top of the stack, use pop() without an explicit index. For example:
>>>
>>> stack = [3, 4, 5]
>>> stack.append(6)
>>> stack.append(7)
>>> stack
[3, 4, 5, 6, 7]
>>> stack.pop()
7
>>> stack
[3, 4, 5, 6]
>>> stack.pop()
6
>>> stack.pop()
5
>>> stack
[3, 4]
==== 5.1.2. Using Lists as Queues ====
It is also possible to use a list as a queue, where the first element added is the first element retrieved (“first-in, first-out”); however, lists are __not efficient __for this purpose. While appends and pops from the** end** of list are fast, doing inserts or pops from the beginning of a list is slow (because all of the other elements have to be **shifted** by one).
To implement a queue, use collections.deque which was designed to have fast appends and pops from both ends. For example:
>>>
>>> __from collections import deque__
>>> queue = deque(["Eric", "John", "Michael"])
>>> queue.append("Terry") # Terry arrives
>>> queue.append("Graham") # Graham arrives
>>> queue.popleft() # The first to arrive now leaves
'Eric'
>>> queue.popleft() # The second to arrive now leaves
'John'
>>> queue # Remaining queue in order of arrival
deque(['Michael', 'Terry', 'Graham'])
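Because a deque is fast at both ends, the same object also supports adding on the left and removing on the right. A short sketch using the names from the example above:

```python
from collections import deque

queue = deque(["Eric", "John", "Michael"])
queue.appendleft("Terry")       # add at the left end
last = queue.pop()              # remove from the right end
first = queue.popleft()         # remove from the left end
remaining = list(queue)
print(remaining)
```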
==== 5.1.3. Functional Programming Tools ====
There are three built-in functions that are very useful when used with lists:__ filter(), map(), and reduce()__.
filter(function, sequence) __returns a sequence consisting of those items from the sequence for which function(item) is true__. If sequence is a string or tuple, the result will be of the same type; otherwise, it is **always a list.** For example, to compute a sequence of numbers divisible by neither 2 nor 3:
>>>
>>> def f(x): return x % 2 != 0 and x % 3 != 0
...
>>> filter(f, range(2, 25))
[5, 7, 11, 13, 17, 19, 23]
map(function, sequence) __calls function(item) for each of the sequence's items and returns a list of the return values__. For example, to compute some cubes:
>>>
>>> def cube(x): return x*x*x
...
>>> map(cube, range(1, 11))
[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]
More than one sequence may be passed; __the function must then have as many arguments as there are sequences __and is called with the corresponding item from each sequence (or None if some sequence is shorter than another). For example:
>>>
>>> seq = range(8)
>>> def add(x, y): return x+y
...
>>> map(add, seq, seq)
[0, 2, 4, 6, 8, 10, 12, 14]
reduce(function, sequence)__ returns a single value constructed by calling the binary function function on the first two items of the sequence__, then on the result and the next item, and so on. For example, to compute the sum of the numbers 1 through 10:
>>>
>>> def add(x,y): return x+y
...
>>> reduce(add, range(1, 11))
55
If there's only one item in the sequence, its value is returned; if the sequence is empty, an exception is raised.
A third argument can be passed to indicate the **starting value**. In this case the starting value is returned for an empty sequence, and the function is first applied to the starting value and the first sequence item, then to the result and the next item, and so on. For example,
>>>
>>> def sum(seq):
...     def add(x,y): return x+y
...     return reduce(add, seq, 0)
...
>>> sum(range(1, 11))
55
>>> sum([])
0
Don't use this example's definition of sum(): since summing numbers is such a common need, a built-in function __sum(sequence)__ is already provided, and works exactly like this.
New in version 2.3.
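The three tools can be written so that the same code runs on Python 2.6+ and Python 3. This is a sketch under two assumptions that go beyond the notes above: on Python 3, filter() and map() return iterators (hence the list() calls), and reduce() is imported from functools.

```python
from functools import reduce    # available in Python 2.6+ and required in Python 3

def f(x):
    return x % 2 != 0 and x % 3 != 0

odd_non_multiples = list(filter(f, range(2, 25)))
cubes = list(map(lambda x: x * x * x, range(1, 11)))
total = reduce(lambda x, y: x + y, range(1, 11), 0)   # 0 is the starting value
empty_total = reduce(lambda x, y: x + y, [], 0)       # starting value returned for an empty sequence
```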
==== 5.1.4. List Comprehensions ====
List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that __satisfy a certain condition__.
For example, assume we want to create a list of squares, like:
>>>
>>> squares = []
>>> for x in range(10):
...     squares.append(x**2)
...
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
We can obtain the same result with:
squares = [x**2 for x in range(10)]
This is also equivalent to __squares = map(lambda x: x**2, range(10))__, but it's more concise and readable.
A list comprehension consists of brackets containing an expression followed by a__ for__ clause, then zero or more__ for or if __clauses. The result will be a new list resulting from evaluating the expression in the context of the for and if clauses which follow it. For example, this listcomp combines the elements of two lists if they are not equal:
>>>
>>> [(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]    # note the __nesting order__ of the two for clauses.
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
and it's equivalent to:
>>>
>>> combs = []
>>> for x in [1,2,3]:
...     for y in [3,1,4]:
...         if x != y:
...             combs.append((x, y))
...
>>> combs
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
Note how the order of the for and if statements is the same in both these snippets.
If the expression is a tuple (e.g. the (x, y) in the previous example), it must be parenthesized.
>>>
>>> vec = [-4, -2, 0, 2, 4]
>>> # create a new list with the values doubled
>>> [x*2 for x in vec]
[-8, -4, 0, 4, 8]
>>> #__ filter the list__ to exclude negative numbers
>>> [x for x in vec if x >= 0]
[0, 2, 4]
>>> #** apply a function** to all the elements
>>> [abs(x) for x in vec]
[4, 2, 0, 2, 4]
>>> # call a method on each element
>>> freshfruit = [' banana', ' loganberry ', 'passion fruit ']
>>> [**weapon.strip()** for weapon in freshfruit]
['banana', 'loganberry', 'passion fruit']
>>> # create** a list of 2-tuples** like (number, square)
>>> [(x, x**2) for x in range(6)]
[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]
>>> #__ the tuple must be parenthesized__, otherwise an error is raised
>>> [x, x**2 for x in range(6)]
  File "<stdin>", line 1
    [x, x**2 for x in range(6)]
               ^
SyntaxError: invalid syntax
>>> # flatten a list using a listcomp with two 'for'
>>> vec =[ [ 1,2,3], [4,5,6], [7,8,9] ]
>>> [num for elem in vec for num in elem]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
List comprehensions can contain **complex expressions and nested functions**:
>>>
>>> from math import pi
>>> [__str(round(pi, i))__ for i in range(1, 6)]
['3.1', '3.14', '3.142', '3.1416', '3.14159']
==== 5.1.4.1. Nested List Comprehensions ====
The initial expression in a list comprehension can be any arbitrary expression, including another list comprehension.
Consider the following example of a 3x4 matrix implemented as a list of 3 lists of length 4:
>>>
>>> matrix = [
...     [1, 2, 3, 4],
...     [5, 6, 7, 8],
...     [9, 10, 11, 12],
... ]
The following list comprehension will **transpose rows and columns**:
>>>
>>> [[row[i] for row in matrix] for i in range(4)]
[ [1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12] ]
As we saw in the previous section, __the nested listcomp is evaluated in the context of the for that follows it__, so this example is equivalent to:
>>>
>>> transposed = []
>>> for i in range(4):
...     transposed.append([row[i] for row in matrix])
...
>>> transposed
[ [1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12] ]
which, in turn, is the same as:
>>>
>>> transposed = []
>>> for i in range(4):
...     # the following 3 lines implement the nested listcomp
...     transposed_row = []
...     for row in matrix:
...         transposed_row.**append**(row[i])
...     transposed.append(transposed_row)
...
>>> transposed
[ [1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12] ]
In the real world, you __should prefer built-in functions to complex flow statements__. The zip() function would do a great job for this use case:
>>>
>>>__ zip(*matrix)__
[(1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8, 12)]
See Unpacking Argument Lists for details on the asterisk in this line.
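The zip(*matrix) idiom produces tuples; if lists are wanted, as in the nested list comprehension, each row can be converted. A sketch, wrapped so that it also works where zip() returns an iterator (Python 3):

```python
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

# *matrix passes the three rows as three separate arguments to zip()
transposed = [list(column) for column in zip(*matrix)]
by_listcomp = [[row[i] for row in matrix] for i in range(4)]
```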
===== 5.2. The del statement =====
There is a way to remove an item from a list given__ its index__ instead of its value: the **del** statement. This differs from the **pop()** method which returns a value. The del statement can also be used to __remove slices__ from a list or clear the entire list (which we did earlier by assignment of an empty list to the slice). For example:
>>>
>>> a = [-1, 1, 66.25, 333, 333, 1234.5]
>>> del a[0]
>>> a
[1, 66.25, 333, 333, 1234.5]
>>> del a[2:4]
>>> a
[1, 66.25, 1234.5]
>>> __del a[:]__
>>> a
[]
del can also be used to__ delete entire variables__:
>>>
>>> del a
Referencing the name a hereafter is an error (at least until another value is assigned to it). We'll find other uses for del later.
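The difference between pop() and del, and the error after deleting a name, can be sketched as follows. The flag still_defined is introduced only to record the outcome.

```python
a = [-1, 1, 66.25, 333]

popped = a.pop(0)               # removes the item and returns it
del a[0]                        # removes the item, returns nothing
remaining = list(a)

del a                           # unbinds the name itself
try:
    a
    still_defined = True
except NameError:               # referencing a deleted name raises NameError
    still_defined = False
```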
===== 5.3. Tuples and Sequences =====
We saw that lists and strings have many** common properties**, such as indexing and slicing operations. They are two examples of__ sequence data types__ (see Sequence Types —__ str, unicode, list, tuple, bytearray, buffer, xrange__). Since Python is an evolving language, other sequence data types may be added. There is also another standard sequence data type: the tuple.
A tuple consists of a number of values **separated by commas**, for instance:
>>>
>>> t = 12345, 54321, 'hello!'
>>> t[0]
12345
>>> t
(12345, 54321, 'hello!')
>>> # Tuples may be nested:
... u = t, (1, 2, 3, 4, 5)
>>> u
((12345, 54321, 'hello!'), (1, 2, 3, 4, 5))
As you see, on__ output tuples are always enclosed in parentheses__, so that nested tuples are interpreted correctly; they may be input with or without surrounding parentheses, although often parentheses are necessary anyway (if the tuple is part of a larger expression).
Tuples have many uses. For example: (x, y) coordinate pairs, employee records from a database, etc. __Tuples, like strings, are immutable__: it is not possible to assign to the individual items of a tuple (you can simulate much of the same effect with slicing and concatenation, though). It is also possible to create tuples which contain mutable objects, such as lists.
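A brief sketch of what immutability does and does not cover: item assignment fails, a contained list can still change, and slicing with concatenation builds a modified copy. The values are illustrative only.

```python
t = (12345, 54321, [1, 2])

try:
    t[0] = 0                    # tuples do not support item assignment
    assigned = True
except TypeError:
    assigned = False

t[2].append(3)                  # the list inside the tuple is still mutable
rebuilt = (0,) + t[1:]          # a new tuple simulating the assignment
```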
A special problem is the construction of tuples containing 0 or 1 items: the syntax has some extra quirks to accommodate these. Empty tuples are constructed by an empty pair of parentheses; a tuple __with one item__ is constructed by__ following a value with a comma__ (it is not sufficient to enclose a single value in parentheses). Ugly, but effective. For example:
>>>
>>> empty = __()__
>>> singleton = __'hello', __ # <-- note trailing comma
>>> len(empty)
0
>>> len(singleton)
1
>>> singleton
('hello',)
Note: singleton = ('aa') does not create a tuple; it must be written singleton = ('aa',)
The statement t = 12345, 54321, 'hello!' is an example of __tuple packing__: the values 12345, 54321 and 'hello!' are packed together in a tuple. The reverse operation is also possible:
>>>
>>> x, y, z = t
This is called, appropriately enough, __sequence unpacking__ and __works for any sequence__ on the right-hand side. Sequence unpacking requires the list of variables on the left to have__ the same number __of elements as the length of the sequence. Note that multiple assignment is really just a combination of tuple packing and sequence unpacking.
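Two consequences of packing and unpacking, sketched with the tuple from the example: swapping two names in one statement, and the ValueError raised when the number of names does not match the length of the sequence.

```python
t = 12345, 54321, 'hello!'      # tuple packing
x, y, z = t                     # sequence unpacking

x, y = y, x                     # pack (y, x), then unpack: swaps the two values

try:
    p, q = t                    # three items, two names
    unpacked = True
except ValueError:
    unpacked = False
```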
===== 5.4. Sets =====
Python also includes a data type for sets. A set is an __unordered collection with no duplicate elements__. Basic uses include membership testing and eliminating duplicate entries. Set objects also support mathematical operations like** union, intersection, difference, and symmetric difference**.
Here is a brief demonstration:
>>>
>>> basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
>>> fruit = __set__(basket) # create a set without duplicates
>>> fruit
set(['orange', 'pear', 'apple', 'banana'])
>>> 'orange' in fruit # fast membership testing
True
>>> 'crabgrass' in fruit
False
>>> # Demonstrate** set operations **on unique letters from two words
...
>>> a = set('abracadabra')
>>> b = set('alacazam')
>>> a # unique letters in a
set(['a', 'r', 'b', 'c', 'd'])
>>> a__ - __b # letters in a but not in b
set(['r', 'd', 'b'])
>>> a__ | __b # letters in either a or b
set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'])
>>> a__ &__ b # letters in both a and b
set(['a', 'c'])
>>> a__ ^ __b # letters in a or b but not both
set(['r', 'd', 'b', 'm', 'z', 'l'])
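Each set operator has an equivalent method, and a set combined with sorted() gives a simple way to remove duplicates in a stable order. A small sketch reusing the data from the demonstration:

```python
a = set('abracadabra')
b = set('alacazam')

same_union = (a | b) == a.union(b)
same_intersection = (a & b) == a.intersection(b)
same_difference = (a - b) == a.difference(b)
same_symmetric = (a ^ b) == a.symmetric_difference(b)

basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
unique_sorted = sorted(set(basket))    # sets are unordered, so sort for display
```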
===== 5.5. Dictionaries =====
Another useful data type built into Python is the dictionary (see __Mapping Types — dict__). Dictionaries are sometimes found in other languages as “associative memories” or **“associative arrays”**. Unlike sequences, which are** indexed **by a range of numbers, dictionaries are __indexed by keys,__ which can be __any immutable__ type; strings and numbers can always be keys. Tuples can be used as keys if they contain only strings, numbers, or tuples; if a tuple contains any mutable object either directly or indirectly, it cannot be used as a key. You can't use lists as keys, since lists can be modified in place using index assignments, slice assignments, or methods like append() and extend().
It is best to think of a dictionary as __an unordered set of key: value pairs,__ with the requirement that the** keys are unique **(within one dictionary). A pair of braces creates an empty dictionary: {}. Placing a comma-separated list of **key:value** pairs within the braces adds initial key:value pairs to the dictionary; this is also the way dictionaries are written on output.
The main operations on a dictionary are** storing** a value with some key and** extracting** the value given the key. It is also possible to **delete **a key:value pair with del. If you store using a key that is already in use, the old value associated with that key is forgotten. It is an **error** to extract a value using a non-existent key.
The keys() method of a dictionary object returns a __list __of all the keys used in the dictionary,** in arbitrary order** (if you want it sorted, just apply the sorted() function to it). To check whether a single key is in the dictionary, use the__ in__ keyword.
Here is a small example using a dictionary:
>>>
>>> tel = {'jack': 4098, 'sape': 4139}
>>>** tel['guido'] = 4127**
>>> tel
{'sape': 4139, 'guido': 4127, 'jack': 4098}
>>> tel['jack']
4098
>>>__ del __tel['sape']
>>> tel['irv'] = 4127
>>> tel
{'guido': 4127, 'irv': 4127, 'jack': 4098}
>>> tel.**keys()**
['guido', 'irv', 'jack']
>>> 'guido' __in__ tel
True
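The store, extract, delete and membership operations described above can be sketched in one place. This is a minimal sketch that reuses the tel data from the example and is written to run on Python 3 as well; dict.get() is a standard dictionary method that the text does not mention, added here as one way to avoid the error on a missing key.

```python
tel = {'jack': 4098, 'sape': 4139}

# Storing with a key that is already in use overwrites the old value.
tel['jack'] = 4100

# Extracting a value with a non-existent key raises KeyError.
try:
    tel['guido']
    missing_raised = False
except KeyError:
    missing_raised = True

# Test membership with `in` first, or use dict.get(), which returns None
# (or a supplied default) instead of raising.
has_guido = 'guido' in tel
guido_or_none = tel.get('guido')

# del removes a key:value pair.
del tel['sape']
remaining = sorted(tel.keys())
```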
The dict() constructor builds dictionaries directly from **lists of key-value pairs** stored as tuples. When the__ pairs form a pattern__, list comprehensions can compactly specify the key-value list.
>>>
>>> dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])
{'sape': 4139, 'jack': 4098, 'guido': 4127}
>>> __dict([(x, x**2) for x in (2, 4, 6)])__ # use a list comprehension
{2: 4, 4: 16, 6: 36}
Later in the tutorial, we will learn about __Generator Expressions__ which are even better suited for the task of supplying key-values pairs to the dict() constructor.
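As a small preview, the same dictionary of squares can be built with a generator expression. This is only a sketch; the dict comprehension on the last line is an alternative available from Python 2.7 onwards and is not part of the text above.

```python
# A generator expression hands the pairs to dict() one at a time,
# without building the intermediate list first.
squares = dict((x, x**2) for x in (2, 4, 6))

# Equivalent dict comprehension (Python 2.7 and later).
squares_again = {x: x**2 for x in (2, 4, 6)}
```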
When the keys are simple strings, it is sometimes easier to specify pairs using keyword arguments:
>>>
>>> dict(sape=4139, guido=4127, jack=4098)
{'sape': 4139, 'jack': 4098, 'guido': 4127}
===== 5.6. Looping Techniques =====
When looping through dictionaries, the key and corresponding value can be retrieved at the same time using the __iteritems() __method.
>>>
>>> knights = {'gallahad': 'the pure', 'robin': 'the brave'}
>>> for k, v in knights.__iteritems()__:
...     print k, v
...
gallahad the pure
robin the brave
When looping through a sequence, the position index and corresponding value can be retrieved at the same time using the **enumerate() **function.
>>>
>>> for i, v in__ enumerate(['tic', 'tac', 'toe'])__:
...     print i, v
...
0 tic
1 tac
2 toe
To loop over two or more sequences at the same time, the entries can__ be paired with the zip()__ function.
>>>
>>> questions = ['name', 'quest', 'favorite color']
>>> answers = ['lancelot', 'the holy grail', 'blue']
>>> for q, a in__ zip__(questions, answers):
...     print 'What is your {0}? It is {1}.'.__format__(q, a)
...
What is your name? It is lancelot.
What is your quest? It is the holy grail.
What is your favorite color? It is blue.
To loop over a sequence in reverse, first specify the sequence in a forward direction and then call the __reversed() __function.
>>>
>>> for i in** reversed**(xrange(1,10,2)):
...     print i
...
9
7
5
3
1
To loop over a sequence in sorted order, use the __sorted()__ function which returns **a new **sorted list while leaving the source unaltered.
>>>
>>> basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
>>> for f in sorted(set(basket)):
...     print f
...
apple
banana
orange
pear
===== 5.7. More on Conditions =====
The conditions used in while and if statements can__ contain any operators, not just comparisons__.
The comparison operators__ in __and __not in__ check whether a value occurs (does not occur) in a sequence. The operators__ is and is not__ compare whether two objects are really **the same object**; this only matters for mutable objects like lists. All comparison operators have __the same priority__, which is lower than that of all numerical operators.
Comparisons can be chained. For example, __a < b == c__ tests whether a is less than b and moreover b equals c.
Comparisons may be combined using the **Boolean operators**__ and __and__ or, __and the outcome of a comparison (or of any other Boolean expression) may be negated with __not__. These have lower priorities than comparison operators; between them, **not has the highest priority and or the lowest**, so that A and not B or C is equivalent to (A and (not B)) or C. As always, parentheses can be used to express the desired composition.
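Precedence: arithmetic operators > comparison operators > Boolean operators. The comparison operators all share the same precedence and chain from left to right. Among the Boolean operators, not has the highest precedence and or the lowest.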
The Boolean operators __and__ and __or __are so-called __short-circuit operators__: their arguments are evaluated from left to right, and evaluation stops as soon as the outcome is determined. For example, if A and C are true but B is false, A and B and C does not evaluate the expression C. When used as a general value and not as a Boolean, the return value of a short-circuit operator is the** last evaluated argument**.
The result of a Boolean operator is the value of __the last operand that was evaluated__, which is not necessarily True or False.
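The short-circuit rule and the precedence rule can both be checked with a small experiment. This is a sketch only: check() is a hypothetical helper, not part of the tutorial, that records which operands were actually evaluated.

```python
calls = []

def check(label, value):
    """Record that this operand was evaluated, then return its value."""
    calls.append(label)
    return value

# B is false, so C is never evaluated and the result is B's value itself.
result_and = check('A', 1) and check('B', 0) and check('C', 2)
calls_and = list(calls)

del calls[:]
# `or` stops at the first true operand and returns it unchanged.
result_or = check('A', '') or check('B', 'Trondheim') or check('C', 'Hammer Dance')
calls_or = list(calls)

# not binds tighter than and, which binds tighter than or, so this is
# (False and (not False)) or True.
grouped = False and not False or True
```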
It is possible to assign the result of a comparison or other Boolean expression to a variable. For example,
>>>
>>> string1, string2, string3 = '', 'Trondheim', 'Hammer Dance'
>>> non_null = string1 or string2 or string3 # the result is the last operand evaluated, i.e. the first true one
>>> non_null
'Trondheim'
Note that in Python, unlike C, assignment cannot occur inside expressions. C programmers may grumble about this, but it avoids a common class of problems encountered in C programs: typing = in an expression when == was intended.
===== 5.8. Comparing Sequences and Other Types =====
Sequence objects may be compared to other objects with __the same sequence type__. The comparison uses__ lexicographical ordering__: first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted. If two items to be compared are** themselves sequences of the same type**, the lexicographical comparison is carried out__ recursively__. If all items of two sequences compare equal, the sequences are considered equal. If one sequence is an initial sub-sequence of the other, __the shorter sequence is the smaller__ (lesser) one. Lexicographical ordering for strings uses the **ASCII ordering** for individual characters. Some examples of comparisons between sequences of the same type:
(1, 2, 3) < (1, 2, 4)
[1, 2, 3] < [1, 2, 4]
'ABC' < 'C' < 'Pascal' < 'Python'
(1, 2, 3, 4) < (1, 2, 4)
(1, 2) < (1, 2, -1)
(1, 2, 3) == (1.0, 2.0, 3.0)
(1, 2, ('aa', 'ab')) < (1, 2, ('abc', 'a'), 4)
Note that **comparing objects of different types is legal**. The outcome is deterministic but __arbitrary__: **the types are ordered by their name**. Thus, a list is always smaller than a string, a string is always smaller than a tuple, etc. [1] Mixed numeric types are compared according to their __numeric value, so 0 equals 0.0__, etc.
Footnotes
[1] The rules for comparing objects of different types should not be relied upon; they may change in a future version of the language.


@@ -0,0 +1,355 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-04T12:43:00+08:00
====== 6. Modules ======
Created Wednesday 04 January 2012
If you quit from the Python interpreter and enter it again, the** definitions** you have made (functions and variables) are lost. Therefore, if you want to write a somewhat longer program, you are better off using a text editor to prepare the input for the interpreter and running it with that file as input instead. This is known as __creating a script__. As your program gets longer, you may want to __split it into several files__ for easier maintenance. You may also want to use a __handy function__ that you've written in several programs without copying its definition into each program.
To support this, Python has a way to put definitions in a file and use them in a script or in an interactive instance of the interpreter. Such __a file is called a module__; definitions from a module can be imported into other modules or into the __main module__ (the collection of variables that you have access to in a script executed at __the top level__ and in calculator mode).
__A module is a file containing Python definitions and statements.__ The file name is the module name with the suffix **.py** appended. Within a module, the module's name (as a string) is available as the value of the global variable** __name__**. For instance, use your favorite text editor to create a file called fibo.py in the current directory with the following contents:
# Fibonacci numbers module

def fib(n):    # write Fibonacci series up to n
    a, b = 0, 1
    while b < n:
        print b,
        a, b = b, a+b

def fib2(n):   # return Fibonacci series up to n
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a+b
    return result
Now enter the Python interpreter and import this module with the following command:
>>>
>>> import fibo
This does not enter the names of the functions defined in **fibo** directly in the __current symbol table__; it only enters the module name fibo there. Using the module name you can access the functions:
>>>
>>> fibo.fib(1000)
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
>>> fibo.fib2(100)
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
>>> fibo.**__name__**
'fibo'
If you intend to use a function often you can assign it to __a local name__:
>>>
>>> fib = fibo.fib
>>> fib(500)
1 1 2 3 5 8 13 21 34 55 89 144 233 377
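The create-a-file-then-import workflow can also be reproduced from a single script. The sketch below is written for Python 3 (the notes above use Python 2 syntax); it writes a copy of fib2() into a scratch directory, which is an assumption made so the example is self-contained, and then imports it.

```python
import importlib
import os
import sys
import tempfile

# Source of the module; fib2() is the same function as in fibo.py above.
FIBO_SOURCE = '''
def fib2(n):  # return Fibonacci series up to n
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a + b
    return result
'''

# Write fibo.py into a scratch directory and put that directory on sys.path.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'fibo.py'), 'w') as handle:
    handle.write(FIBO_SOURCE)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import fibo

series = fibo.fib2(100)       # access through the module name
module_name = fibo.__name__   # the module's own name, as a string
fib2 = fibo.fib2              # bind the function to a local name
```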
===== 6.1. More on Modules =====
A module can contain executable statements as well as function definitions. These statements are intended to __initialize the module__. They are executed only the __first time__ the module is imported somewhere. [1]
Each module has__ its own private symbol table__, which is used as the __global symbol table by all functions defined in the module__. Thus, the author of a module can **use global variables in the module without worrying about accidental clashes with a user's global variables.** On the other hand, if you know what you are doing you can touch a module's global variables with the same notation used to refer to its functions, modname.itemname.
Modules can import other modules. It is customary but not required to** place all import statements at the beginning of a module **(or script, for that matter). The imported module names are placed in the importing module's __global symbol table__.
There is a variant of the import statement that imports names from a module __directly __into the importing module's symbol table. For example:
>>>
>>> from fibo import fib, fib2
>>> fib(500)
1 1 2 3 5 8 13 21 34 55 89 144 233 377
This does __not introduce the module name __from which the imports are taken in the local symbol table (so in the example, fibo is not defined).
There is even a variant to import all names that a module defines:
>>>
>>> from fibo import *
>>> fib(500)
1 1 2 3 5 8 13 21 34 55 89 144 233 377
This imports all names except those **beginning with an underscore **(_).
Note that in general the practice of importing * from a module or package is __frowned upon__, since it often causes poorly readable code. However, it is okay to use it to save typing in interactive sessions.
Note
For efficiency reasons, __each module is only imported once per interpreter session__. Therefore, if you change your modules, you must restart the interpreter or, if it's just one module you want to test interactively, use reload(), e.g. __reload(modulename)__.
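The import-once behaviour can be observed directly. The sketch below targets Python 3, where the built-in reload() used in these notes lives in importlib as importlib.reload(); noisy_mod is a hypothetical module, created only for this demonstration, whose body prints a line every time it is executed.

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

# Hypothetical module whose body announces every execution.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'noisy_mod.py'), 'w') as handle:
    handle.write('print("module body executed")\n')
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    import noisy_mod             # first import: the body runs
    import noisy_mod             # second import: served from sys.modules
    runs_after_imports = captured.getvalue().count('module body executed')
    importlib.reload(noisy_mod)  # explicit reload: the body runs again
    runs_after_reload = captured.getvalue().count('module body executed')
```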
==== 6.1.1. Executing modules as scripts ====
When you run a Python module with
python** fibo.py** <arguments>
__the code in the module will be executed__, just as if you imported it, but with the ____name__ set to "__main__"__. That means that by adding this code at the end of your module:
if** __name__ == "__main__"**:
    import sys
    fib(int(sys.argv[1]))
Only when the module is run as a script from the python __command line__ is the value of the __name__ variable "__main__".
you can make the file usable as a script as well as an importable module, because the code that parses the command line only runs if the module is executed as the “main” file:
$ python fibo.py 50
1 1 2 3 5 8 13 21 34
__If the module is imported, the code is not run__:
>>>
>>> import fibo
>>>
This is often used either to provide a convenient user interface to a module, or for testing purposes (running the module as a script executes a test suite).
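The effect of the guard can be checked without leaving the interpreter by using the standard runpy module, which executes a file under a chosen __name__. This is a sketch: fibo_script.py and its ran_as_script flag are hypothetical, created only to make the difference visible.

```python
import os
import runpy
import tempfile

SCRIPT_SOURCE = '''
ran_as_script = False
if __name__ == "__main__":
    ran_as_script = True
'''

script_path = os.path.join(tempfile.mkdtemp(), 'fibo_script.py')
with open(script_path, 'w') as handle:
    handle.write(SCRIPT_SOURCE)

# run_name="__main__" mimics `python fibo_script.py`.
as_script = runpy.run_path(script_path, run_name='__main__')
# Any other run_name behaves like an import: the guarded block is skipped.
as_module = runpy.run_path(script_path, run_name='fibo_script')
```

runpy.run_path() returns the globals of the executed file, so the two dictionaries show what each style of execution left behind.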
==== 6.1.2. The Module Search Path ====
When a module named spam is imported,__ the interpreter searches for a file named spam.py__ in the directory containing the input script and then in the list of directories specified by the environment variable__ PYTHONPATH__. This has the same syntax as the shell variable PATH, that is, **a list of directory names**. When PYTHONPATH is not set, or when the file is not found there, the search continues in an installation-dependent default path; on Unix, this is usually .:/usr/local/lib/python.
Actually, modules are searched in the list of directories given by the variable__ sys.path__ which is initialized from the directory containing the input script (or the current directory), PYTHONPATH and the installation-dependent default. This allows Python programs that know what they're doing to modify or replace the module search path. Note that because the directory containing the script being run is on the search path,__ it is important that the script not have the same name as a standard module__, or Python will attempt to load the script as a module when that module is imported. This will generally be an error. See section Standard Modules for more information.
==== 6.1.3. “Compiled” Python files ====
As an important speed-up of the start-up time for short programs that use a lot of standard modules, if a file called__ spam.pyc __exists in the directory where spam.py is found, this is assumed to contain an already-“__byte-compiled__” version of the module spam.
The **modification time** of the version of spam.py used to create spam.pyc is recorded in spam.pyc, and the .pyc file is ignored if these don't match.
Normally, you don't need to do anything to create the spam.pyc file. Whenever spam.py is successfully compiled, an attempt is made to write the compiled version to spam.pyc. It is not an error if this attempt fails; if for any reason the file is not written completely, the resulting spam.pyc file will be recognized as invalid and thus ignored later. The contents of the spam.pyc file are platform independent, so __a Python module directory can be shared__ by machines of different architectures.
Some tips for experts:
When the Python interpreter is invoked with the__ -O __flag, optimized code is generated and stored in__ .pyo__ files. The optimizer currently doesn't help much; it only removes __assert statements__. When -O is used, all bytecode is optimized; .pyc files are ignored and .py files are compiled to optimized bytecode.
Passing two -O flags to the Python interpreter (__-OO__) will cause the bytecode compiler to perform optimizations that could in some rare cases result in malfunctioning programs. Currently **only __doc__ strings are removed from the bytecode**, resulting in more compact .pyo files. Since some programs may rely on having these available, you should only use this option if you know what you're doing.
A program __doesn't run any faster__ when it is read from a .pyc or .pyo file than when it is read from a .py file; the only thing that's faster about .pyc or .pyo files is the speed with which they are__ loaded__.
When a script is run by giving its name on the command line, the bytecode for the script is never written to a .pyc or .pyo file. Thus, the startup time of a script may be reduced by moving most of its code to a module and having __a small bootstrap script__ that imports that module. It is also possible to name a .pyc or .pyo file directly on the command line.
It is possible to have a file called spam.pyc (or spam.pyo when -O is used) without a file spam.py for the same module. This can be used to __distribute a library__ of Python code in a form that is moderately hard to reverse engineer.
The module **compileall** can create .pyc files (or .pyo files when -O is used) for all modules in a directory.
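For a single file, the standard py_compile module does the same job as compileall. The sketch below assumes Python 3, where compiled files normally go into a __pycache__ directory rather than next to the source as described above; the cfile argument is used here to pin the output to spam.pyc beside spam.py, and the one-line spam.py is a placeholder.

```python
import os
import py_compile
import tempfile

source_dir = tempfile.mkdtemp()
source_path = os.path.join(source_dir, 'spam.py')
with open(source_path, 'w') as handle:
    handle.write('answer = 42\n')

# cfile pins the output location; without it Python 3 writes into __pycache__.
compiled_path = py_compile.compile(
    source_path, cfile=os.path.join(source_dir, 'spam.pyc'))
compiled_exists = os.path.isfile(compiled_path)
```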
===== 6.2. Standard Modules =====
Python comes with a library of standard modules, described in a separate document, the **Python Library Reference** (“Library Reference” hereafter). Some modules are built into the interpreter; these provide access to operations that are not part of the core of the language but are nevertheless built in, either for efficiency or to provide access to operating system primitives such as system calls. The set of such modules is a configuration option which also depends on the underlying platform. For example, the winreg module is only provided on Windows systems. One particular module deserves some attention: __sys__, which is built into every Python interpreter. The variables sys.ps1 and sys.ps2 define the strings used as primary and secondary prompts:
>>>
>>> import sys
>>> sys.ps1
'>>> '
>>> sys.ps2
'... '
>>> sys.ps1 = 'C> '
C> print 'Yuck!'
Yuck!
C>
These two variables are only defined if the interpreter is in interactive mode.
The variable __sys.path__ is a list of strings that determines the __interpreter's search path__ for modules. It is initialized to a default path taken from the environment variable PYTHONPATH, or from a built-in default if PYTHONPATH is not set. You can modify it using standard list operations:
>>>
>>> import sys
>>> sys.path.append('/ufs/guido/lib/python')
===== 6.3. The dir() Function =====
The built-in function dir() is used to find out__ which names a module defines__. It returns a sorted list of strings:
>>>
>>> import fibo, sys
>>> dir(fibo)
['__name__', 'fib', 'fib2']
>>> dir(sys)
['__displayhook__', '__doc__', '__excepthook__', '__name__', '__stderr__',
'__stdin__', '__stdout__', '_getframe', 'api_version', 'argv',
'builtin_module_names', 'byteorder', 'callstats', 'copyright',
'displayhook', 'exc_clear', 'exc_info', 'exc_type', 'excepthook',
'exec_prefix', 'executable', 'exit', 'getdefaultencoding', 'getdlopenflags',
'getrecursionlimit', 'getrefcount', 'hexversion', 'maxint', 'maxunicode',
'meta_path', **'modules', 'path'**, 'path_hooks', 'path_importer_cache',
'platform', 'prefix', **'ps1', 'ps2'**, 'setcheckinterval', 'setdlopenflags',
'setprofile', 'setrecursionlimit', 'settrace', **'stderr', 'stdin', 'stdout'**,
'version', 'version_info', 'warnoptions']
Without arguments, dir() lists the names you have defined __currently__:
>>>
>>> a = [1, 2, 3, 4, 5]
>>> import fibo
>>> fib = fibo.fib
>>> dir()
[__'__builtins__'__, '__doc__', '__file__',** '__name__'**, 'a', 'fib', 'fibo', 'sys']
The argument of dir() __is an object__; with no argument it lists the names in the __current namespace__.
Note that it lists all types of names: variables, modules, functions, etc.
dir() does not list the names of __built-in functions and variables__. If you want a list of those, they are defined in the **standard module __builtin__**:
>>>
>>> import ____builtin____
>>> dir(____builtin____)
['ArithmeticError', 'AssertionError', 'AttributeError', 'DeprecationWarning',
'EOFError', 'Ellipsis', 'EnvironmentError', 'Exception', __'False'__,
'FloatingPointError', 'FutureWarning', 'IOError', 'ImportError',
'IndentationError', 'IndexError', 'KeyError', 'KeyboardInterrupt',
'LookupError', 'MemoryError', 'NameError', __'None'__, 'NotImplemented',
'NotImplementedError', 'OSError', 'OverflowError',
'PendingDeprecationWarning', 'ReferenceError', 'RuntimeError',
'RuntimeWarning', 'StandardError', 'StopIteration', 'SyntaxError',
'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', __'True'__,
'TypeError', 'UnboundLocalError', 'UnicodeDecodeError',
'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError',
'UserWarning', 'ValueError', 'Warning', 'WindowsError',
'ZeroDivisionError', __'_'__, '__debug__',** '__doc__'**, '__import__',
** '__name__'**,__ 'abs'__, 'apply', 'basestring',__ 'bool'__, 'buffer',
__ 'callable'__, 'chr', 'classmethod', 'cmp', 'coerce', 'compile',
**'complex'**, 'copyright', 'credits', 'delattr',__ 'dict', 'dir'__, 'divmod',
** 'enumerate'**, 'eval', __'execfile'__,__ 'exit'__, __'file'__, __'filter', 'float',__
'frozenset', 'getattr', __'globals'__, 'hasattr', 'hash', 'help', 'hex',
'id', __'input'__, 'int', 'intern',__ 'isinstance', 'issubclass'__, __'iter'__,
__ 'len'__, 'license', 'list', **'locals'**, 'long', 'map', 'max', 'memoryview',
'min', __'object'__, 'oct', __'open'__, 'ord', 'pow', 'property',__ 'quit'__, 'range',
__'raw_input'__, 'reduce', __'reload', 'repr', __'reversed', __'round',__ 'set',
'setattr', 'slice', 'sorted', 'staticmethod', 'str', 'sum', 'super',
'tuple', __'type',__ 'unichr', __'unicode'__, 'vars', 'xrange', 'zip']
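The behaviour of dir() on a module can be tried without creating a file. This is a sketch for Python 3, where the __builtin__ module used above is named builtins; demo is a hypothetical empty module object built with types.ModuleType.

```python
import builtins
import types

demo = types.ModuleType('demo')   # hypothetical empty module object
demo.fib = lambda n: n            # define one name in it

names = dir(demo)                 # sorted list of strings
public_names = [name for name in names if not name.startswith('_')]

# Built-in functions live in the builtins module, not in demo.
len_is_builtin = 'len' in dir(builtins)
```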
===== 6.4. Packages =====
Packages are a way of structuring Python's **module namespace **by using “__dotted module names__”. For example, the module name A.B designates a submodule named B in a package named A.
Just like the** use of modules saves the authors of different modules from having to worry about each other's global variable names**, the use of dotted module names saves the authors of __multi-module packages like NumPy or the Python Imaging Library from having to worry about each other's module names__.
Modules isolate the namespaces of functions and global variables; packages isolate same-named modules in different packages.
Suppose you want to design a collection of modules (a “package”) for the uniform handling of sound files and sound data. There are many different sound file formats (usually recognized by their extension, for example: .wav, .aiff, .au), so you may need to create and maintain **a growing collection of modules** for the conversion between the various file formats. There are also many different operations you might want to perform on sound data (such as mixing, adding echo, applying an equalizer function, creating an artificial stereo effect), so in addition you will be writing **a never-ending **__stream__** of modules** to perform these operations. Here's a possible __structure__ for your package (expressed in terms of a hierarchical filesystem):
sound/                          **Top-level **package # the sound directory should be placed **in one of the directories on sys.path**.
      ____init__.py __               Initialize the sound package
      formats/                  **Subpackage** for file format conversions
              ____init__.py__
              wavread.py
              wavwrite.py
              aiffread.py
              aiffwrite.py
              auread.py
              auwrite.py
              ...
      effects/                  Subpackage for sound effects
              ____init____.py
              echo.py
              surround.py
              reverse.py
              ...
      filters/                  Subpackage for filters
              ____init____.py
              equalizer.py
              vocoder.py
              karaoke.py
              ...
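When** importing the package**, Python searches through the directories on __sys.path __looking for the package subdirectory. **A package is a directory name**: when a package is imported, Python searches the directories listed in sys.path for one that contains a directory of that name.
__The __init__.py files are required to make Python treat the directories as containing packages__; this is done to prevent directories with a common name, such as// string//, from unintentionally hiding valid modules that occur later on the module search path.
Python treats a directory as a package only if it contains ____init____.py. An ordinary directory (one without __init__.py) is therefore not treated as a package. For example, if a package directory pa and an ordinary directory also named pa sit under different sys.path entries, Python searches for modules only in the former.
In the simplest case, __init__.py can just be **an empty file**, but it can also execute initialization code for the package or set the __all__ variable, described later.
__A package's __init__.py is executed the first time the package, or any subpackage or module inside it, is imported. Experiments:__
import sound # Imports a package. A new namespace sound is created and __init__.py is executed __inside it__.
import sound.filters # __Importing a subpackage also imports every parent package on the path, creating several namespaces__. Two namespaces are created, sound and sound.filters: sound holds the result of running its __init__.py, and sound.filters holds the result of running the filters package's __init__.py. Only the name sound is bound in the importing namespace.
import sound.filters.vocoder # __Importing a module also imports every package on the path, creating several namespaces__. This creates three namespaces: sound, sound.filters and sound.filters.vocoder.
from sound import * # __Executes sound's __init__.py in the package's own namespace__, then __copies names into the current namespace__: the names listed in the ____all__ list__ if one is defined, otherwise every public name defined there. Note that the name sound itself is not bound in the current namespace.
from sound.filters import * # Executes sound's __init__.py and then __filters' __init__.py, each in its own namespace__, then __copies names from sound.filters into the current namespace__ (those in the ____all__ list__ if it is defined). Note that neither sound nor sound.filters is bound as a name in the current namespace.
from sound import filters # Executes the __init__.py files of sound and filters __in separate namespaces__, but only the name filters is bound; the name sound is not.
from sound import filters.vocoder # __Invalid syntax: the name after import cannot be a dotted path to a package, a module, or a symbol defined in a module.__
from sound.filters.vocoder import decode # Executes the __init__.py of each package on the path in its own namespace, but __none of those names is bound__; only decode is.
Note: with from package import item, Python first executes the package's __init__.py and then __looks for the name item in the result__. If it is not there, item is assumed to be a subpackage or module: for a subpackage a namespace is created and the subpackage's __init__.py is executed in it; for a module a namespace is created and the module file is executed in it.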
Users of the package can import individual modules from the package, for example:
__import sound.effects.echo # this form can only import a subpackage or a module.__
This loads the **submodule** sound.effects.echo. It must__ be referenced with its full name__.
**sound.effects.echo.echofilter**(input, output, delay=0.7, atten=4)
An alternative way of importing the submodule is:
__from sound.effects import echo # imports a module file from the package.__
This also loads the submodule echo, and makes it available **without its package prefix**, so it can be used as follows:
echo.echofilter(input, output, delay=0.7, atten=4)
Yet another variation is to import the **desired function or variable **directly:
__from sound.effects.echo import echofilter __
Again, this loads the submodule echo, but this makes its function echofilter() directly available:
echofilter(input, output, delay=0.7, atten=4)
Note that when using__ from package import item__, the item can be either a __submodule __(or __subpackage__) of the package, or some__ other name__ defined in the package, like a function, class or variable __(item cannot be a dotted path)__. The import statement first tests whether the item is defined **in the package (that is, as a name item defined in the package's __init__.py)**; if not, it assumes it is **a module** and attempts to load its file. If it fails to find it, an ImportError exception is raised.
Contrarily, when using syntax like __import item.subitem.subsubitem__, each item except for the last **must be a package**; the last item can be a module or a package but can't be a class or function or variable defined in the previous item.
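The three import forms can be compared side by side on a throwaway copy of the package layout. This is a sketch for Python 3: the directory tree is created in a scratch directory, and the body of echofilter is a placeholder, since the text never shows the real function.

```python
import importlib
import os
import sys
import tempfile

# Build a tiny sound/effects package tree in a scratch directory.
root = tempfile.mkdtemp()
effects_dir = os.path.join(root, 'sound', 'effects')
os.makedirs(effects_dir)
for package_dir in (os.path.join(root, 'sound'), effects_dir):
    open(os.path.join(package_dir, '__init__.py'), 'w').close()
with open(os.path.join(effects_dir, 'echo.py'), 'w') as handle:
    # Placeholder body: the real echofilter is not shown in the text.
    handle.write('def echofilter(data, delay=0.7, atten=4):\n'
                 '    return ("echo", data, delay, atten)\n')
sys.path.insert(0, root)
importlib.invalidate_caches()

import sound.effects.echo                   # full dotted name is required
full = sound.effects.echo.echofilter('in')

from sound.effects import echo              # bound without the package prefix
short = echo.echofilter('in')

from sound.effects.echo import echofilter   # the function is bound directly
direct = echofilter('in')
```

All three names refer to the same function object; only the name bound in the importing namespace differs.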
==== 6.4.1. Importing * From a Package ====
Now what happens when the user writes__ from sound.effects import *__? Ideally, one would hope that this somehow goes out to the filesystem, finds which submodules are present in the package, and imports them all. This could take a long time and importing sub-modules might have unwanted side-effects that should only happen when the sub-module is explicitly imported.
The only solution is for the package author to __provide an explicit index of the package__. The import statement uses the following convention: if a package's __init__.py code defines a list named ____all____, it is taken to be the list of module names that should be imported when **from package import *** is encountered. It is up to the package author to keep this list up-to-date when a new version of the package is released. Package authors may also decide not to support it, if they don't see a use for importing * from their package. For example, the file sound/effects/__init__.py could contain the following code:
__all__ = ["echo", "surround", "reverse"]
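This would mean that __from sound.effects import *__ would import the three named submodules of the sound package.
The __init__.py files of sound and effects are each executed in their own namespace. The names sound and sound.effects are not bound in the current namespace; instead the packages and modules listed in effects' __all__ are __imported into the current namespace__.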
If** __all__** is not defined, the statement** from sound.effects import * **__does not import all submodules __from the package **sound.effects** into the current namespace; it only ensures that the__ package sound.effects has been imported__ (possibly running any initialization code in __init__.py) and then imports whatever **names are defined in the package**. This includes **any names defined (and submodules explicitly loaded) by __init__.py.** It also includes any submodules of the package that were explicitly loaded by previous import statements. Consider this code:
import sound.effects.echo
import sound.effects.surround
from sound.effects import *
In this example, the echo and surround modules are imported __in the current namespace__ because they are defined in the **sound.effects** package when the from...import statement is executed. (This **also works** when __all__ is defined.)
A package's __init__.py can itself import packages, modules, functions and other names; those names stay in the package's namespace (and with from ... import * they are copied into the __current namespace__).
Although certain modules are designed to export only names that follow certain patterns when you use import *, it is still considered bad practise in production code.
Remember, there is nothing wrong with using **from Package import specific_submodule**! In fact, this is the recommended notation unless the importing module needs to use submodules with the same name from different packages.
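The effect of __all__ on a star import can be inspected by running the import inside a fresh dictionary. This is a sketch for Python 3; sound_all_demo is a hypothetical package name chosen so the example does not collide with any real sound package, and its __all__ deliberately leaves out reverse.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
effects_dir = os.path.join(root, 'sound_all_demo', 'effects')
os.makedirs(effects_dir)
open(os.path.join(root, 'sound_all_demo', '__init__.py'), 'w').close()
with open(os.path.join(effects_dir, '__init__.py'), 'w') as handle:
    handle.write('__all__ = ["echo", "surround"]\n')   # reverse is left out
for name in ('echo', 'surround', 'reverse'):
    with open(os.path.join(effects_dir, name + '.py'), 'w') as handle:
        handle.write('NAME = %r\n' % name)
sys.path.insert(0, root)
importlib.invalidate_caches()

# Run the star import in its own namespace so the bound names can be listed.
namespace = {}
exec('from sound_all_demo.effects import *', namespace)
imported = sorted(name for name in namespace if not name.startswith('__'))
```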
==== 6.4.2. Intra-package References ====
The submodules often need to__ refer to each other__. For example, the surround module might use the echo module. In fact, such references are so common that the__ import statement first looks in the containing package before looking in the standard module search path__. Thus, the surround module can simply use import echo or from echo import echofilter. If the imported module is not found in the** current package** (the package of which the current module is a submodule), the import statement looks for a** top-level** module with the given name.
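With the form from package import item, Python first looks for the name item in the result of running the package's __init__.py; if it is not there, it looks in the package directory for a subpackage or module named item; if neither exists, an ImportError is raised.
When a submodule imports another module with a plain import, Python 2 first looks __in the package that contains the importing module__ and **only then on the standard search path** for a top-level module; it does not walk up through the parent packages.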
When packages are structured into subpackages (as with the sound package in the example), you can use **absolute imports** to refer to submodules of __siblings__ packages. For example, if the module** sound.filters.vocoder** needs to use the echo module in the **sound.effects** package, it can use from sound.effects import echo.
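The absolute imports above __spell out the full dotted path, starting from the top-level package__ of the tree that contains the current module. The bare import echo form described earlier, which is resolved against the containing package first, is what is called an implicit relative import.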
Starting with Python 2.5, in addition to the implicit relative imports described above, you can write __explicit relative imports__ with the **from module import name** form of import statement. These explicit relative imports use **leading dots** to indicate the current and parent packages involved in the relative import. From the surround module for example, you might use:
from . import echo
from .. import formats
from ..filters import equalizer
Note that both explicit and implicit relative imports **are based on the name of the current module**. Since the name of the__ main module __is always "__main__", modules intended for use as the main module of a Python application should always use absolute imports.
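Note: intra-package references, that is, the two relative import styles above, __only work inside__ a package's module files or __init__.py files, because a relative path is resolved relative to the current module. Using this syntax in a module that is not part of a package, such as the interactive module (named __main__), is an error.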
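The three explicit relative imports can be exercised on a scratch package. This is a sketch for Python 3, where implicit relative imports such as a bare import echo inside a package no longer work and only the dotted form does; sound_rel_demo is a hypothetical package name, and the KIND constants are placeholders.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
top = os.path.join(root, 'sound_rel_demo')
files = {
    '__init__.py': '',
    'formats/__init__.py': '',
    'filters/__init__.py': '',
    'filters/equalizer.py': 'KIND = "equalizer"\n',
    'effects/__init__.py': '',
    'effects/echo.py': 'KIND = "echo"\n',
    # surround uses the three relative forms shown above.
    'effects/surround.py': (
        'from . import echo\n'
        'from .. import formats\n'
        'from ..filters import equalizer\n'
    ),
}
for relative_path, source in files.items():
    path = os.path.join(top, *relative_path.split('/'))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'w') as handle:
        handle.write(source)
sys.path.insert(0, root)
importlib.invalidate_caches()

from sound_rel_demo.effects import surround
```

One dot refers to the package containing surround (effects), two dots to its parent (the top-level package).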
==== 6.4.3. Packages in Multiple Directories ====
Packages support one more special attribute, __path__. This is initialized to be a list containing the name of the directory holding the package's __init__.py before the code in that file is executed. This variable can be modified; doing so affects future searches for modules and subpackages contained in the package.
While this feature is not often needed, it can be used to extend the set of modules found in a package.
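A minimal sketch of extending a package this way, assuming Python 3: the hypothetical package sound_path_demo appends a second directory to its __path__ from inside __init__.py, so a module stored outside the package directory can still be imported as one of its submodules. All names and paths here are made up for the illustration.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
package_dir = os.path.join(root, 'sound_path_demo')
extra_dir = os.path.join(root, 'extra_modules')   # lives outside the package
os.makedirs(package_dir)
os.makedirs(extra_dir)

with open(os.path.join(package_dir, '__init__.py'), 'w') as handle:
    # __path__ already exists when __init__.py runs, so it can be extended.
    handle.write('__path__.append(%r)\n' % extra_dir)
with open(os.path.join(extra_dir, 'plugin.py'), 'w') as handle:
    handle.write('LOCATION = "extra"\n')

sys.path.insert(0, root)
importlib.invalidate_caches()

import sound_path_demo
from sound_path_demo import plugin   # found through the extended __path__

search_dirs = list(sound_path_demo.__path__)
```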
Footnotes
[1] In fact function definitions are also statements that are executed; the execution of a module-level function definition enters the function name in the module's global symbol table.


@@ -0,0 +1,184 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-04T14:32:14+08:00
====== Python study notes: module && package ======
Created Wednesday 04 January 2012
http://arganzheng.iteye.com/blog/986301
===== module =====
__import只能导入模块不能导入模块中的对象__类、函数、变量等。如一个模块AA.py中有个函数getName另一个模块不能通过import A.getName将getName导入到本模块只能用import A。如果想只导入特定的类、函数、变量则用from A import getName即可。
import一个module时会执行该module的所有方法并且将该module添加到importing module的命名空间中。A module's body executes immediately__ the first time__ the module is imported in a given run of a program...An import statement creates__ a new namespace__ containing all the attributes of the module. 如:
fibo.py
# Fibonacci numbers module
def fib(n):    # write Fibonacci series up to n
    a, b = 0, 1
    while b < n:
        print b,
        a, b = b, a+b
def fib2(n):   # return Fibonacci series up to n
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a+b
    return result
print "EOF"
In [1]: import fibo   # First import of fibo: all code in the module is executed once.
EOF
In [2]: **import fibo   # On a repeated import, the code in the module file is not executed again.**
In [3]: fibo.
fibo.__builtins__ fibo.__doc__ fibo.__hash__ fibo.__package__ fibo.__setattr__ fibo.fib
fibo.__class__ fibo.__file__ fibo.__init__ fibo.__reduce__ fibo.__sizeof__ fibo.fib2
fibo.__delattr__ fibo.__format__ fibo.__name__ fibo.__reduce_ex__ fibo.__str__ fibo.py
fibo.__dict__ fibo.__getattribute__ fibo.__new__ fibo.__repr__ fibo.__subclasshook__ fibo.pyc
In [3]: fibo.____name____
Out[3]: 'fibo'
In [4]: fibo.fib(100)
1 1 2 3 5 8 13 21 34 55 89
In [5]: fibo.fib2(100)
Out[5]: [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
In [6]: **from fibo import fib   # Imports fib into the current namespace;** __the module name itself is not__ **imported into the current namespace.**
In [7]: fib(100)
1 1 2 3 5 8 13 21 34 55 89
In [8]: fib2(100)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
/home/forrest/study/python/<ipython console> in <module>()
NameError: name 'fib2' is not defined
In [9]: from fibo import *
In [10]: fib2(100)
Out[10]: [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
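import adds fibo to the namespace of the current module and __executes the top-level statements of fibo.py__. Executing a def statement only defines the function, that is, it binds the function name in the module's namespace, which is why fibo's functions can then be reached through fibo. The other module-level statements are executed as well. "EOF" was printed only once above, which shows that Python loads the module only once.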
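The "loaded only once" behaviour can be observed directly. In this minimal sketch (Python 3; the module name demo_once and the log file are assumptions made for the demonstration) the module body appends a line to a log file every time it runs, so the number of executions can be counted.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
log_path = os.path.join(root, "runs.log")

# The module body (hypothetical module demo_once) appends one line to a
# log file each time it is executed.
module_source = (
    "with open({0!r}, 'a') as log:\n"
    "    log.write('executed\\n')\n"
).format(log_path)
with open(os.path.join(root, "demo_once.py"), "w") as handle:
    handle.write(module_source)

def count_runs():
    with open(log_path) as log:
        return len(log.readlines())

sys.path.insert(0, root)
importlib.invalidate_caches()

first = importlib.import_module("demo_once")    # body runs
second = importlib.import_module("demo_once")   # found in sys.modules, body skipped
runs_after_imports = count_runs()

importlib.reload(first)                         # explicitly re-executes the body
runs_after_reload = count_runs()

print(first is second, runs_after_imports, runs_after_reload)
```

The second import returns the module object cached in sys.modules; only an explicit reload runs the body again.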
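The passage below captures the essence of Python modules, and really of the whole Python language: binding.
1. Variable definition (binding by assignment): x = y defines a variable x whose value is y and binds it in its namespace. For a global variable, that namespace is the module containing it. For a variable inside a function, the assignment only runs when the function is called, and the name is bound in the function's own local namespace.
2. Function definition: def functionName(...): creates a function object and binds its name in the namespace where the def statement is executed.
3. Class definition: class clsName: creates a class object and binds its name in the namespace where the class statement is executed.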
Attributes of a module object are normally bound by statements __in the module body__. When a statement in the body binds a variable (a global variable), what gets bound is an attribute of the **module object**. The normal purpose of a module body is exactly that of creating the module's attributes: def statements create and bind functions, class statements create and bind classes, and assignment statements can bind attributes of any type.
You can also bind and unbind module attributes outside the body (i.e., in other modules), generally using attribute reference syntax M.name (where M is any expression whose value is the module, and identifier name is the attribute name). For clarity, however, it's usually best to limit yourself to binding module attributes in the module's own body.
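To make the idea of binding concrete, here is a small sketch (Python 3) that builds an empty module object with types.ModuleType and runs a "module body" in its namespace. The module name demo and the body contents are invented for the example.

```python
import types

# Create an empty module object and run a "module body" in its namespace.
demo = types.ModuleType("demo")   # hypothetical module name
body = (
    "x = 10\n"
    "def double(n):\n"
    "    return 2 * n\n"
    "class Point(object):\n"
    "    pass\n"
)
exec(body, demo.__dict__)

# Each binding statement in the body became an attribute of the module object.
public_names = sorted(name for name in vars(demo) if not name.startswith("__"))
print(public_names)

# Attributes can also be bound and unbound from outside the body.
demo.y = demo.double(demo.x)
del demo.x
print(demo.y, hasattr(demo, "x"))
```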
===== package =====
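A package is normally a directory whose first file is __init__.py, followed by some module files and subdirectories. If a subdirectory also contains an __init__.py, it is a __subpackage__ of this package. It looks roughly like this:
Package1/
    __init__.py
    Module1.py
    Module2.py
    Package2/
        __init__.py
        Module1.py
        Module2.py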
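We can import a package like this:
import Package1
Or import a submodule or a subpackage:
from Package1 import Module1
from Package1 import **Package2   # The from...import form can import subpackages, modules, and names defined in modules.**
import Package1.Module1
import Package1.Package2   # The import form can only import **subpackages or modules**, not **functions inside a module**.
You can go several levels deep into the package structure: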
from Package1.Package2 import Module1
import Package1.Package2.Module1
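The __init__.py file
The __init__.py files are required to make Python treat the directories as containing packages. In the simplest case, __init__.py can just be an empty file, but it can also execute initialization code for the package or set the __all__ variable, described later.
__init__.py controls the __import behavior of the package__. If __init__.py is empty, importing just the package does nothing useful. Note that __whether you import a package or a module inside a package, the __init__.py of every package along the path is executed__.
>>> import Package1   # __Creates a new namespace and executes the package's __init__.py in it.__
>>> Package1.Module1   # __But Module1 has not been imported into that namespace, so the name does not exist there.__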
Traceback (most recent call last):
File "<pyshell#1>", line 1, in ?
Package1.Module1
AttributeError: 'module' object has no attribute 'Module1'
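We need to import Module1 in __init__.py beforehand:
import Module1   # This imports Module1 into the __namespace of the enclosing package__.
Test:
>>> import Package1
>>> Package1.Module1   # The package namespace now contains the name Module1.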
<module 'Package1.Module1' from
'Module.pyc'>
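__init__.py also has an important variable called** __all__**. Sometimes we resort to the **"import everything"** move, like this:
from PackageName import *   # As noted above, both from and import execute the __init__.py of each package along the path (each in **its own package namespace**).
import then brings the **submodules and subpackages** registered in the __all__ list of the package's __init__.py into the **current scope**. For example:
__all__ = ['Module1', 'Module2', 'Package2']
Test:
>>> from Package1 import *   # __Imports the names listed in __all__ in Package1's __init__.py into the current namespace, importing the listed submodules as needed. Other names that __init__.py defines or imports (with import or from...import) come along only if they are also listed in __all__; when __all__ is not defined, the public names in __init__.py are imported instead.__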
>>> Module2
<module 'Package1.Module2' from 'Module.pyc'>
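The effect of __all__ can be checked with a temporary package. The following is a minimal sketch for Python 3; the package demo_allpkg, its two modules and the helper name are hypothetical. Because "from ... import *" is only allowed at module level, the statement is run with exec in a separate namespace dictionary that can be inspected afterwards.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_allpkg")   # hypothetical package
os.makedirs(pkg_dir)

sources = {
    # Only module1 is listed in __all__; helper and module2 are not.
    "__init__.py": "__all__ = ['module1']\nhelper = 'not exported'\n",
    "module1.py": "NAME = 'module1'\n",
    "module2.py": "NAME = 'module2'\n",
}
for filename, text in sources.items():
    with open(os.path.join(pkg_dir, filename), "w") as handle:
        handle.write(text)

sys.path.insert(0, root)
importlib.invalidate_caches()

# Run the star import in its own namespace so the result can be inspected.
namespace = {}
exec("from demo_allpkg import *", namespace)
imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)
```

Only the name listed in __all__ arrives in the target namespace; helper and module2 stay behind.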
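__init__.py is really just an ordinary Python file; it is executed when the __package is imported, or when a subpackage or module inside the package is imported__.
print ">>>in package1.__init__.py"
def say_hello():
    print "Hi, my name is Forrest!"
Test:
In [1]: import package1   # Importing a package creates a namespace and executes __init__.py in it.
**>>>in package1.__init__.py**
In [2]:** package1.say_hello() **  # A function defined by the package. __Names belonging to the package can be defined in __init__.py__.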
Hi, my name is Forrest!
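__Multi-level packages: each __init__.py is executed in turn__
In [1]: import package1.package2   **# The __init__.py of every package on the path is executed in that package's own namespace; the name bound in the current namespace is package1, and package2 is reached through it as package1.package2**. With the dotted-path form, package1's __init__.py does __not__ need to import package2 or list it in __all__.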
<<<in package1.__init__.py>>>
<<<in package1.package2.__init__.py>>>
In [4]: **package1.package2**.foo_bar()
foobar!
Change package1/__init__.py to the following:
print "<<<in package1.__init__.py>>>"

View File

@@ -0,0 +1,294 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-04T14:15:54+08:00
====== 7. Input and Output ======
Created Wednesday 04 January 2012
There are several ways to present the output of a program; data can be **printed in a human-readable form**, or written to** a file **for future use. This chapter will discuss some of the possibilities.
===== 7.1. Fancier Output Formatting =====
So far we've encountered two ways of writing values: **expression statements and the print statement**. (A third way is using the __write() method of file objects__; the standard output file can be referenced as **sys.stdout**. See the Library Reference for more information on this.)
Often you'll want more control over the** formatting of your output **than simply printing space-separated values. There are two ways to format your output; the first way is to do all the string handling yourself;** using string slicing and concatenation operations** you can create any layout you can imagine. The string types have some methods that perform useful operations for padding strings to a given column width; these will be discussed shortly. The second way is to use the __str.format()__ method.
The string module contains a** Template class **which offers yet another way to substitute values into strings.
One question remains, of course: how do you **convert values to strings**? Luckily, Python has ways to convert any value to a string: pass it to the__ repr() or str()__ functions.
The str() function is meant to **return representations of values which are fairly human-readable, while repr() is meant to generate representations which can be read by the interpreter **(or will force a SyntaxError if there is no equivalent syntax). For objects which don't have a particular representation for human consumption, str() will return the same value as repr(). Many values, such as** numbers **or structures like** lists **and **dictionaries**, have the same representation using either function.__ Strings__ and__ floating point__ numbers, in particular, have two distinct representations.
Some examples:
>>>
>>> s = 'Hello, world.'
>>> str(s)
**'Hello, world.'**
>>> repr(s)
__"'Hello, world.'"__
>>> str(1.0/7.0)
'0.142857142857'
>>> repr(1.0/7.0)
__'0.14285714285714285'__
>>> x = 10 * 3.25
>>> y = 200 * 200
>>> s = 'The value of x is ' + repr(x) + ', and y is ' + repr(y) + '...'
>>> print s
The value of x is 32.5, and y is 40000...
>>> # __The repr() of a string adds string quotes and backslashes:__
... hello = 'hello, world\n'
>>> hellos = repr(hello)
>>> print hellos   __# With str() instead, the printed result would have no quotes or escape characters.__
**'hello, world\n'**
>>> # __The argument to repr() may be any Python object:__
... repr((x, y, ('spam', 'eggs')))
"(32.5, 40000, ('spam', 'eggs'))"
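The difference between the two functions is easy to see on an object that has distinct readable and interpreter-oriented forms. A small sketch in Python 3 syntax, using datetime.date as an arbitrary example type:

```python
import datetime

when = datetime.date(2012, 1, 4)
text = 'hello, world\n'

print(str(when))     # readable form
print(repr(when))    # form the interpreter can read back
print(repr(text))    # quotes and the escaped newline are shown

# For these values, evaluating the repr() rebuilds an equal object.
assert eval(repr(when)) == when
assert eval(repr(text)) == text
```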
Here are two ways to write a table of squares and cubes:
>>>
>>> for x in range(1, 11):
...     print repr(x).__rjust(2)__, repr(x*x).rjust(3)__,__
...     # Note trailing comma on previous line
...     print repr(x*x*x).rjust(4)
...
 1   1    1
 2   4    8
 3   9   27
 4  16   64
 5  25  125
 6  36  216
 7  49  343
 8  64  512
 9  81  729
10 100 1000
The string method rjust(n) makes the __string occupy at least n characters, right-aligned when it is shorter than n__.
>>> for x in range(1,11):
...     print '{0:2d} {1:3d} {2:4d}'__.format__(x, x*x, x*x*x)
...
 1   1    1
 2   4    8
 3   9   27
 4  16   64
 5  25  125
 6  36  216
 7  49  343
 8  64  512
 9  81  729
10 100 1000
(Note that in the first example, one space between each column was added by the way print works: **it always adds spaces between its arguments.**)
This example demonstrates the__ str.rjust() __method of string objects, which **right-justifies a string in a field of a given width by padding it with spaces on the left.** There are similar methods __str.ljust() and str.center()__. These methods do not write anything, they just __return a new string__. If the input string is too long, they don't truncate it, but return it unchanged; this will mess up your column layout but that's usually better than the alternative, which would be lying about a value. (If you really want truncation you can always add a slice operation, as in __x.ljust(n)[:n]__.)
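The padding methods described above, including the no-truncation rule, can be compared side by side. A short sketch in Python 3 syntax (the sample string is arbitrary):

```python
name = "Python"

right = name.rjust(10)          # pad on the left
left = name.ljust(10)           # pad on the right
middle = name.center(10)        # pad on both sides
too_narrow = name.rjust(3)      # width smaller than the string: returned unchanged
truncated = name.ljust(3)[:3]   # explicit slice when truncation is really wanted

for value in (right, left, middle, too_narrow, truncated):
    print(repr(value))
```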
There is another method, __str.zfill()__, which pads a numeric string on the left with zeros. It understands about **plus and minus** signs:
>>>
>>> '12'.zfill(5)
'00012'
>>> '-3.14'.zfill(7)
'-003.14'   # The sign and the decimal point each take up one character.
>>> '3.14159265359'.zfill(5)
'3.14159265359'
Basic usage of the__ str.format() __method looks like this:
>>>
>>> print 'We are the {} who say "{}!"'.format('knights', 'Ni')
We are the knights who say "Ni!"
The brackets and characters within them (called __format fields__) are** replaced **with the objects passed into the str.format() method. A number in the brackets refers to the **position of the object** passed into the str.format() method.
>>>
>>> print '{0} and {1}'.format('spam', 'eggs')
spam and eggs
>>> print '{1} and {0}'.format('spam', 'eggs')
eggs and spam
If **keyword arguments** are used in the str.format() method, their values are referred to by using the name of the argument.
>>>
>>> print 'This {food} is {adjective}.'.format(
... __food='spam', adjective='absolutely horrible'__)
This spam is absolutely horrible.
Positional and keyword arguments can be **arbitrarily combined**:
>>>
>>> print 'The story of {0}, {1}, and {other}.'.format('Bill', 'Manfred',
... other='Georg')
The story of Bill, Manfred, and Georg.
__'!s' __(apply str()) and__ '!r'__ (apply repr()) can be used to **convert the value before it is formatted**.
>>>
>>> import math
>>> print 'The value of PI is approximately {}.'.format(math.pi)
The value of PI is approximately 3.14159265359.
>>> print 'The value of PI is approximately__ {!r}__.'.format(math.pi)
The value of PI is approximately 3.141592653589793.
An optional__ ':' and format specifier __can follow the field name. This allows greater control over how the value is formatted. The following example rounds Pi to three places after the decimal.
>>>
>>> import math
>>> print 'The value of PI is approximately__ {0:.3f}__.'.format(**math.pi**)
The value of PI is approximately 3.142.
Passing an integer after the ':' will cause that field to be a** minimum number of characters wide**. This is useful for making tables pretty.
>>>
>>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 7678}
>>> for name, phone in table.items():
...     print '{0:10} ==> {1:10d}'.format(name, phone)
...
Jack       ==>       4098
Dcab       ==>       7678
Sjoerd     ==>       4127
If you have a really long format string that you don't want to split up, it would be nice if you could reference the variables to be **formatted by name instead of by position**. This can be done by simply passing the __dict__ and __using square brackets '[]' to access the keys__:
>>>
>>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 8637678}
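>>> print ('Jack: {__0[Jack]__:d}; Sjoerd: {0[Sjoerd]:d}; '
...        'Dcab: {0[Dcab]:d}'.format(table))   # There is only one argument, so every positional index is 0. In {n[key]:format-spec}, n refers to the nth argument of format(), which is a dict here, and n[key] is the value stored under key.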
Jack: 4098; Sjoerd: 4127; Dcab: 8637678
This could also be done by passing the table as **keyword arguments** with the '**' notation.
>>>
>>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 8637678}
>>> print 'Jack: {Jack:d}; Sjoerd: {Sjoerd:d}; Dcab: {Dcab:d}'.format(**table)
Jack: 4098; Sjoerd: 4127; Dcab: 8637678
This is particularly useful in combination with the__ built-in function vars()__, which returns a dictionary containing **all local variables**.
For a complete overview of string formatting with str.format(), see Format String Syntax.
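The combination of vars() with the '**' notation mentioned above can be sketched as follows (Python 3 syntax; the function describe and its arguments are made up for the example):

```python
def describe(name, phone):
    # Called with no argument inside a function, vars() returns the local
    # variables as a dictionary: {'name': ..., 'phone': ...}.
    return '{name:10} ==> {phone:10d}'.format(**vars())

line = describe('Jack', 4098)
print(line)
```

The field names in the format string simply have to match the local variable names, so no explicit dictionary needs to be built.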
==== 7.1.1. Old string formatting ====
The % operator can also be used for string formatting. It interprets the __left argument much like a sprintf()-style format string__ to be applied to the right argument, and returns the string resulting from this formatting operation. For example:
>>>
>>> import math
>>> print 'The value of PI is approximately %5.3f.' % math.pi
The value of PI is approximately 3.142.
Since str.format() is quite new, a lot of Python code still uses the % operator. When this was written, the old style of formatting was expected to eventually be** removed from** the language (it is still supported in Python 3), so __str.format() should generally be used__ in new code.
More information can be found in the String Formatting Operations section.
===== 7.2. Reading and Writing Files =====
open() returns a __file object__, and is most commonly used with two arguments: open(filename, mode).
>>>
>>> f = open('/tmp/workfile', 'w')
>>> print f
<open file '/tmp/workfile', mode 'w' at 80a0960>
The first argument is a string containing the filename. The second argument is another string containing a few characters describing the way in which the file will be used. mode can be__ 'r'__ when the file will only be read,__ 'w'__ for only writing (an existing file with the same name will be erased), and__ 'a'__ opens the file for appending; any data written to the file is automatically added to the end. __'r+'__ opens the file for both reading and writing. The mode argument is optional; 'r' will be assumed if it's omitted.
On Windows,__ 'b' __appended to the mode opens the file in binary mode, so there are also modes like '__rb', 'wb', and 'r+b'__. Python on Windows makes a distinction between text and binary files; the __end-of-line__ characters in text files are automatically altered slightly when data is read or written. This behind-the-scenes modification to file data is **fine for ASCII text files**, but it'll corrupt binary data like that in JPEG or EXE files. Be very careful to __use binary mode__ when reading and writing such files. On Unix, it doesn't hurt to append a 'b' to the mode, so you can use it platform-independently for all binary files.
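A minimal sketch of the binary-mode advice, written for Python 3 (where binary mode also means the data is read and written as bytes). The file name blob.bin and the payload are arbitrary choices for the demonstration.

```python
import os
import tempfile

# Arbitrary bytes that include CR/LF sequences, which text mode might alter.
payload = b"\x89PNG\r\n\x1a\n\x00\xff"
path = os.path.join(tempfile.mkdtemp(), "blob.bin")

with open(path, "wb") as out:     # 'b' keeps the bytes exactly as given
    out.write(payload)

with open(path, "rb") as src:
    round_trip = src.read()

print(round_trip == payload)
```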
==== 7.2.1. Methods of File Objects ====
The rest of the examples in this section will assume that **a file object** called f has already been created.
To read a file's contents, call__ f.read(size)__, which reads some quantity of data and** returns it as a string**. size is an optional numeric argument. When size is omitted or negative, the __entire __contents of the file will be read and returned; it's your problem if the file is twice as large as your machine's memory. Otherwise, **at most size bytes **are read and returned. If the end of the file has been reached, f.read() will return __an empty string ("")__.
>>>
>>> f.read()
'This is the entire file.\n'
>>> f.read()
''
__f.readline()__ reads a single line from the file; a newline character (\n)__ is left__ at the end of the string, and is only omitted on the** last line** of the file if the file doesn't end in a newline. This makes the return value unambiguous; if f.readline() returns **an empty string**, the end of the file has been reached, while a blank line is represented by '\n', a string containing only a single newline.
>>>
>>> f.readline()
'This is the first line of the file.\n'
>>> f.readline()
'Second line of the file\n'
>>> f.readline()
''
__f.readlines()__ returns __a list__ containing all the lines of data in the file. If given an optional parameter** sizehint**, it reads that many** bytes **from the file and enough more** to complete** a line, and returns the lines from that. This is often used to allow efficient reading of a large file by lines, but without having to load the entire file in memory. __Only complete lines will be returned__.
>>>
>>> f.readlines()
['This is the first line of the file.\n', 'Second line of the file\n']
An alternative approach to reading lines is __to loop over the file object.__ This is memory efficient, fast, and leads to simpler code:
>>>
>>> for line in f:   # Each line string keeps its trailing newline, so print is given a trailing comma to avoid adding a second one.
        print__ line,__
This is the first line of the file.
Second line of the file
The alternative approach is simpler but does not provide as **fine-grained** control. Since the two approaches manage line buffering differently, they should not be mixed.
__f.write(string)__ writes the contents of __string__ to the file, returning__ None__.
>>>
>>> f.write('This is a test\n')
To write something other than a string, it needs to be **converted to a string first**:
>>>
>>> value = ('the answer', 42)
>>> s = str(value)
>>> f.write(s)
__f.tell()__ returns an integer giving the **file object's current position** in the file, measured in **bytes **from the beginning of the file. To change the file object's position, use __f.seek(offset, from_what)__. The position is computed from **adding offset to a reference point;** the reference point is selected by the from_what argument. A from_what value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. from_what can be omitted and __defaults to 0__, using the beginning of the file as the reference point.
>>>
>>> f = open('/tmp/workfile', **'r+'**)
>>> f.write('0123456789abcdef')
>>> f.seek(5) # Go to the 6th byte in the file
>>> f.read(1)
'5'
>>> f.seek(-3, 2) # Go to the 3rd byte before the end
>>> f.read(1)
'd'
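The three reference points can be exercised together. This sketch targets Python 3, where text-mode files restrict seeks relative to the current position or the end, so the file is opened in binary mode; the temporary file name is arbitrary.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "workfile")

with open(path, "wb+") as f:
    f.write(b"0123456789abcdef")

    f.seek(5)                     # from_what defaults to 0: start of file
    sixth = f.read(1)
    position = f.tell()           # reading one byte advanced the position

    f.seek(-3, 2)                 # 2: relative to the end of the file
    third_from_end = f.read(1)

    f.seek(-2, 1)                 # 1: relative to the current position
    back_two = f.read(1)

print(sixth, position, third_from_end, back_two)
```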
When you're done with a file, call __f.close()__ to close it and free up any system resources taken up by the open file. After calling f.close(), attempts to use the file object will automatically fail.
>>>
>>> f.close()
>>> f.read()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ValueError: I/O operation on closed file
It is good practice to use the __with keyword when dealing with file objects__. This has the advantage that the file is properly closed after its suite finishes, even if an exception is raised on the way. It is also much shorter than writing equivalent **try-finally** blocks:
>>>
>>>__ with open('/tmp/workfile', 'r') as f:__
...     read_data = f.read()
>>> f.closed
True
File objects have some additional methods, such as **isatty()** and **truncate()** which are less frequently used; consult the Library Reference for a complete guide to file objects.
==== 7.2.2. The pickle Module ====
Strings can easily be written to and read from a file. Numbers take a bit more effort, since the** read() method only returns strings**, which will have to be passed to a function like int(), which takes a string like '123' and returns its numeric value 123. However, when you want to save **more complex data types** like lists, dictionaries, or class instances, things get a lot more complicated.
Rather than have users be constantly writing and debugging code__ to save complicated data types__, Python provides a standard module called** pickle**. This is an amazing module that** can take almost any Python object** (even some forms of Python code!), and convert it to__ a string representation__; this process is called **pickling**.__ Reconstructing __the object from the string representation is called unpickling. Between pickling and unpickling, the string representing the object may have been** stored in a file or data**, or sent over a network connection to some distant machine.
If you have an object x, and a file object f that's been **opened for writing**, the simplest way to pickle the object takes only one line of code:
__pickle.dump(x, f)__
To unpickle the object again, if f is a file object which has been opened for reading:
__x = pickle.load(f)__
(There are other variants of this, used when pickling many objects or when you don't want to write the pickled data to a file; consult the complete documentation for pickle in the Python Library Reference.)
pickle is the__ standard way__ to make Python objects which can be** stored and reused **by other programs or by a future invocation of the same program; the technical term for this is__ a persistent object__. Because pickle is so widely used, many authors who write Python extensions take care to ensure that new data types such as matrices can be properly pickled and unpickled.
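A round trip through pickle might look like the sketch below (Python 3, where the file must be opened in binary mode). The record contents and file name are invented; pickle.dumps and pickle.loads are the in-memory variants alluded to above.

```python
import os
import pickle
import tempfile

record = {"name": "Jack", "phones": [4098, 4139], "point": (1.5, -2)}
path = os.path.join(tempfile.mkdtemp(), "record.pkl")

# Pickle to a file and read it back.
with open(path, "wb") as out:
    pickle.dump(record, out)
with open(path, "rb") as src:
    restored = pickle.load(src)

# Pickle to an in-memory byte string instead of a file.
as_bytes = pickle.dumps(record)
clone = pickle.loads(as_bytes)

print(restored == record, restored is record, clone == record)
```

The restored objects are equal to the original but are new objects. Unpickling data from an untrusted source is unsafe, since it can execute arbitrary code.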

View File

@@ -0,0 +1,286 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-04T19:19:44+08:00
====== 8. Errors and Exceptions ======
Created Wednesday 04 January 2012
Until now error messages haven't been more than mentioned, but if you have tried out the examples you have probably seen some. There are (at least) two distinguishable kinds of errors: __syntax errors and exceptions__.
===== 8.1. Syntax Errors =====
Syntax errors, also known as __parsing errors__, are perhaps the most common kind of complaint you get while you are still learning Python:
>>>
>>> while True print 'Hello world'
**  File "<stdin>", line 1, in ?**
    while True print 'Hello world'
                   ^
SyntaxError: invalid syntax
The parser repeats the offending line and displays a little arrow pointing at the earliest point in the line where the error was detected. The error is caused by (or at least detected at) __the token preceding the arrow__: in the example, the error is detected at the keyword print, since a colon (':') is missing before it. File name and line number are printed so you know where to look in case the input came from a script.
===== 8.2. Exceptions =====
Even if a statement or expression is syntactically correct, it may cause an error when an attempt is made to execute it. **Errors detected during execution are called exceptions** and are not unconditionally fatal: you will soon learn how to **handle them** in Python programs. Most exceptions are not handled by programs, however, and result in error messages as shown here:
>>>
>>> 10 * (1/0)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
**ZeroDivisionError**: integer division or modulo by zero
>>> 4 + spam*3
Traceback (most recent call last):
File "<stdin>", line 1, in ?
**NameError**: name 'spam' is not defined
>>> '2' + 2
Traceback (most recent call last):
File "<stdin>", line 1, in ?
**TypeError**: cannot concatenate 'str' and 'int' objects
The__ last line__ of the error message indicates what happened. Exceptions come in __different types__, and the type is printed as part of the message: the types in the example are ZeroDivisionError, NameError and TypeError. The string printed as the exception type is the name of the built-in exception that occurred. This is true for all built-in exceptions, but need not be true for user-defined exceptions (although it is a useful convention). **Standard exception names are built-in identifiers** (not reserved keywords).
The rest of the line provides detail based on the type of exception and what caused it.
The preceding part of the error message __shows the context__ where the exception happened, in the__ form of a stack traceback__. In general it contains a stack traceback listing source lines; however, it will not display lines read from standard input.
Built-in Exceptions lists the built-in exceptions and their meanings.
===== 8.3. Handling Exceptions =====
It is possible to write programs that **handle selected exceptions**. Look at the following example, which asks the user for input until a valid integer has been entered, but allows the user to** interrupt** the program (using Control-C or whatever the operating system supports); note that __a user-generated interruption is signalled by raising the KeyboardInterrupt exception__.
>>>
>>> while True:
...     try:
...         x = int(**raw_input**("Please enter a number: "))
...         break
...     except **ValueError**:
...         print "Oops! That was no valid number. Try again..."
...
The try statement works as follows.
* First, the try clause (the statement(s) between the try and except keywords) is executed.
* If no exception occurs, __the except clause is skipped__ and execution of the try statement is finished.
* If an exception occurs during execution of the try clause, __the rest of the clause is skipped__. Then if its **type matches** the exception named after the except keyword, the except clause is executed, and then execution continues **after** the try statement.
* If an exception occurs which does not match the exception named in the except clause, it is__ passed on to outer try statements__; if no handler is found, it is an __unhandled exception__ and** execution stops** with a message as shown above.
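The steps listed above can be traced with two small functions. This is a sketch in Python 3 syntax; to_int and safe_to_int are hypothetical helper names.

```python
def to_int(text):
    try:
        value = int(text)      # may raise ValueError or TypeError
    except ValueError:
        return None            # matching handler runs; the rest of the try clause was skipped
    return value               # reached only when no exception occurred

def safe_to_int(value):
    try:
        return to_int(value)
    except TypeError:
        # to_int() has no TypeError handler, so the exception is passed
        # on to this outer try statement.
        return 'handled by outer try'

print(to_int('42'), to_int('forty-two'), safe_to_int(None))
```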
A try statement may have more than one except clause, to specify handlers for different exceptions. __At most one handler will be executed__. Handlers only handle exceptions that occur in the corresponding try clause, not in other handlers of the same try statement. An except clause may name** multiple exceptions** as a parenthesized tuple, for example:
... except (RuntimeError, TypeError, NameError):
...     pass
The last except clause may __omit __the exception name(s), to__ serve as a wildcard__. Use this with extreme caution, since it is easy to **mask **a real programming error in this way! It can also be used to print an error message and then __re-raise__ the exception (allowing a caller to handle the exception as well):
import sys
try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
except **IOError **as** (errno, strerror)**:
    print "I/O error({0}): {1}".format(errno, strerror)
except** ValueError**:
    print "Could not convert data to an integer."
except:
    print "Unexpected error:", **sys.exc_info()[0]**
    __raise__
The try ... except statement has an optional __else clause__, which, when present, must follow all except clauses. It is useful for code that **must be executed if the try clause **__does not__** raise an exception**. For example:
for arg in **sys.argv[1:]**:
    try:
        f = open(arg, 'r')
    except IOError:
        print 'cannot open', arg
    else:
        print arg, 'has', len(f.readlines()), 'lines'
        f.close()
The use of the else clause is better than adding additional code to the try clause because it avoids accidentally catching an exception that wasn't raised by the code being protected by the try ... except statement.
When an exception occurs, it may have an associated value, also known as the** exception's argument**. The presence and type of the argument depend on the exception type.
The except clause may **specify a variable after the exception name** (or tuple). The variable is __bound to an exception instance__ with the arguments stored in __instance.args__. For convenience, the exception instance defines __str__() so the arguments can be** printed directly** without having to reference .args.
One may also instantiate an exception first before raising it and add any attributes to it as desired.
>>>
>>> try:
...     raise Exception(__'spam', 'eggs'__)
... except Exception as **inst**:
...     print type(inst)     # the exception instance
...     print inst.__args __     # arguments stored in .args
...     **print inst **          #** __str__ allows args to be printed directly**
...     x, y = inst          # **__getitem__ allows args to be unpacked directly**
...     print 'x =', x
...     print 'y =', y
...
<type 'exceptions.Exception'>
('spam', 'eggs')
('spam', 'eggs')
x = spam
y = eggs
If an exception has an argument, it is printed as the last part (detail) of the message for unhandled exceptions.
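For readers running Python 3, a variant of the example above is sketched here. Two assumptions differ from the Python 2 session: the exception instance is no longer directly unpackable, so the arguments are taken from .args, and the name bound by "as" is removed when the handler ends, so the instance is kept under another name.

```python
try:
    raise Exception('spam', 'eggs')
except Exception as inst:
    caught = inst            # Python 3 unbinds "inst" when the handler ends

x, y = caught.args           # unpack the stored arguments explicitly
print(type(caught).__name__, caught.args, str(caught), x, y)
```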
Exception handlers don't just handle exceptions if they occur **immediately in** the try clause, but also if they occur inside functions that__ are called__ (even indirectly) in the try clause. For example:
>>>
>>> def this_fails():
...     x = 1/0
...
>>> try:
...     this_fails()
... except ZeroDivisionError as detail:
...     print 'Handling run-time error:', detail
...
Handling run-time error: integer division or modulo by zero
===== 8.4. Raising Exceptions =====
The** raise** statement allows the programmer to force a specified exception to occur. For example:
>>>
>>> raise NameError('HiThere')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
NameError: HiThere
The sole argument to raise indicates** the exception** to be raised. This must be either __an exception instance or an exception class__ (a class that derives from Exception).
If you need to determine whether an exception was raised but don't intend to handle it, a simpler form of the raise statement allows you to __re-raise__ the exception:
>>>
>>> try:
...     raise NameError('HiThere')
... except NameError:
...     print 'An exception flew by!'
...     __raise__
...
An exception flew by!
Traceback (most recent call last):
File "<stdin>", line 2, in ?
NameError: HiThere
===== 8.5. User-defined Exceptions =====
Programs may name their own exceptions by __creating a new exception class__ (see Classes for more about Python classes). Exceptions should typically be derived from the __Exception class__, either directly or indirectly. For example:
>>>
>>> class MyError(Exception):
...     def __init__(self, value):
...         self.value = value
...     def ____str____(self):
...         return repr(self.value)
...
>>> try:
...     raise MyError(2*2)
... except MyError __as e__:
...     print 'My exception occurred, value:', **e.value**
...
My exception occurred, value: 4
>>> raise MyError('oops!')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
__main__.MyError: 'oops!'
In this example, the default __init__() of Exception has been** overridden**. The new behavior simply creates the value attribute. This__ replaces the default behavior of creating the args attribute__.
Exception classes can be defined which do anything any other class can do, but are usually kept simple, often only** offering a number of attributes** that allow __information about the error to be extracted by handlers__ for the exception.
When creating a module that can raise several distinct errors, a common practice is to__ create a base class__ for exceptions defined by that module, and __subclass__ that to create specific exception classes for different error conditions:
class Error(Exception):
    """Base class for exceptions in this module."""
    pass
class InputError(Error):
    """Exception raised for errors in the input.
    Attributes:
        expr -- input expression in which the error occurred
        msg  -- explanation of the error
    """
    def __init__(self, expr, msg):
        self.__expr __= expr
        self.__msg __= msg
class TransitionError(Error):
    """Raised when an operation attempts a state transition that's not
    allowed.
    Attributes:
        prev -- state at beginning of transition
        next -- attempted new state
        msg  -- explanation of why the specific transition is not allowed
    """
    def __init__(self, prev, next, msg):
        self.prev = prev
        self.next = next
        self.msg = msg
Most exceptions are defined with names that __end in “Error,” __similar to the naming of the standard exceptions.
Many standard modules define their own exceptions to report errors that may occur in functions they define. More information on classes is presented in chapter Classes.
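One benefit of a module-level base class is that callers can catch every error of the module with a single handler. A minimal sketch in Python 3 syntax; parse_positive is a hypothetical function added only to have something that raises.

```python
class Error(Exception):
    """Base class for exceptions in this (hypothetical) module."""

class InputError(Error):
    """Raised for errors in the input."""
    def __init__(self, expr, msg):
        Error.__init__(self, expr, msg)
        self.expr = expr
        self.msg = msg

def parse_positive(text):
    # Hypothetical helper: accepts only strings made of digits.
    if not text.isdigit():
        raise InputError(text, 'expected only digits')
    return int(text)

try:
    parse_positive('-3')
except Error as err:         # the base class also matches InputError
    caught = err

print(type(caught).__name__, caught.expr, caught.msg)
```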
===== 8.6. Defining Clean-up Actions =====
The try statement has another optional clause which is intended to define__ clean-up actions__ that __must be executed under all circumstances__. For example:
>>>
>>> try:
...     raise KeyboardInterrupt
... finally:
...     print 'Goodbye, world!'
...
Goodbye, world!
KeyboardInterrupt
__A finally clause is always executed before leaving the try statement__, whether an exception has occurred or not. When an exception has occurred in the try clause and has not been handled by an except clause (or it has occurred in an except or **else** clause), it is __re-raised after the finally clause has been executed__. The finally clause is also executed “on the way out” when any other clause of the try statement is left via a break, continue or return statement. A more complicated example (having **except and finally** clauses in the same try statement works as of Python 2.5):
>>>
>>> def divide(x, y):
... try:
... result = x / y
... except ZeroDivisionError:
... print "division by zero!"
... __else: #try子句成功执行后才执行。__
... print "result is", result
... __finally:__
... print "executing finally clause"
...
>>> divide(2, 1)
result is 2 #else子句在finally子句__之前__执行。
executing finally clause
>>> divide(2, 0) #产生的异常**先被捕获然后执行**finally子句(不会再重新触发异常)
division by zero!
**executing finally clause**
>>> divide("2", "1")
executing finally clause #产生未捕获的异常时fianlly子句执行完后__会自动重新触发__异常。
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 3, in divide
TypeError: unsupported operand type(s) for /: 'str' and 'str'
As you can see, **the finally clause is executed in any event**. The TypeError raised by dividing two strings is not handled by the except clause and therefore __re-raised__ after the finally clause has been executed.
In real world applications, the finally clause is useful for** releasing external resources** (such as files or network connections), regardless of whether the use of the resource was successful.
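A small sketch of that pattern follows. FakeConnection is a hypothetical stand-in for a real external resource (it is not a library class), and send_all() is likewise an invented helper; the point is only that close() runs whether or not send() succeeds. The code is written to run on Python 2.7 and Python 3.

```python
class FakeConnection(object):
    """Hypothetical stand-in for an external resource such as a socket."""

    def __init__(self):
        self.closed = False

    def send(self, data):
        if not data:
            raise ValueError("nothing to send")
        return len(data)

    def close(self):
        self.closed = True


def send_all(conn, data):
    """Use the resource, and release it no matter how send() ends."""
    try:
        return conn.send(data)
    finally:
        conn.close()


ok = FakeConnection()
sent = send_all(ok, "ping")          # success path

bad = FakeConnection()
try:
    send_all(bad, "")                # failure path: ValueError propagates
except ValueError:
    failed = True
else:
    failed = False

print("sent=%d ok.closed=%s failed=%s bad.closed=%s"
      % (sent, ok.closed, failed, bad.closed))
```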
===== 8.7. Predefined Clean-up Actions =====
Some objects define **standard clean-up actions** to be undertaken when the object is no longer needed, regardless of whether or not the operation using the object succeeded or failed. Look at the following example, which tries to open a file and print its contents to the screen.
for line in open("myfile.txt"):
    print line
The problem with this code is that it** leaves the file open** for an indeterminate amount of time after the code has finished executing. This is not an issue in simple scripts, but can be a problem for larger applications. The __with statement __allows objects like files to be used in a way that **ensures they are always cleaned up promptly and correctly**.
with open("myfile.txt") as f:
    for line in f:
        print line
After the statement is executed,** the file f is always closed**, even if a problem was encountered while processing the lines. Other objects which provide predefined clean-up actions will indicate this in their documentation.
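To check that claim, here is a sketch that forces an error while reading inside a with block and then inspects the file object. The temporary file is created only so the example is self-contained, and the RuntimeError is a simulated failure; neither comes from the original text. It should run on Python 2.7 and Python 3.

```python
import os
import tempfile

# Create a small throwaway file so the sketch does not depend on myfile.txt.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w") as out:
    out.write("first\nsecond\n")

try:
    with open(path) as f:
        for line in f:
            if line.startswith("second"):
                raise RuntimeError("simulated processing error")
except RuntimeError:
    pass

# The name f is still bound, but the with statement has closed the file.
closed_after_error = f.closed
os.remove(path)
print(closed_after_error)
```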

View File

@@ -0,0 +1,463 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2012-01-04T19:20:13+08:00
====== 9. Classes ======
Created Wednesday 04 January 2012
Compared with other programming languages, Python's class mechanism adds classes with a minimum of new syntax and semantics. It is a mixture of the class mechanisms found in C++ and Modula-3. Python classes provide **all the standard features of Object Oriented Programming**: the class inheritance mechanism allows **multiple base classes**, a derived class can **override any** methods of its base class or classes, and a method can call the method of a base class with the same name. Objects can contain arbitrary amounts and kinds of data. As is true for modules, classes partake of the __dynamic nature__ of Python: they are created at runtime, and can be modified further after creation.
In C++ terminology, normally class members (including the data members) are public (except see below Private Variables), and all member functions are virtual. As in Modula-3, there are no shorthands for referencing the object's members from its methods: the method function is declared with an __explicit first argument representing the object__, which is provided implicitly by the call. As in Smalltalk, __classes themselves are objects__. This provides semantics for **importing and renaming**. Unlike C++ and Modula-3, built-in types can be used as base classes for extension by the user. Also, like in C++, most **built-in operators** with special syntax (arithmetic operators, subscripting etc.) can be redefined for class instances.
(Lacking universally accepted terminology to talk about classes, I will make occasional use of Smalltalk and C++ terms. I would use Modula-3 terms, since its object-oriented semantics are closer to those of Python than C++, but I expect that few readers have heard of it.)
===== 9.1. A Word About Names and Objects =====
Objects have individuality, and **multiple names (in multiple scopes) can be bound to the same object**. This is known as __aliasing__ in other languages. This is usually not appreciated on a first glance at Python, and can be safely ignored when dealing with immutable basic types (numbers, strings, tuples). However, aliasing has a possibly surprising effect on the semantics of Python code involving **mutable objects** such as lists, dictionaries, and most other types. This is usually used to the benefit of the program, since __aliases behave like pointers in some respects.__ For example, passing an object is cheap since only a pointer is passed by the implementation; and if a function modifies an object passed as an argument, **the caller will see the change** — this eliminates the need for two different argument passing mechanisms as in Pascal.
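The following sketch illustrates aliasing with a mutable list and contrasts it with an immutable integer. The function name append_zero is made up for the example; the code should behave the same on Python 2.7 and Python 3.

```python
def append_zero(items):
    """Mutate the list that was passed in; no copy is made."""
    items.append(0)


a = [1, 2]
b = a                 # b is a second name bound to the same list object
append_zero(b)        # the change is visible through both names

n = 5
m = n                 # both names refer to the same int object
m += 1                # ints are immutable: m is rebound, n is untouched

print(a is b)
print(a)
print(n)
```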
===== 9.2. Python Scopes and Namespaces =====
Before introducing classes, I first have to tell you something about Python's __scope rules__. Class definitions play some neat tricks with namespaces, and you need to know how scopes and namespaces work to fully understand what's going on. Incidentally, knowledge about this subject is useful for any advanced Python programmer.
Let's begin with some definitions.
__A namespace is a mapping from names to objects.__ Most namespaces are currently implemented as Python **dictionaries**, but that's normally not noticeable in any way (except for performance), and it may change in the future. Examples of namespaces are: the set of __built-in names__ (containing functions such as abs(), and built-in exception names); the __global names__ in a module; and the __local names__ in a function invocation. In a sense __the set of attributes of an object__ also form a namespace.
The important thing to know about namespaces is that __there is absolutely no relation between names in different namespaces__; for instance, two different modules may both define a function maximize without confusion — users of the modules must prefix it with the module name.
By the way, I use the word__ attribute__ for any name following a dot — for example, in the expression z.real, real is an attribute of the object z.
Strictly speaking, **references to names in modules are attribute references**: in the expression modname.funcname, modname is a__ module object__ and funcname is an attribute of it.
In this case there happens to be a straightforward mapping between __the module's attributes and the global names defined in the module: they share the same namespace!__ [1]
Attributes may be read-only or writable. In the latter case, assignment to attributes is possible. Module attributes are writable: you can write modname.the_answer = 42. __Writable attributes may also be deleted with the del statement__. For example, del modname.the_answer will remove the attribute the_answer from the object named by modname.
Namespaces are **created at different moments and have different lifetimes**. The namespace containing the __built-in__ names is created **when the Python interpreter starts up**, and is never deleted. __The global namespace for a module__ is created when **the module definition is read in**; normally, module namespaces also last until the interpreter quits.
The statements executed by the **top-level** invocation of the interpreter, either read from a script file or interactively, are considered part of a module called __main__, so they have their own global namespace. (The built-in names actually also live in a module; this is called __builtin__.)
__The local namespace__ for a function is created when **the function is called**, and deleted when the function returns or raises an exception that is not handled within the function. (Actually, forgetting would be a better way to describe what actually happens.) Of course, recursive invocations each have __their own__ local namespace.
__A scope__ is a** textual region** of a Python program where a namespace is **directly accessible**. “Directly accessible” here means that an unqualified reference to a name attempts to find the name in the namespace.
__Although scopes are determined statically, they are used dynamically.__ At any time during execution, there are at least **three nested scopes** whose namespaces are directly accessible:
* the innermost scope, which is searched first, contains the** local names**
* the scopes of any **enclosing functions**, which are searched starting with the nearest enclosing scope, contain __non-local__, but also __non-global__ names
* the next-to-last scope contains the __current module's global names__
* the outermost scope (searched last) is the namespace__ containing built-in names__
If a name is declared **global**, then all references and assignments go directly to the middle scope containing the __module's global names__. Otherwise, all variables found outside of the innermost scope are **read-only** (an attempt to write to such a variable will **simply create** a new local variable in the innermost scope, leaving the identically named outer variable unchanged).
Usually, the local scope references the local names of the (textually) current function. __Outside__ functions, the local scope references the same namespace as the **global scope**: the module's namespace. Class definitions place yet another namespace in the local scope.
It is important to realize that __scopes are determined textually__: the global scope of a function defined in a module is that __module's namespace__, no matter from where or by what alias the function is called.
On the other hand, __the actual search for names is done dynamically__, at run time. However, the language definition is evolving towards static name resolution, at “compile” time, so don't rely on dynamic name resolution! (In fact, local variables are already determined statically.)
A special quirk of Python is that __if no global statement is in effect, assignments to names always go into the innermost scope__.
__Assignments do not copy data — they just bind names to objects__.
The same is true for deletions: __the statement del x removes the binding of x from the namespace referenced by the local scope__.
In fact, all operations that introduce new names use the local scope: in particular, import statements and function definitions bind the module or function name __in the local scope__. (The global statement can be used to indicate that particular variables live in the global scope.)
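The rules above can be sketched at module level as follows. The names counter, read_only, shadow and rebind are invented for the illustration; the sketch assumes it is run as a plain script or module, so that "global" refers to that module's namespace. It should run on Python 2.7 and Python 3.

```python
counter = 0                  # a global name in this module


def read_only():
    return counter + 1       # reading an outer name needs no declaration


def shadow():
    counter = 100            # assignment creates a new *local* name
    return counter


def rebind():
    global counter           # assignments now go to the module namespace
    counter = 42
    return counter


results = [read_only(), shadow(), counter]   # the global is still 0 here
rebind()
results.append(counter)                      # now the global has been rebound
print(results)
```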
===== 9.3. A First Look at Classes =====
Classes introduce a little bit of new syntax, three new object types, and some new semantics.
==== 9.3.1. Class Definition Syntax ====
The simplest form of class definition looks like this:
class ClassName:
    <statement-1>
    .
    .
    .
    <statement-N>
Class definitions, like function definitions (def statements) must be executed before they have any effect. (You could conceivably place a class definition in a branch of an if statement, or inside a function.)
In practice, the statements inside a class definition will usually be function definitions, but other statements are allowed, and sometimes useful; we'll come back to this later. The function definitions inside a class normally have __a peculiar form of argument list__, dictated by the calling conventions for methods (again, this is explained later).
When a class definition is entered, __a new namespace is created, and used as the local scope__; thus, all assignments to local variables go into this new namespace (every name bound by an assignment inside the class body lives in the namespace created for that class definition). In particular, function definitions bind the name of the new function here.
When a class definition is left normally (via the end), __a class object__ is created. This is basically **a wrapper around the contents of the namespace created by the class definition**; we'll learn more about class objects in the next section. The original local scope (the one in effect just before the class definition was entered) is reinstated, and the class object is bound here to the class name given in the class definition header (ClassName in the example).
Once the class definition ends, a __class object (not a class instance)__ exists in the enclosing namespace, and __the class name is bound to that object there__. The class object is essentially a wrapper around the class's namespace.
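One way to see the class namespace is to inspect the class object's __dict__ attribute. The sketch below uses a made-up class named Example; it derives from object so that it behaves the same way on Python 2.7 and Python 3, which is a choice made for this illustration rather than something the text above requires.

```python
class Example(object):
    greeting = "hi"          # bound in the class namespace

    def hello(self):         # the def also binds a name in the class namespace
        return self.greeting


# Names created inside the class body, ignoring the double-underscore
# entries that the interpreter adds on its own.
names = sorted(k for k in Example.__dict__ if not k.startswith("__"))
print(names)

# The class name itself is bound in the enclosing (here: module) namespace,
# and it refers to a class object, not to an instance.
print(isinstance(Example, type))
```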
==== 9.3.2. Class Objects ====
Class objects support two kinds of operations: **attribute references and instantiation**.
__Attribute references__ use the standard syntax used for all attribute references in Python: obj.name. Valid attribute names are all the names that were in the **class's namespace** when the class object was created. So, if the class definition looked like this:
class MyClass:
    """A simple example class"""
    i = 12345
    def f(self):
        return 'hello world'
    # the names i and f are both defined in the class namespace
then MyClass.i and MyClass.f are valid attribute references, returning an integer and a function object, respectively. Class attributes can also be assigned to, so you can change the value of MyClass.i by assignment. **__doc__** is also a valid attribute, returning the docstring belonging to the class: "A simple example class".
__Class instantiation__ uses function notation. Just pretend that the class object is a parameterless function that** returns a new instance of the class**. For example (assuming the above class):
x = MyClass()
creates a new instance of the class and assigns this object to the** local variable x**.
The instantiation operation (“calling” a class object) creates an empty object. Many classes like to create objects with instances customized to __a specific initial state__. Therefore a class may define a special method named __init__(), like this:
    def __init__(self):
        self.data = []
When a class defines an __init__() method, class instantiation __automatically invokes __init__()__ for the newly-created class instance. So in this example, a new, initialized instance can be obtained by:
x = MyClass()    # creates an instance object and binds the name x to it in the current namespace
Of course, the __init__() method may have arguments for greater flexibility. In that case, __arguments given to the class instantiation operator are passed on to __init__()__. For example,
>>> class Complex:
...     def __init__(self, realpart, imagpart):   # the name __init__ lives in the class object's namespace
...         self.r = realpart                     # via self, the name r is bound in the instance's namespace
...         self.i = imagpart
...
>>> x = Complex(3.0, -4.5)
>>> x.r, x.i
(3.0, -4.5)
==== 9.3.3. Instance Objects ====
Now what can we do with instance objects? __The only operations understood by instance objects are attribute references__. There are two kinds of valid attribute names: **data attributes and methods**.
Data attributes correspond to **“instance variables”** in Smalltalk, and to “data members” in C++. __Data attributes need not be declared; like local variables__, they spring into existence __when they are first assigned to__. For example, if x is the instance of MyClass created above, the following piece of code will print the value 16, without leaving a trace:
x.counter = 1    # works whether or not x already has a counter attribute
while x.counter < 10:
    x.counter = x.counter * 2
print x.counter
del x.counter
The other kind of instance attribute reference is a method. __A method is a function that “belongs to” an object.__ (In Python, the term method is not unique to class instances: other object types can have methods as well. For example, list objects have methods called append, insert, remove, sort, and so on. However, in the following discussion, we'll use the term method exclusively to mean methods of class instance objects, unless explicitly stated otherwise.)
Valid method names of an instance object __depend on its class__. By definition, all attributes of a class that are function objects define corresponding methods of its instances. So in our example, x.f is a valid method reference, since MyClass.f is a function, but x.i is not, since MyClass.i is not.
But x.f is not the same thing as MyClass.f: __it is a method object, not a function object__.
Method objects and function objects are two distinct concepts.
==== 9.3.4. Method Objects ====
Usually, a method is called right after it is bound:
x.f()
In the MyClass example, this will return the string 'hello world'. However, it is not necessary to call a method right away: __x.f is a method object, and can be stored away and called at a later time. __For example:
xf = x.f
while True:
    print xf()
will continue to print hello world until the end of time.
What exactly happens when a method is called? You may have noticed that x.f() was called without an argument above, even though the function definition for f() specified an argument. What happened to the argument? Surely Python **raises an exception** when a function that requires an argument is called without any, even if the argument isn't actually used...
Actually, you may have guessed the answer: __the special thing about methods is that the object (the instance object) is passed as the first argument of the function__. In our example, the **call x.f() is exactly equivalent to MyClass.f(x)**. In general, calling a method with a list of n arguments is equivalent to calling the corresponding function with an argument list that is created by inserting the method's object before the first argument.
If you still don't understand how methods work, a look at the implementation can perhaps clarify matters. When an **instance attribute** is referenced that isn't a data attribute, __its class is searched__. If the name denotes a valid class attribute that is a function object, a method object is created by packing (pointers to) the instance object and the function object just found together in an abstract object: this is the method object. When the method object is called with an argument list, a new argument list is constructed from the instance object and the argument list, and the function object is called with this new argument list.
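The "packing" described above can be observed directly. The sketch below reuses the MyClass example, declared with an object base so that it runs identically on Python 2.7 and Python 3 (an assumption of this sketch). It relies on the __self__ and __func__ attributes of bound methods, which exist in Python 2.6+ and Python 3; on Python 2 they are aliases for im_self and im_func.

```python
class MyClass(object):
    """A simple example class"""
    i = 12345

    def f(self):
        return 'hello world'


x = MyClass()
bound = x.f                          # a method object: instance + function

same = (x.f() == MyClass.f(x))       # the two call forms give the same result
receiver = bound.__self__            # the instance packed into the method object
function = bound.__func__            # the plain function found in the class

print(bound())
print(receiver is x)
print(function is MyClass.__dict__['f'])
```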
===== 9.4. Random Remarks =====
__Data attributes override method attributes with the same name;__ to avoid accidental name conflicts, which may cause hard-to-find bugs in large programs, it is wise to use some kind of convention that minimizes the chance of conflicts.
Possible conventions include__ capitalizing method names__, prefixing data attribute names with a small unique string (perhaps just an underscore), or__ using verbs for methods and nouns for data attributes.__
Data attributes may be referenced by methods as well as by **ordinary users (“clients”) of an object**. In other words, classes are not usable to implement __pure abstract data types__. In fact, nothing in Python makes it possible to enforce data hiding; it is all based upon convention. (On the other hand, the Python implementation, written in C, can completely hide implementation details and control access to an object if necessary; this can be used by extensions to Python written in C.)
**Clients should use data attributes with care**: clients may mess up invariants maintained by the methods by stamping on their data attributes. Note that __clients may add data attributes of their own to an instance object__ without affecting the validity of the methods, as long as name conflicts are avoided; again, a naming convention can save a lot of headaches here.
Because clients can use an object's data attributes directly, they can break the consistency of object state that **the class's methods** are supposed to maintain.
There is __no shorthand__ for referencing data attributes (or other methods!) from within methods. I find that this actually increases the readability of methods: there is no chance of **confusing local variables (which live in the method's local namespace) and instance variables (which live in the instance object's namespace)** when glancing through a method.
Often, the first argument of a method is called **self**. This is nothing more than a convention: the name self has absolutely no special meaning to Python. Note, however, that by not following the convention your code may be less readable to other Python programmers, and it is also conceivable that a class browser program might be written that relies upon such a convention.
__Any function object that is a class attribute defines a method for instances of that class.__ It is not necessary that the function definition is textually enclosed in the class definition: assigning a function object to **a local variable in the class **is also ok. For example:
# Function defined outside the class
def f1(self, x, y):    # the self parameter cannot be omitted
    return min(x, x+y)

class C:
    f = f1
    def g(self):
        return 'hello world'
    h = g
Now **f, g and h are all attributes of class C** that refer to __function objects__, and consequently they are __all methods of instances of C__ — h being exactly equivalent to g. Note that this practice usually only serves to confuse the reader of a program.
Methods may call other methods by using __method attributes of the self__ argument:
class Bag:
    def __init__(self):
        self.data = []
    def add(self, x):
        self.data.append(x)
    def addtwice(self, x):
        self.add(x)
        self.add(x)
Methods may reference __global names__ in the same way as ordinary functions. The global scope associated with a method is __the module__ containing the class definition. (The class itself is never used as a global scope.) While one rarely encounters a good reason for using global data in a method, there are many legitimate uses of the global scope: for one thing, functions and modules imported into the global scope (that is, the module containing the class definition) can be used by methods, as well as functions and classes defined in it. Usually, the class containing the method is itself defined in this global scope, and in the next section we'll find some good reasons why a method would want to reference its own class.
Code in a class definition can use global names, which live in the module that contains the class definition. Methods can likewise use functions and other modules that this module has imported.
__Each value is an object__, and therefore has a class (also called its type). It is stored as **object.__class__**.
In Python every value is an object with a type; its __class__ attribute records that type.
===== 9.5. Inheritance =====
Of course, a language feature would not be worthy of the name “class” without supporting inheritance. The syntax for a derived class definition looks like this:
class DerivedClassName(BaseClassName):
    <statement-1>
    .
    .
    .
    <statement-N>
The name BaseClassName must be defined in a scope containing the derived class definition. In place of a base class name, **other arbitrary expressions** are also allowed. This can be useful, for example, when the base class is defined in another module:
class DerivedClassName(modname.BaseClassName):
Execution of a derived class definition proceeds the same as for a base class. When the class object is constructed, **the base class is remembered**. This is used for __resolving attribute references__: if a requested attribute is not found in the class, the search proceeds to look in the base class. This rule is applied__ recursively__ if the base class itself is derived from some other class.
There's nothing special about instantiation of derived classes: DerivedClassName() creates a **new instance** of the class. __Method references__ are resolved as follows: the corresponding class attribute is searched, descending down the chain of base classes if necessary, and the method reference is valid if this yields a function object.
Derived classes may __override methods__ of their base classes. Because methods have no special privileges when calling other methods of the same object, a method of a base class that calls another method defined in the same base class may end up calling a method of a derived class that overrides it. (For C++ programmers: __all methods in Python are effectively virtual__.)
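In Python __all methods are virtual__, so calls are always polymorphic: when a base-class method calls another method defined **in the same class**, the call may actually reach __the subclass's override of that method__. To be sure of calling the version defined in the base class, use the form BaseClassName.methodname(self, arguments).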
An overriding method in a derived class may in fact want to **extend **rather than simply replace the base class method of the **same** name. There is a simple way to __call the base class method directly__: just call **BaseClassName.methodname(self, arguments).** This is occasionally useful to clients as well. (Note that this only works if the base class is accessible as BaseClassName in the global scope.)
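A short sketch of both points: methods behave as virtual, and an override can extend the base implementation by calling it explicitly. The classes Base and Derived and their methods are invented for the example; object is used as the root so the code runs unchanged on Python 2.7 and Python 3.

```python
class Base(object):
    def name(self):
        return "base"

    def greet(self):
        # Calls self.name(), which may resolve to a subclass override.
        return "hello from " + self.name()

    def describe(self):
        return ["base part"]


class Derived(Base):
    def name(self):                      # overrides Base.name
        return "derived"

    def describe(self):                  # extends rather than replaces
        parts = Base.describe(self)      # explicit call to the base method
        parts.append("derived part")
        return parts


d = Derived()
print(d.greet())
print(d.describe())
```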
Python has two built-in functions that work with inheritance:
* Use isinstance() to check an instance's type: isinstance(obj, int) will be True only if obj.__class__ is int or some class derived from int.
* Use issubclass() to check class inheritance: issubclass(bool, int) is True since bool is a subclass of int. However, issubclass(unicode, str) is False since unicode is not a subclass of str (they only share a common ancestor, **basestring**).
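The two built-ins can be tried on a tiny hierarchy. Animal and Dog are hypothetical classes for this sketch; the unicode/str case mentioned above is left out because those types only exist in that form on Python 2, while the code below should run on both Python 2.7 and Python 3.

```python
class Animal(object):
    pass


class Dog(Animal):
    pass


d = Dog()
checks = [
    isinstance(d, Dog),          # the instance's own class
    isinstance(d, Animal),       # a base class also matches
    isinstance(Animal(), Dog),   # but a base instance is not a Dog
    issubclass(Dog, Animal),
    issubclass(bool, int),       # bool is a subclass of int
    isinstance(True, int),
]
print(checks)
```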
==== 9.5.1. Multiple Inheritance ====
Python supports a limited form of multiple inheritance as well. A class definition with multiple base classes looks like this:
class DerivedClassName(Base1, Base2, Base3):
    <statement-1>
    .
    .
    .
    <statement-N>
For **old-style **classes, the only rule is__ depth-first, left-to-right__. Thus, if an attribute is not found in DerivedClassName, it is searched in Base1, then (recursively) in the base classes of Base1, and only if it is not found there, it is searched in Base2, and so on.
In other words, for an old-style class with several base classes, attributes and methods are resolved depth-first, left to right.
(To some people breadth first — searching Base2 and Base3 before the **base classes of Base1** — looks more natural. However, this would require you to know whether a particular attribute of Base1 is actually defined in Base1 or in one of its base classes before you can figure out the consequences of a __name conflict__ with an attribute of Base2. The depth-first rule makes no differences between** direct and inherited **attributes of Base1.)
For new-style classes, the method resolution order changes dynamically to support cooperative calls to__ super()__. This approach is known in some other multiple-inheritance languages as__ call-next-method__ and is more powerful than the super call found in single-inheritance languages.
With new-style classes, **dynamic ordering** is necessary because all cases of multiple inheritance exhibit one or more diamond relationships (where at least one of the parent classes can be accessed through multiple paths from the bottommost class). For example, all new-style classes inherit from object, so any case of multiple inheritance provides __more than one path to reach object__. To keep the base classes from being accessed more than once, the dynamic algorithm __linearizes the search order__ in a way that preserves the left-to-right ordering specified in each class, that calls each parent only once, and that is monotonic (meaning that a class can be subclassed without affecting the precedence order of its parents). Taken together, these properties make it possible to design reliable and extensible classes with multiple inheritance. For more detail, see http://www.python.org/download/releases/2.3/mro/.
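The linearized order is exposed as the __mro__ attribute of a new-style class. The diamond below (classes A, B, C, D and the method who) is an invented example; because every class derives from object, it is new-style on Python 2.7 as well as on Python 3.

```python
class A(object):
    def who(self):
        return "A"


class B(A):
    pass


class C(A):
    def who(self):
        return "C"


class D(B, C):
    pass


# Linearized search order for D: each class appears exactly once.
order = [cls.__name__ for cls in D.__mro__]
print(order)

# A plain depth-first search (D, B, A, ...) would find A.who first;
# the linearized order reaches C before A.
print(D().who())
```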
===== 9.6. Private Variables =====
“Private” instance variables that cannot be accessed except from inside an object __don't exist__ in Python. However, there is a convention that is followed by most Python code: a name prefixed with an underscore (e.g. _spam) should be treated as a **non-public** part of the API (whether it is a function, a method or a data member). It should be considered an **implementation detail** and subject to change without notice.
Since there is a valid use-case for class-private members (namely to avoid name clashes of names with names defined by subclasses), there is limited support for such a mechanism, called __name mangling__ (to mangle: to crush or badly damage). Any identifier of the form **__spam** (at least two leading underscores, at most one trailing underscore) is textually replaced with **_classname__spam**, where classname is the current class name with leading underscore(s) stripped. This mangling is done without regard to the syntactic position of the identifier, as long as it occurs within the definition of a class.
Name mangling is helpful for __letting subclasses override methods without breaking intraclass method calls__. For example:
Name mangling suits the case where **a subclass wants to override a parent-class method** without breaking **parent-class methods that call the original definition (normally every method is effectively virtual, so a call made inside the parent class may end up in the subclass's override).**
class Mapping:
    def __init__(self, iterable):
        self.items_list = []
        self.__update(iterable)    # mangled name: __update is really _Mapping__update, so it acts like a private method

    def update(self, iterable):    # update() is the method the subclass is going to override
        for item in iterable:
            self.items_list.append(item)

    __update = update   # private copy of original update() method; it does not change when update() is overridden

class MappingSubclass(Mapping):
    def update(self, keys, values):
        # provides new signature for update()
        # but does not break __init__()
        for item in zip(keys, values):
            self.items_list.append(item)
Note that the mangling rules are designed mostly to __avoid accidents__; it still is possible to access or modify a variable that is considered private. This can even be useful in special circumstances, such as in the debugger.
Notice that code passed to **exec, eval() or execfile()** does not consider the classname of the invoking class to be the current class; this is similar to the effect of the global statement, the effect of which is likewise restricted to code that is byte-compiled together. The same restriction applies to **getattr(), setattr() and delattr()**, as well as when referencing __dict__ directly.
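The textual replacement, and the fact that getattr()-style lookups by string are not mangled, can be seen in a sketch like the following. Account and its attribute are hypothetical names chosen for the example; the code should run on Python 2.7 and Python 3.

```python
class Account(object):
    def __init__(self):
        self.__balance = 0            # stored as _Account__balance

    def deposit(self, amount):
        self.__balance += amount      # mangled the same way inside the class
        return self.__balance


acct = Account()
acct.deposit(10)

mangled = acct._Account__balance          # still reachable from outside
has_plain = hasattr(acct, "__balance")    # string names are not mangled
stored_names = sorted(vars(acct))

print(mangled)
print(has_plain)
print(stored_names)
```

This is why mangling is best read as protection against accidental clashes, not as access control.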
===== 9.7. Odds and Ends =====
Sometimes it is useful to have a data type similar to the Pascal “record” or __C “struct”__, bundling together a few named data items. __An empty class __definition will do nicely:
class Employee:
    pass

john = Employee()  # Create an empty employee record
# Fill the fields of the record
john.name = 'John Doe'
john.dept = 'computer lab'
john.salary = 1000
An instance of an empty class can be used as a __struct__, because data attributes can be __added dynamically__ to Python objects.
A piece of Python code that expects **a particular abstract data type** can often be passed a class that emulates the methods of that data type instead. For instance, if you have a function that formats some data from a file object, you can define a class with methods read() and readline() that get the data from a string buffer instead, and pass it as an argument.
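A minimal sketch of that idea is shown below. StringReader and first_line_upper are hypothetical names invented here; StringReader implements only the two methods the paragraph mentions and is not a full file object. The code should run on Python 2.7 and Python 3.

```python
class StringReader(object):
    """Hypothetical file-like object that serves lines from a string."""

    def __init__(self, text):
        self._lines = text.splitlines(True)   # keep the line endings
        self._pos = 0

    def readline(self):
        if self._pos >= len(self._lines):
            return ""                         # same end-of-file signal as a file
        line = self._lines[self._pos]
        self._pos += 1
        return line

    def read(self):
        rest = "".join(self._lines[self._pos:])
        self._pos = len(self._lines)
        return rest


def first_line_upper(fileobj):
    """Works with anything that provides readline(), real file or not."""
    return fileobj.readline().strip().upper()


header = first_line_upper(StringReader("name,dept\njohn,lab\n"))
print(header)
```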
Instance method objects have attributes, too: m.im_self is the instance object with the method m(), and m.im_func is the function object corresponding to the method.
===== 9.8. Exceptions Are Classes Too =====
User-defined exceptions are identified by classes as well. Using this mechanism it is possible __to create extensible hierarchies of exceptions__.
There are two new valid (semantic) forms for the raise statement:
* raise Class, instance
* raise instance
In the first form, instance must be an instance of Class or of a class derived from it. The second form is a shorthand for:
raise instance.__class__, instance
A class in an except clause is compatible with an exception if it is the same class or a base class thereof (but not the other way around — an except clause listing a derived class is not compatible with a base class). For example, the following code will print B, C, D in that order:
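In other words, __an instance of a subclass is also an instance of its parent class, but an instance of the parent class is not an instance of the subclass.__
class B:
    pass
class C(B):
    pass
class D(C):
    pass

# An instance of D is also an instance of C and of B, but not the other way round.
for c in [B, C, D]:
    try:
        raise c()
    except D:
        print "D"
    except C:
        print "C"
    except B:
        print "B"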
Note that if the except clauses were reversed (with except B first), it would have printed B, B, B — the first matching except clause is triggered.
When an error message is printed for an unhandled exception, the exception's __class name__ is printed, then a colon and a space, and finally the __instance converted to a string__ using the built-in function **str()**.
===== 9.9. Iterators =====
By now you have probably noticed that most __container objects__ can be looped over using a for statement:
for element in [1, 2, 3]:
    print element
for element in (1, 2, 3):
    print element
for key in {'one':1, 'two':2}:
    print key
for char in "123":
    print char
for line in open("myfile.txt"):
    print line
This style of access is clear, concise, and convenient. __The use of iterators pervades and unifies Python__. Behind the scenes, the for statement calls** iter() **on the container object. The function returns an** iterator object** that defines the method** next()** which accesses elements in the container one at a time. When there are no more elements, next() raises a __StopIteration exception __which tells the for loop to terminate. This example shows how it all works:
>>>
>>> s = 'abc'
>>> it = iter(s)
>>> it
<iterator object at 0x00A1DB50>
>>> it.next()
'a'
>>> it.next()
'b'
>>> it.next()
'c'
>>> it.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
    it.next()
StopIteration
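What the for statement does behind the scenes can be spelled out by hand. This is a minimal sketch rather than the interpreter's actual implementation; manual_for is a hypothetical helper, and it uses the built-in next() function (available since Python 2.6) so that it also runs on Python 3, where the method is named __next__().

```python
def manual_for(container):
    """Collect the items of container the way a for statement walks it."""
    items = []
    it = iter(container)        # the for statement calls iter() first
    while True:
        try:
            item = next(it)     # ask the iterator for the next element
        except StopIteration:   # raised once the iterator is exhausted
            break
        items.append(item)      # stands in for the loop body
    return items

print(manual_for('abc'))
print(manual_for((1, 2, 3)))
```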
Having seen the mechanics behind the iterator protocol, it is easy to **add iterator behavior to your classes**. Define an __iter__() method which returns an object with a next() method. **If the class defines next(), then __iter__() can just return self**:
class Reverse:
    """Iterator for looping over a sequence backwards."""
    def __init__(self, data):
        self.data = data
        self.index = len(data)
    def __iter__(self):
        return self
    def next(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]
>>>
>>> rev = Reverse('spam')
>>> iter(rev)
<__main__.Reverse object at 0x00A1DB50>
>>> for char in rev:
...     print char
...
m
a
p
s
===== 9.10. Generators =====
Generators are a simple and powerful tool __for creating iterators__. They are written like regular functions but use the **yield statement** whenever they want to return data. Each time next() is called, the generator resumes where it left off (it remembers all the data values and which statement was last executed). An example shows that generators can be trivially easy to create:
def reverse(data):
    for index in range(len(data)-1, -1, -1):
        yield data[index]
>>>
>>> for char in reverse('golf'):  # reverse() returns a generator object
...     print char
...
f
l
o
g
Anything that can be done with generators can also be done with class based iterators as described in the previous section. What makes generators so compact is that the **__iter__() and next() methods are created automatically.**
Another key feature is that the __local variables and execution state are automatically saved between calls__. This makes the function easier to write and much clearer than an approach using instance variables like self.index and self.data.
In addition to automatic method creation and saving program state, when generators terminate, they **automatically raise StopIteration**. In combination, these features make it easy to create iterators with no more effort than writing a regular function.
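One way to combine the two approaches is to write a class whose __iter__() method is itself a generator. The sketch below is an illustration under that assumption; Countdown is a hypothetical class, not something from the tutorial.

```python
class Countdown:
    """Iterable that counts down from start to 1."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Generator method: every call returns a fresh generator object,
        # so the loop position lives in a local variable, not on self.
        n = self.start
        while n > 0:
            yield n
            n -= 1

c = Countdown(3)
first_pass = list(c)
second_pass = list(c)   # works again because __iter__ starts a new generator
print(first_pass)
print(second_pass)
```

A Reverse instance from the previous section, by contrast, can be traversed only once, because its index is stored on the instance and is never reset.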
===== 9.11. Generator Expressions =====
Some simple generators can be coded succinctly as expressions using a syntax similar to __list comprehensions__ but __with parentheses instead of brackets__. These expressions are designed for situations where the generator is used right away __by an enclosing function__. Generator expressions are more compact but less versatile than full generator definitions and tend to be more memory friendly than equivalent list comprehensions.
Examples:
>>>
>>> sum(i*i for i in range(10)) # sum of squares
285
>>> xvec = [10, 20, 30]
>>> yvec = [7, 5, 3]
>>> sum(x*y for x,y in zip(xvec, yvec)) # dot product
260
>>> from math import pi, sin
>>> sine_table = dict((x, sin(x*pi/180)) for x in range(0, 91))
>>> unique_words = set(word for line in page for word in line.split())
>>> valedictorian = max((student.gpa, student.name) for student in graduates)
>>> data = 'golf'
>>> list(data[i] for i in range(len(data)-1,-1,-1))
['f', 'l', 'o', 'g']
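The difference from a list comprehension shows up when the result is used more than once. A small sketch (the variable names are illustrative only):

```python
squares_list = [i * i for i in range(10)]   # builds all ten values up front
squares_gen = (i * i for i in range(10))    # produces values on demand

total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)           # this consumes the generator

# A second pass over the generator finds nothing left; the list is intact.
leftover = list(squares_gen)
print(total_from_list)
print(total_from_gen)
print(leftover)
```

This is why generator expressions suit the "used right away by an enclosing function" case: nothing is stored, so nothing can be reused.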
Footnotes
[1] Except for one thing. **Module objects** have a secret read-only attribute called __dict__ which returns the dictionary used to implement the **module's namespace**; the name __dict__ is an attribute but not a global name. Obviously, using this violates the abstraction of namespace implementation, and should be restricted to things like post-mortem debuggers.

View File

@@ -0,0 +1,7 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-13T17:18:21+08:00
====== WSGI ======
Created Thursday 13 October 2011

View File

@@ -0,0 +1,726 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-13T22:14:43+08:00
====== Python Web Server Gateway Interface v1.0 ======
Created Thursday 13 October 2011
http://www.python.org/dev/peps/pep-0333/
PEP: 333
Title: Python Web Server Gateway Interface v1.0
Version: 763b6e5c6cf1
Last-Modified: 2011-03-04 04:58:22 +0000 (Fri, 04 Mar 2011)
Author: Phillip J. Eby <pje at telecommunity.com>
Discussions-To: Python Web-SIG <web-sig at python.org>
Status: Final
Type: Informational
Content-Type: text/x-rst
Created: 07-Dec-2003
Post-History: 07-Dec-2003, 08-Aug-2004, 20-Aug-2004, 27-Aug-2004, 27-Sep-2010
Superseded-By: 3333
Contents
Preface
Abstract
Rationale and Goals
Specification Overview
The Application/Framework Side
The Server/Gateway Side
Middleware: Components that Play Both Sides
Specification Details
environ Variables
Input and Error Streams
The start_response() Callable
Handling the Content-Length Header
Buffering and Streaming
Middleware Handling of Block Boundaries
The write() Callable
Unicode Issues
Error Handling
HTTP 1.1 Expect/Continue
Other HTTP Features
Thread Support
Implementation/Application Notes
Server Extension APIs
Application Configuration
URL Reconstruction
Supporting Older (<2.2) Versions of Python
Optional Platform-Specific File Handling
Questions and Answers
Proposed/Under Discussion
Acknowledgements
References
Copyright
====== Preface ======
Note: For an updated version of this spec that supports Python 3.x and includes community errata, addenda, and clarifications, please see PEP 3333 instead.
===== Abstract =====
This document specifies a proposed **standard interface between web servers and Python web applications or frameworks, to promote web application portability across a variety of web servers.**
===== Rationale and Goals =====
Python currently boasts a wide variety of **web application frameworks**, such as Zope, Quixote, Webware, SkunkWeb, PSO, and Twisted Web -- to name just a few [1]. This wide variety of choices can be a problem for new Python users, because generally speaking, **their choice of web framework will limit their choice of usable web servers**, and vice versa.
By contrast, although Java has just as many web application frameworks available, Java's "servlet" API makes it possible for applications written with any Java web application framework to run in any web server that supports the servlet API.
The availability and widespread use of such an API in web servers for Python -- whether those servers are written in Python (e.g. Medusa), embed Python (e.g. mod_python), or invoke Python via a gateway protocol (e.g. CGI, FastCGI, etc.) -- would **separate choice of framework from choice of web server**, freeing users to choose a pairing that suits them, while freeing framework and server developers to focus on their preferred area of specialization.
__This PEP, therefore, proposes a simple and universal interface between web servers and web applications or frameworks: the Python Web Server Gateway Interface (WSGI).__
But the mere existence of a WSGI spec does nothing to address the existing state of servers and frameworks for Python web applications. Server and framework authors and maintainers must actually **implement WSGI** for there to be any effect.
However, since no existing servers or frameworks support WSGI, there is little immediate reward for an author who implements WSGI support. Thus, WSGI must be easy to implement, so that an author's initial investment in the interface can be reasonably low.
Thus, __simplicity of implementation__ on both the server and framework sides of the interface is absolutely critical to the utility of the WSGI interface, and is therefore the principal criterion for any design decisions.
Note, however, that simplicity of implementation for a framework author is not the same thing as ease of use for a web application author. WSGI presents an absolutely "no frills" (that is, free of ornament and extras) interface to the framework author, because bells and whistles like response objects and cookie handling would just get in the way of existing frameworks' handling of these issues. Again, __the goal of WSGI is to facilitate easy interconnection of existing servers and applications or frameworks__, not to create a new web framework.
Note also that this goal precludes WSGI from requiring anything that is not already available in deployed versions of Python. Therefore, new standard library modules are not proposed or required by this specification, and nothing in WSGI requires a Python version greater than 2.2.2. (It would be a good idea, however, for future versions of Python to include support for this interface in web servers provided by the standard library.)
In addition to ease of implementation for existing and future frameworks and servers, it should also be easy to create request preprocessors, response postprocessors, and other __WSGI-based "middleware" components__ that look like an application to their containing server, while acting as a server for their contained applications.
If middleware can be both simple and robust, and WSGI is widely available in servers and frameworks, it allows for the possibility of an entirely new kind of Python web application framework: one consisting of loosely-coupled WSGI middleware components. Indeed, existing framework authors may even choose to refactor their frameworks' existing services to be provided in this way, becoming more like libraries used with WSGI, and less like monolithic frameworks. This would then allow application developers to choose "best-of-breed" components for specific functionality, rather than having to commit to all the pros and cons of a single framework.
Of course, as of this writing, that day is doubtless quite far off. In the meantime, it is a sufficient short-term goal for WSGI to enable the use of any framework with any server.
Finally, it should be mentioned that the current version of WSGI does not prescribe any particular mechanism for "deploying" an application for use with a web server or server gateway. At the present time, this is necessarily implementation-defined by the server or gateway. After a sufficient number of servers and frameworks have implemented WSGI to provide field experience with varying deployment requirements, it may make sense to create another PEP, describing a deployment standard for WSGI servers and application frameworks.
====== Specification Overview ======
The WSGI interface has two sides: the "server" or "gateway" side, and the "application" or "framework" side. The server side invokes a callable object that is provided by the application side. The specifics of how that object is provided are up to the server or gateway. It is assumed that some servers or gateways will require an application's deployer to write a short script to create an instance of the server or gateway, and supply it with the application object. Other servers and gateways may use configuration files or other mechanisms to specify where an application object should be imported from, or otherwise obtained.
In addition to "pure" servers/gateways and applications/frameworks, it is also possible to create "middleware" components that implement both sides of this specification. Such components act as an application to their containing server, and as a server to a contained application, and can be used to provide extended APIs, content transformation, navigation, and other useful functions.
Throughout this specification, we will use the term "a callable" to mean "a function, method, class, or an instance with a __call__ method". It is up to the server, gateway, or application implementing the callable to choose the appropriate implementation technique for their needs. Conversely, a server, gateway, or application that is invoking a callable must not have any dependency on what kind of callable was provided to it. Callables are only to be called, not introspected upon.
===== The Application/Framework Side =====
The application object is simply a callable object that accepts two arguments. The term "object" should not be misconstrued as requiring an actual object instance: a function, method, class, or instance with a __call__ method are all acceptable for use as an application object. Application objects must be able to be invoked more than once, as virtually all servers/gateways (other than CGI) will make such repeated requests.
(Note: although we refer to it as an "application" object, this should not be construed to mean that application developers will use WSGI as a web programming API! It is assumed that application developers will continue to use existing, high-level framework services to develop their applications. WSGI is a tool for framework and server developers, and is not intended to directly support application developers.)
Here are two example application objects; one is a function, and the other is a class:
def simple_app(environ, start_response):
    """Simplest possible application object"""
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return ['Hello world!\n']

class AppClass:
    """Produce the same output, but using a class

    (Note: 'AppClass' is the "application" here, so calling it
    returns an instance of 'AppClass', which is then the iterable
    return value of the "application callable" as required by
    the spec.)

    If we wanted to use *instances* of 'AppClass' as application
    objects instead, we would have to implement a '__call__'
    method, which would be invoked to execute the application,
    and we would need to create an instance for use by the
    server or gateway.
    """

    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        self.start(status, response_headers)
        yield "Hello world!\n"
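To see that the server side does not care what kind of callable it receives, an application object can be driven by hand. The following is a sketch only: call_app, function_app and InstanceApp are hypothetical names, the empty environ lacks the variables a real server must supply, and bodies are native strings as in PEP 333 (under PEP 3333 on Python 3 a real server expects bytes).

```python
def call_app(app, environ=None):
    """Invoke a WSGI application once and return (status, headers, body)."""
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = response_headers
        return lambda data: None    # write() callable, unused in this sketch

    result = app(environ if environ is not None else {}, start_response)
    try:
        body = ''.join(result)      # iterating may be what calls start_response
    finally:
        if hasattr(result, 'close'):
            result.close()
    return captured['status'], captured['headers'], body


def function_app(environ, start_response):
    """Application object that is a plain function."""
    start_response('200 OK', [('Content-type', 'text/plain')])
    return ['Hello world!\n']


class InstanceApp:
    """Application whose *instances* are the application objects."""

    def __call__(self, environ, start_response):
        start_response('200 OK', [('Content-type', 'text/plain')])
        return ['Hello world!\n']


print(call_app(function_app))
print(call_app(InstanceApp()))
```

Both calls go through exactly the same code path in call_app, which only calls the object and iterates over the result.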
===== The Server/Gateway Side =====
The server or gateway invokes the application callable once for each request it receives from an HTTP client, that is directed at the application. To illustrate, here is a simple CGI gateway, implemented as a function taking an application object. Note that this simple example has limited error handling, because by default an uncaught exception will be dumped to sys.stderr and logged by the web server.
import os, sys

def run_with_cgi(application):

    environ = dict(os.environ.items())
    environ['wsgi.input'] = sys.stdin
    environ['wsgi.errors'] = sys.stderr
    environ['wsgi.version'] = (1, 0)
    environ['wsgi.multithread'] = False
    environ['wsgi.multiprocess'] = True
    environ['wsgi.run_once'] = True

    if environ.get('HTTPS', 'off') in ('on', '1'):
        environ['wsgi.url_scheme'] = 'https'
    else:
        environ['wsgi.url_scheme'] = 'http'

    headers_set = []
    headers_sent = []

    def write(data):
        if not headers_set:
            raise AssertionError("write() before start_response()")

        elif not headers_sent:
            # Before the first output, send the stored headers
            status, response_headers = headers_sent[:] = headers_set
            sys.stdout.write('Status: %s\r\n' % status)
            for header in response_headers:
                sys.stdout.write('%s: %s\r\n' % header)
            sys.stdout.write('\r\n')

        sys.stdout.write(data)
        sys.stdout.flush()

    def start_response(status, response_headers, exc_info=None):
        if exc_info:
            try:
                if headers_sent:
                    # Re-raise original exception if headers sent
                    raise exc_info[0], exc_info[1], exc_info[2]
            finally:
                exc_info = None     # avoid dangling circular ref
        elif headers_set:
            raise AssertionError("Headers already set!")

        headers_set[:] = [status, response_headers]
        return write

    result = application(environ, start_response)
    try:
        for data in result:
            if data:    # don't send headers until body appears
                write(data)
        if not headers_sent:
            write('')   # send headers now if body was empty
    finally:
        if hasattr(result, 'close'):
            result.close()
===== Middleware: Components that Play Both Sides =====
Note that a single object may play the role of a server with respect to some application(s), while also acting as an application with respect to some server(s). Such "middleware" components can perform such functions as:
* Routing a request to different application objects based on the target URL, after rewriting the environ accordingly.
* Allowing multiple applications or frameworks to run side-by-side in the same process
* Load balancing and remote processing, by forwarding requests and responses over a network
* Performing content postprocessing, such as applying XSL stylesheets
The presence of middleware in general is transparent to both the "server/gateway" and the "application/framework" sides of the interface, and should require no special support. A user who desires to incorporate middleware into an application simply provides the middleware component to the server, as if it were an application, and configures the middleware component to invoke the application, as if the middleware component were a server. Of course, the "application" that the middleware wraps may in fact be another middleware component wrapping another application, and so on, creating what is referred to as a "middleware stack".
For the most part, middleware must conform to the restrictions and requirements of both the server and application sides of WSGI. In some cases, however, requirements for middleware are more stringent than for a "pure" server or application, and these points will be noted in the specification.
Here is a (tongue-in-cheek) example of a middleware component that converts text/plain responses to pig latin, using Joe Strout's piglatin.py. (Note: a "real" middleware component would probably use a more robust way of checking the content type, and should also check for a content encoding. Also, this simple example ignores the possibility that a word might be split across a block boundary.)
from piglatin import piglatin

class LatinIter:

    """Transform iterated output to piglatin, if it's okay to do so

    Note that the "okayness" can change until the application yields
    its first non-empty string, so 'transform_ok' has to be a mutable
    truth value.
    """

    def __init__(self, result, transform_ok):
        if hasattr(result, 'close'):
            self.close = result.close
        self._next = iter(result).next
        self.transform_ok = transform_ok

    def __iter__(self):
        return self

    def next(self):
        if self.transform_ok:
            return piglatin(self._next())
        else:
            return self._next()

class Latinator:

    # by default, don't transform output
    transform = False

    def __init__(self, application):
        self.application = application

    def __call__(self, environ, start_response):

        transform_ok = []

        def start_latin(status, response_headers, exc_info=None):

            # Reset ok flag, in case this is a repeat call
            del transform_ok[:]

            for name, value in response_headers:
                if name.lower() == 'content-type' and value == 'text/plain':
                    transform_ok.append(True)
                    # Strip content-length if present, else it'll be wrong
                    response_headers = [(name, value)
                        for name, value in response_headers
                            if name.lower() != 'content-length'
                    ]
                    break

            write = start_response(status, response_headers, exc_info)

            if transform_ok:
                def write_latin(data):
                    write(piglatin(data))
                return write_latin
            else:
                return write

        return LatinIter(self.application(environ, start_latin), transform_ok)

# Run foo_app under a Latinator's control, using the example CGI gateway
from foo_app import foo_app
run_with_cgi(Latinator(foo_app))
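A smaller middleware component may make the two roles easier to see. This sketch is not from the PEP: Uppercaser, hello_app and fake_start_response are hypothetical names, the write() callable is passed through untransformed, and upper-casing only preserves Content-Length for ASCII text.

```python
class Uppercaser:
    """Middleware sketch: upper-cases every body string of the wrapped app."""

    def __init__(self, application):
        self.application = application

    def __call__(self, environ, start_response):
        # Server role: invoke the wrapped application with the caller's
        # environ and start_response. Because this method is a generator,
        # that call happens on the first iteration, which the spec allows.
        result = self.application(environ, start_response)
        try:
            for chunk in result:
                # Application role: yield one output block per input block.
                yield chunk.upper()
        finally:
            if hasattr(result, 'close'):
                result.close()


def hello_app(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain')])
    return ['Hello ', 'world!\n']


seen = []

def fake_start_response(status, headers, exc_info=None):
    seen.append((status, headers))
    return lambda data: None


body = list(Uppercaser(hello_app)({}, fake_start_response))
print(body)
```

To the server, Uppercaser(hello_app) looks like any other application; to hello_app, it looks like the server.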
====== Specification Details ======
The application object must accept two positional arguments. For the sake of illustration, we have named them environ and start_response, but they are not required to have these names. A server or gateway must invoke the application object using positional (not keyword) arguments. (E.g. by calling result = application(environ, start_response) as shown above.)
The environ parameter is a dictionary object, containing CGI-style environment variables. This object must be a builtin Python dictionary (not a subclass, UserDict or other dictionary emulation), and the application is allowed to modify the dictionary in any way it desires. The dictionary must also include certain WSGI-required variables (described in a later section), and may also include server-specific extension variables, named according to a convention that will be described below.
The start_response parameter is a callable accepting two required positional arguments, and one optional argument. For the sake of illustration, we have named these arguments status, response_headers, and exc_info, but they are not required to have these names, and the application must invoke the start_response callable using positional arguments (e.g. start_response(status, response_headers)).
The status parameter is a status string of the form "999 Message here", and response_headers is a list of (header_name, header_value) tuples describing the HTTP response header. The optional exc_info parameter is described below in the sections on The start_response() Callable and Error Handling. It is used only when the application has trapped an error and is attempting to display an error message to the browser.
The start_response callable must return a write(body_data) callable that takes one positional parameter: a string to be written as part of the HTTP response body. (Note: the write() callable is provided only to support certain existing frameworks' imperative output APIs; it should not be used by new applications or frameworks if it can be avoided. See the Buffering and Streaming section for more details.)
When called by the server, the application object must return an iterable yielding zero or more strings. This can be accomplished in a variety of ways, such as by returning a list of strings, or by the application being a generator function that yields strings, or by the application being a class whose instances are iterable. Regardless of how it is accomplished, the application object must always return an iterable yielding zero or more strings.
The server or gateway must transmit the yielded strings to the client in an unbuffered fashion, completing the transmission of each string before requesting another one. (In other words, applications should perform their own buffering. See the Buffering and Streaming section below for more on how application output must be handled.)
The server or gateway should treat the yielded strings as binary byte sequences: in particular, it should ensure that line endings are not altered. The application is responsible for ensuring that the string(s) to be written are in a format suitable for the client. (The server or gateway may apply HTTP transfer encodings, or perform other transformations for the purpose of implementing HTTP features such as byte-range transmission. See Other HTTP Features, below, for more details.)
If a call to len(iterable) succeeds, the server must be able to rely on the result being accurate. That is, if the iterable returned by the application provides a working __len__() method, it must return an accurate result. (See the Handling the Content-Length Header section for information on how this would normally be used.)
If the iterable returned by the application has a close() method, the server or gateway must call that method upon completion of the current request, whether the request was completed normally, or terminated early due to an error. (This is to support resource release by the application. This protocol is intended to complement PEP 325's generator support, and other common iterables with close() methods.)
(Note: the application must invoke the start_response() callable before the iterable yields its first body string, so that the server can send the headers before any body content. However, this invocation may be performed by the iterable's first iteration, so servers must not assume that start_response() has been called before they begin iterating over the iterable.)
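The ordering described in the last two paragraphs can be traced with a small experiment. This is a sketch with hypothetical names (LazyBody, lazy_app, serve_once); it records events in a list instead of sending anything to a client.

```python
events = []

class LazyBody:
    """Iterable whose first iteration is what calls start_response()."""

    def __init__(self, start_response):
        self.start_response = start_response

    def __iter__(self):
        events.append('start_response called')
        self.start_response('200 OK', [('Content-type', 'text/plain')])
        yield 'first chunk\n'
        yield 'second chunk\n'

    def close(self):
        events.append('close called')


def lazy_app(environ, start_response):
    return LazyBody(start_response)


def serve_once(app):
    """Server-side sketch: iterate the result, then always call close()."""
    def start_response(status, headers, exc_info=None):
        return lambda data: None

    result = app({}, start_response)
    events.append('application returned')
    try:
        for chunk in result:
            events.append('chunk: ' + chunk.strip())
    finally:
        if hasattr(result, 'close'):
            result.close()


serve_once(lazy_app)
for event in events:
    print(event)
```

The application returns before start_response() has been called, which is why a server cannot send headers as soon as the application callable returns.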
Finally, servers and gateways must not directly use any other attributes of the iterable returned by the application, unless it is an instance of a type specific to that server or gateway, such as a "file wrapper" returned by wsgi.file_wrapper (see Optional Platform-Specific File Handling). In the general case, only attributes specified here, or accessed via e.g. the PEP 234 iteration APIs are acceptable.
===== environ Variables =====
The environ dictionary is required to contain these CGI environment variables, as defined by the Common Gateway Interface specification [2]. The following variables must be present, unless their value would be an empty string, in which case they may be omitted, except as otherwise noted below.
REQUEST_METHOD
The HTTP request method, such as "GET" or "POST". This cannot ever be an empty string, and so is always required.
SCRIPT_NAME
The initial portion of the request URL's "path" that corresponds to the application object, so that the application knows its virtual "location". This may be an empty string, if the application corresponds to the "root" of the server.
PATH_INFO
The remainder of the request URL's "path", designating the virtual "location" of the request's target within the application. This may be an empty string, if the request URL targets the application root and does not have a trailing slash.
QUERY_STRING
The portion of the request URL that follows the "?", if any. May be empty or absent.
CONTENT_TYPE
The contents of any Content-Type fields in the HTTP request. May be empty or absent.
CONTENT_LENGTH
The contents of any Content-Length fields in the HTTP request. May be empty or absent.
SERVER_NAME, SERVER_PORT
When combined with SCRIPT_NAME and PATH_INFO, these variables can be used to complete the URL. Note, however, that HTTP_HOST, if present, should be used in preference to SERVER_NAME for reconstructing the request URL. See the URL Reconstruction section below for more detail. SERVER_NAME and SERVER_PORT can never be empty strings, and so are always required.
SERVER_PROTOCOL
The version of the protocol the client used to send the request. Typically this will be something like "HTTP/1.0" or "HTTP/1.1" and may be used by the application to determine how to treat any HTTP request headers. (This variable should probably be called REQUEST_PROTOCOL, since it denotes the protocol used in the request, and is not necessarily the protocol that will be used in the server's response. However, for compatibility with CGI we have to keep the existing name.)
HTTP_ Variables
Variables corresponding to the client-supplied HTTP request headers (i.e., variables whose names begin with "HTTP_"). The presence or absence of these variables should correspond with the presence or absence of the appropriate HTTP header in the request.
A server or gateway should attempt to provide as many other CGI variables as are applicable. In addition, if SSL is in use, the server or gateway should also provide as many of the Apache SSL environment variables [5] as are applicable, such as HTTPS=on and SSL_PROTOCOL. Note, however, that an application that uses any CGI variables other than the ones listed above is necessarily non-portable to web servers that do not support the relevant extensions. (For example, web servers that do not publish files will not be able to provide a meaningful DOCUMENT_ROOT or PATH_TRANSLATED.)
A WSGI-compliant server or gateway should document what variables it provides, along with their definitions as appropriate. Applications should check for the presence of any variables they require, and have a fallback plan in the event such a variable is absent.
Note: missing variables (such as REMOTE_USER when no authentication has occurred) should be left out of the environ dictionary. Also note that CGI-defined variables must be strings, if they are present at all. It is a violation of this specification for a CGI variable's value to be of any type other than str.
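An application that follows this advice reads optional variables with a fallback instead of indexing the dictionary directly. A minimal sketch, in which describe_request and the sample values are invented for the illustration:

```python
def describe_request(environ):
    """Summarize a request from CGI-style variables in environ."""
    method = environ['REQUEST_METHOD']             # always required
    script_name = environ.get('SCRIPT_NAME', '')   # may be omitted when empty
    path_info = environ.get('PATH_INFO', '')
    query = environ.get('QUERY_STRING', '')
    user = environ.get('REMOTE_USER')              # absent without authentication

    path = script_name + path_info
    if query:
        path += '?' + query
    who = user if user is not None else 'anonymous'
    return '%s %s by %s' % (method, path, who)


sample_environ = {
    'REQUEST_METHOD': 'GET',
    'SCRIPT_NAME': '/app',
    'PATH_INFO': '/items',
    'QUERY_STRING': 'page=2',
    'SERVER_NAME': 'localhost',
    'SERVER_PORT': '80',
    'SERVER_PROTOCOL': 'HTTP/1.1',
}
print(describe_request(sample_environ))
```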
In addition to the CGI-defined variables, the environ dictionary may also contain arbitrary operating-system "environment variables", and must contain the following WSGI-defined variables:
* wsgi.version: The tuple (1, 0), representing WSGI version 1.0.
* wsgi.url_scheme: A string representing the "scheme" portion of the URL at which the application is being invoked. Normally, this will have the value "http" or "https", as appropriate.
* wsgi.input: An input stream (file-like object) from which the HTTP request body can be read. (The server or gateway may perform reads on-demand as requested by the application, or it may pre-read the client's request body and buffer it in-memory or on disk, or use any other technique for providing such an input stream, according to its preference.)
* wsgi.errors: An output stream (file-like object) to which error output can be written, for the purpose of recording program or other errors in a standardized and possibly centralized location. This should be a "text mode" stream; i.e., applications should use "\n" as a line ending, and assume that it will be converted to the correct line ending by the server/gateway. For many servers, wsgi.errors will be the server's main error log. Alternatively, this may be sys.stderr, or a log file of some sort. The server's documentation should include an explanation of how to configure this or where to find the recorded output. A server or gateway may supply different error streams to different applications, if this is desired.
* wsgi.multithread: This value should evaluate true if the application object may be simultaneously invoked by another thread in the same process, and should evaluate false otherwise.
* wsgi.multiprocess: This value should evaluate true if an equivalent application object may be simultaneously invoked by another process, and should evaluate false otherwise.
* wsgi.run_once: This value should evaluate true if the server or gateway expects (but does not guarantee!) that the application will only be invoked this one time during the life of its containing process. Normally, this will only be true for a gateway based on CGI (or something similar).
Finally, the environ dictionary may also contain server-defined variables. These variables should be named using only lower-case letters, numbers, dots, and underscores, and should be prefixed with a name that is unique to the defining server or gateway. For example, mod_python might define variables with names like mod_python.some_variable.
===== Input and Error Streams =====
The input and error streams provided by the server must support the following methods:
Method           Stream   Notes
read(size)       input    1
readline()       input    1, 2
readlines(hint)  input    1, 3
__iter__()       input
flush()          errors   4
write(str)       errors
writelines(seq)  errors
The semantics of each method are as documented in the Python Library Reference, except for these notes as listed in the table above:
1. The server is not required to read past the client's specified Content-Length, and is allowed to simulate an end-of-file condition if the application attempts to read past that point. The application should not attempt to read more data than is specified by the CONTENT_LENGTH variable.
2. The optional "size" argument to readline() is not supported, as it may be complex for server authors to implement, and is not often used in practice.
3. Note that the hint argument to readlines() is optional for both caller and implementer. The application is free not to supply it, and the server or gateway is free to ignore it.
4. Since the errors stream may not be rewound, servers and gateways are free to forward write operations immediately, without buffering. In this case, the flush() method may be a no-op. Portable applications, however, cannot assume that output is unbuffered or that flush() is a no-op. They must call flush() if they need to ensure that output has in fact been written. (For example, to minimize intermingling of data from multiple processes writing to the same error log.)
The methods listed in the table above must be supported by all servers conforming to this specification. Applications conforming to this specification must not use any other methods or attributes of the input or errors objects. In particular, applications must not attempt to close these streams, even if they possess close() methods.
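On the application side, these rules mostly come down to reading no more than CONTENT_LENGTH bytes and leaving the stream open. A sketch, assuming a hypothetical helper read_request_body and an in-memory io.BytesIO standing in for the server's wsgi.input:

```python
import io

def read_request_body(environ):
    """Read at most CONTENT_LENGTH bytes from wsgi.input."""
    try:
        length = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        length = 0              # malformed value: treat as "no body"
    if length <= 0:
        return b''
    # Only read(size) is used; the stream is never closed by the application.
    return environ['wsgi.input'].read(length)


fake_environ = {
    'CONTENT_LENGTH': '5',
    'wsgi.input': io.BytesIO(b'hello, plus bytes the application must not read'),
}
body = read_request_body(fake_environ)
print(body)
```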
===== The start_response() Callable =====
The second parameter passed to the application object is a callable of the form start_response(status, response_headers, exc_info=None). (As with all WSGI callables, the arguments must be supplied positionally, not by keyword.) The start_response callable is used to begin the HTTP response, and it must return a write(body_data) callable (see the Buffering and Streaming section, below).
The status argument is an HTTP "status" string like "200 OK" or "404 Not Found". That is, it is a string consisting of a Status-Code and a Reason-Phrase, in that order and separated by a single space, with no surrounding whitespace or other characters. (See RFC 2616, Section 6.1.1 for more information.) The string must not contain control characters, and must not be terminated with a carriage return, linefeed, or combination thereof.
The response_headers argument is a list of (header_name, header_value) tuples. It must be a Python list; i.e. type(response_headers) is ListType, and the server may change its contents in any way it desires. Each header_name must be a valid HTTP header field-name (as defined by RFC 2616, Section 4.2), without a trailing colon or other punctuation.
Each header_value must not include any control characters, including carriage returns or linefeeds, either embedded or at the end. (These requirements are to minimize the complexity of any parsing that must be performed by servers, gateways, and intermediate response processors that need to inspect or modify response headers.)
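As a rough illustration of the constraints on status and response_headers, a server or a validating middleware might check them along these lines. This is a sketch for Python 3; the function names are hypothetical, and the token pattern only approximates the RFC 2616 field-name grammar.

```python
import re

_CONTROL_CHARS = re.compile(r'[\x00-\x1f\x7f]')
# Approximation of the RFC 2616 "token" production used for field-names.
_TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

def is_valid_status(status):
    """True for 'NNN Reason' with one separating space and no stray whitespace."""
    code, sep, reason = status.partition(' ')
    return bool(
        re.fullmatch(r'[0-9]{3}', code)
        and sep == ' '
        and reason
        and reason == reason.strip()
        and not _CONTROL_CHARS.search(status)
    )

def is_valid_headers(response_headers):
    """True if response_headers is a real list of (name, value) tuples."""
    if type(response_headers) is not list:
        return False
    for item in response_headers:
        if type(item) is not tuple or len(item) != 2:
            return False
        name, value = item
        # A trailing colon or a control character (such as CR or LF) is rejected.
        if not _TOKEN.fullmatch(name) or _CONTROL_CHARS.search(value):
            return False
    return True
```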
In general, the server or gateway is responsible for ensuring that correct headers are sent to the client: if the application omits a header required by HTTP (or other relevant specifications that are in effect), the server or gateway must add it. For example, the HTTP Date: and Server: headers would normally be supplied by the server or gateway.
(A reminder for server/gateway authors: HTTP header names are case-insensitive, so be sure to take that into consideration when examining application-supplied headers!)
Applications and middleware are forbidden from using HTTP/1.1 "hop-by-hop" features or headers, any equivalent features in HTTP/1.0, or any headers that would affect the persistence of the client's connection to the web server. These features are the exclusive province of the actual web server, and a server or gateway should consider it a fatal error for an application to attempt sending them, and raise an error if they are supplied to start_response(). (For more specifics on "hop-by-hop" features and headers, please see the Other HTTP Features section below.)
The start_response callable must not actually transmit the response headers. Instead, it must store them for the server or gateway to transmit only after the first iteration of the application return value that yields a non-empty string, or upon the application's first invocation of the write() callable. In other words, response headers must not be sent until there is actual body data available, or until the application's returned iterable is exhausted. (The only possible exception to this rule is if the response headers explicitly include a Content-Length of zero.)
This delaying of response header transmission is to ensure that buffered and asynchronous applications can replace their originally intended output with error output, up until the last possible moment. For example, the application may need to change the response status from "200 OK" to "500 Internal Error", if an error occurs while the body is being generated within an application buffer.
The exc_info argument, if supplied, must be a Python sys.exc_info() tuple. This argument should be supplied by the application only if start_response is being called by an error handler. If exc_info is supplied, and no HTTP headers have been output yet, start_response should replace the currently-stored HTTP response headers with the newly-supplied ones, thus allowing the application to "change its mind" about the output when an error has occurred.
However, if exc_info is provided, and the HTTP headers have already been sent, start_response must raise an error, and should raise the exc_info tuple. That is:
raise exc_info[0], exc_info[1], exc_info[2]
This will re-raise the exception trapped by the application, and in principle should abort the application. (It is not safe for the application to attempt error output to the browser once the HTTP headers have already been sent.) The application must not trap any exceptions raised by start_response, if it called start_response with exc_info. Instead, it should allow such exceptions to propagate back to the server or gateway. See Error Handling below, for more details.
The application may call start_response more than once, if and only if the exc_info argument is provided. More precisely, it is a fatal error to call start_response without the exc_info argument if start_response has already been called within the current invocation of the application. (See the example CGI gateway above for an illustration of the correct logic.)
Note: servers, gateways, or middleware implementing start_response should ensure that no reference is held to the exc_info parameter beyond the duration of the function's execution, to avoid creating a circular reference through the traceback and frames involved. The simplest way to do this is something like:
def start_response(status, response_headers, exc_info=None):
    if exc_info:
        try:
            # do stuff w/exc_info here
            pass
        finally:
            exc_info = None    # Avoid circular ref.
The example CGI gateway provides another illustration of this technique.
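Putting the rules of this section together, a server-side start_response might look roughly like the following. It is a simplified sketch in Python 3, loosely modeled on the CGI gateway logic; make_start_response and the out stream are names invented for the example, and the CGI-style "Status:" line is just one possible way of emitting the status.

```python
import io

def make_start_response(out):
    """Build a start_response that writes to the binary stream `out`."""
    headers_set = []    # [status, response_headers] once start_response is called
    headers_sent = []   # same content, filled in when headers go on the wire

    def write(data):
        if not headers_set:
            raise AssertionError("write() before start_response()")
        if not headers_sent:
            # First body data: only now are the stored headers transmitted.
            headers_sent[:] = headers_set
            status, response_headers = headers_set
            out.write(('Status: %s\r\n' % status).encode('iso-8859-1'))
            for name, value in response_headers:
                out.write(('%s: %s\r\n' % (name, value)).encode('iso-8859-1'))
            out.write(b'\r\n')
        out.write(data)

    def start_response(status, response_headers, exc_info=None):
        if exc_info:
            try:
                if headers_sent:
                    # Too late to change the response: re-raise the original error.
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None    # Avoid circular ref.
        elif headers_set:
            raise AssertionError("Headers already set!")
        headers_set[:] = [status, response_headers]
        return write

    return start_response

out = io.BytesIO()
start_response = make_start_response(out)
write = start_response('200 OK', [('Content-Type', 'text/plain')])
sent_before_body = out.getvalue()    # still empty: headers are only stored
write(b'hello')
sent_after_body = out.getvalue()
```

A second call without exc_info raises an error, while a call with exc_info replaces the stored headers only as long as nothing has been sent.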
Handling the Content-Length Header
If the application does not supply a Content-Length header, a server or gateway may choose one of several approaches to handling it. The simplest of these is to close the client connection when the response is completed.
Under some circumstances, however, the server or gateway may be able to either generate a Content-Length header, or at least avoid the need to close the client connection. If the application does not call the write() callable, and returns an iterable whose len() is 1, then the server can automatically determine Content-Length by taking the length of the first string yielded by the iterable.
And, if the server and client both support HTTP/1.1 "chunked encoding" [3], then the server may use chunked encoding to send a chunk for each write() call or string yielded by the iterable, thus sending a length with each chunk rather than a single Content-Length header for the whole response. This allows the server to keep the client connection alive, if it wishes to do so. Note that the server must comply fully with RFC 2616 when doing this, or else fall back to one of the other strategies for dealing with the absence of Content-Length.
(Note: applications and middleware must not apply any kind of Transfer-Encoding to their output, such as chunking or gzipping; as "hop-by-hop" operations, these encodings are the province of the actual web server/gateway. See Other HTTP Features below, for more details.)
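The single-element case described above could be handled by a server roughly as follows. This is a sketch in Python 3; infer_content_length is a hypothetical helper, and restricting the check to lists and tuples is a simplifying assumption so that the iterable is not consumed by the inspection.

```python
def infer_content_length(result, response_headers, write_called=False):
    """Return the headers, with Content-Length appended when it can be inferred."""
    names = [name.lower() for name, _ in response_headers]
    if write_called or 'content-length' in names:
        return response_headers
    # Assumption: only inspect sequences that can be indexed without side effects.
    if isinstance(result, (list, tuple)) and len(result) == 1:
        length = str(len(result[0]))
        return response_headers + [('Content-Length', length)]
    return response_headers

headers = [('Content-Type', 'text/plain')]
with_length = infer_content_length([b'hello'], headers)
unchanged = infer_content_length(iter([b'hello']), headers)
```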
Buffering and Streaming
Generally speaking, applications will achieve the best throughput by buffering their (modestly-sized) output and sending it all at once. This is a common approach in existing frameworks such as Zope: the output is buffered in a StringIO or similar object, then transmitted all at once, along with the response headers.
The corresponding approach in WSGI is for the application to simply return a single-element iterable (such as a list) containing the response body as a single string. This is the recommended approach for the vast majority of application functions, that render HTML pages whose text easily fits in memory.
For large files, however, or for specialized uses of HTTP streaming (such as multipart "server push"), an application may need to provide output in smaller blocks (e.g. to avoid loading a large file into memory). It's also sometimes the case that part of a response may be time-consuming to produce, but it would be useful to send ahead the portion of the response that precedes it.
In these cases, applications will usually return an iterator (often a generator-iterator) that produces the output in a block-by-block fashion. These blocks may be broken to coincide with multipart boundaries (for "server push"), or just before time-consuming tasks (such as reading another block of an on-disk file).
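The two styles can be compared side by side. The following is a minimal sketch in Python 3 (bodies are bytes); the environ key example.source and the stub start_response are assumptions made only so that the example runs on its own.

```python
import io

def buffered_app(environ, start_response):
    """Buffered style: the whole body is returned as a single-element list."""
    body = b'<html><body>Hello</body></html>'
    start_response('200 OK', [('Content-Type', 'text/html'),
                              ('Content-Length', str(len(body)))])
    return [body]

def streaming_app(environ, start_response):
    """Streaming style: a generator yields the body block by block."""
    start_response('200 OK', [('Content-Type', 'application/octet-stream')])
    source = environ['example.source']    # hypothetical key holding a file-like object

    def blocks():
        while True:
            block = source.read(4)        # tiny block size, for illustration only
            if not block:
                return
            yield block

    return blocks()

def stub_start_response(status, response_headers, exc_info=None):
    return lambda data: None              # write() callable, unused here

buffered = list(buffered_app({}, stub_start_response))
streamed = list(streaming_app({'example.source': io.BytesIO(b'abcdefghij')},
                              stub_start_response))
```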
WSGI servers, gateways, and middleware must not delay the transmission of any block; they must either fully transmit the block to the client, or guarantee that they will continue transmission even while the application is producing its next block. A server/gateway or middleware may provide this guarantee in one of three ways:
Send the entire block to the operating system (and request that any O/S buffers be flushed) before returning control to the application, OR
Use a different thread to ensure that the block continues to be transmitted while the application produces the next block.
(Middleware only) send the entire block to its parent gateway/server
By providing this guarantee, WSGI allows applications to ensure that transmission will not become stalled at an arbitrary point in their output data. This is critical for proper functioning of e.g. multipart "server push" streaming, where data between multipart boundaries should be transmitted in full to the client.
Middleware Handling of Block Boundaries
In order to better support asynchronous applications and servers, middleware components must not block iteration waiting for multiple values from an application iterable. If the middleware needs to accumulate more data from the application before it can produce any output, it must yield an empty string.
To put this requirement another way, a middleware component must yield at least one value each time its underlying application yields a value. If the middleware cannot yield any other value, it must yield an empty string.
This requirement ensures that asynchronous applications and servers can conspire to reduce the number of threads that are required to run a given number of application instances simultaneously.
Note also that this requirement means that middleware must return an iterable as soon as its underlying application returns an iterable. It is also forbidden for middleware to use the write() callable to transmit data that is yielded by an underlying application. Middleware may only use their parent server's write() callable to transmit data that the underlying application sent using a middleware-provided write() callable.
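To make the block-boundary rule concrete, here is a sketch of a middleware that re-chunks output into whole lines. It is written for Python 3, the names LineBuffer and line_buffering_middleware are invented for the example, and it deliberately ignores the write() callable. Whenever it has no complete line to pass on, it yields an empty string instead of blocking for the next value.

```python
class LineBuffer:
    """Iterable that re-chunks an application's output into whole lines."""

    def __init__(self, app_iter):
        self.app_iter = app_iter
        if hasattr(app_iter, 'close'):
            self.close = app_iter.close

    def __iter__(self):
        pending = b''
        for block in self.app_iter:
            pending += block
            head, sep, tail = pending.rpartition(b'\n')
            if sep:
                pending = tail
                yield head + sep
            else:
                # Nothing complete yet: still yield once per application value.
                yield b''
        if pending:
            yield pending

def line_buffering_middleware(app):
    def wrapped(environ, start_response):
        # The iterable is returned as soon as the application returns its own.
        return LineBuffer(app(environ, start_response))
    return wrapped

def demo_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'ab', b'c\nde', b'f\n', b'g']

output = list(line_buffering_middleware(demo_app)({}, lambda *args: None))
```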
The write() Callable
Some existing application framework APIs support unbuffered output in a different manner than WSGI. Specifically, they provide a "write" function or method of some kind to write an unbuffered block of data, or else they provide a buffered "write" function and a "flush" mechanism to flush the buffer.
Unfortunately, such APIs cannot be implemented in terms of WSGI's "iterable" application return value, unless threads or other special mechanisms are used.
Therefore, to allow these frameworks to continue using an imperative API, WSGI includes a special write() callable, returned by the start_response callable.
New WSGI applications and frameworks should not use the write() callable if it is possible to avoid doing so. The write() callable is strictly a hack to support imperative streaming APIs. In general, applications should produce their output via their returned iterable, as this makes it possible for web servers to interleave other tasks in the same Python thread, potentially providing better throughput for the server as a whole.
The write() callable is returned by the start_response() callable, and it accepts a single parameter: a string to be written as part of the HTTP response body, that is treated exactly as though it had been yielded by the output iterable. In other words, before write() returns, it must guarantee that the passed-in string was either completely sent to the client, or that it is buffered for transmission while the application proceeds onward.
An application must return an iterable object, even if it uses write() to produce all or part of its response body. The returned iterable may be empty (i.e. yield no non-empty strings), but if it does yield non-empty strings, that output must be treated normally by the server or gateway (i.e., it must be sent or queued immediately). Applications must not invoke write() from within their return iterable, and therefore any strings yielded by the iterable are transmitted after all strings passed to write() have been sent to the client.
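The ordering rule can be seen in a small example. This is a sketch in Python 3; legacy_app and the toy run function are hypothetical, and the toy server simply records blocks in the order a client would receive them.

```python
def legacy_app(environ, start_response):
    """Imperative style: push data through write(), then return an iterable."""
    write = start_response('200 OK', [('Content-Type', 'text/plain')])
    write(b'first block\n')
    write(b'second block\n')
    # An iterable must still be returned; it could also be empty.
    return [b'tail from iterable\n']

def run(app):
    """Toy server: collects body blocks in transmission order."""
    sent = []

    def start_response(status, response_headers, exc_info=None):
        return sent.append                # stands in for the write() callable

    result = app({}, start_response)
    try:
        for data in result:
            if data:
                sent.append(data)
    finally:
        if hasattr(result, 'close'):
            result.close()
    return sent

body_blocks = run(legacy_app)
```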
Unicode Issues
HTTP does not directly support Unicode, and neither does this interface. All encoding/decoding must be handled by the application; all strings passed to or from the server must be standard Python byte strings, not Unicode objects. The result of using a Unicode object where a string object is required, is undefined.
Note also that strings passed to start_response() as a status or as response headers must follow RFC 2616 with respect to encoding. That is, they must either be ISO-8859-1 characters, or use RFC 2047 MIME encoding.
On Python platforms where the str or StringType type is in fact Unicode-based (e.g. Jython, IronPython, Python 3000, etc.), all "strings" referred to in this specification must contain only code points representable in ISO-8859-1 encoding (\u0000 through \u00FF, inclusive). It is a fatal error for an application to supply strings containing any other Unicode character or code point. Similarly, servers and gateways must not supply strings to an application containing any other Unicode characters.
Again, all strings referred to in this specification must be of type str or StringType, and must not be of type unicode or UnicodeType. And, even if a given platform allows for more than 8 bits per character in str/StringType objects, only the lower 8 bits may be used, for any value referred to in this specification as a "string".
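On a platform where str is Unicode-based (Python 3 is assumed here), the ISO-8859-1 restriction could be checked with something like the sketch below. The helper name is hypothetical.

```python
def is_wsgi_string(value):
    """True if value is a str whose code points all fit in ISO-8859-1."""
    if not isinstance(value, str):
        return False
    try:
        value.encode('iso-8859-1')
    except UnicodeEncodeError:
        return False
    return True

latin1_ok = is_wsgi_string('caf\u00e9')      # U+00E9 is within the allowed range
euro_rejected = not is_wsgi_string('\u20ac')  # U+20AC is outside it
```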
Error Handling
In general, applications should try to trap their own, internal errors, and display a helpful message in the browser. (It is up to the application to decide what "helpful" means in this context.)
However, to display such a message, the application must not have actually sent any data to the browser yet, or else it risks corrupting the response. WSGI therefore provides a mechanism to either allow the application to send its error message, or be automatically aborted: the exc_info argument to start_response. Here is an example of its use:
try:
    # regular application code here
    status = "200 Froody"
    response_headers = [("content-type", "text/plain")]
    start_response(status, response_headers)
    return ["normal body goes here"]
except:
    # XXX should trap runtime issues like MemoryError, KeyboardInterrupt
    #     in a separate handler before this bare 'except:'...
    status = "500 Oops"
    response_headers = [("content-type", "text/plain")]
    start_response(status, response_headers, sys.exc_info())
    return ["error body goes here"]
If no output has been written when an exception occurs, the call to start_response will return normally, and the application will return an error body to be sent to the browser. However, if any output has already been sent to the browser, start_response will reraise the provided exception. This exception should not be trapped by the application, and so the application will abort. The server or gateway can then trap this (fatal) exception and abort the response.
Servers should trap and log any exception that aborts an application or the iteration of its return value. If a partial response has already been written to the browser when an application error occurs, the server or gateway may attempt to add an error message to the output, if the already-sent headers indicate a text/* content type that the server knows how to modify cleanly.
Some middleware may wish to provide additional exception handling services, or intercept and replace application error messages. In such cases, middleware may choose to not re-raise the exc_info supplied to start_response, but instead raise a middleware-specific exception, or simply return without an exception after storing the supplied arguments. This will then cause the application to return its error body iterable (or invoke write()), allowing the middleware to capture and modify the error output. These techniques will work as long as application authors:
Always provide exc_info when beginning an error response
Never trap errors raised by start_response when exc_info is being provided
HTTP 1.1 Expect/Continue
Servers and gateways that implement HTTP 1.1 must provide transparent support for HTTP 1.1's "expect/continue" mechanism. This may be done in any of several ways:
Respond to requests containing an Expect: 100-continue header with an immediate "100 Continue" response, and proceed normally.
Proceed with the request normally, but provide the application with a wsgi.input stream that will send the "100 Continue" response if/when the application first attempts to read from the input stream. The read request must then remain blocked until the client responds.
Wait until the client decides that the server does not support expect/continue, and sends the request body on its own. (This is suboptimal, and is not recommended.)
Note that these behavior restrictions do not apply for HTTP 1.0 requests, or for requests that are not directed to an application object. For more information on HTTP 1.1 Expect/Continue, see RFC 2616, sections 8.2.3 and 10.1.1.
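The second option, sending "100 Continue" lazily on the first read, might be structured like this. It is only a sketch in Python 3: ContinueOnFirstRead and the send_continue callback are invented names, and a real server would write the interim response to the client socket rather than append to a list.

```python
import io

class ContinueOnFirstRead:
    """wsgi.input wrapper that triggers "100 Continue" on the first read."""

    def __init__(self, stream, send_continue):
        self._stream = stream
        self._send_continue = send_continue   # hypothetical server callback
        self._continued = False

    def _before_read(self):
        if not self._continued:
            self._continued = True
            self._send_continue()

    def read(self, size=-1):
        self._before_read()
        return self._stream.read(size)

    def readline(self):
        self._before_read()
        return self._stream.readline()

    def readlines(self, hint=-1):
        self._before_read()
        return self._stream.readlines(hint)

    def __iter__(self):
        self._before_read()
        return iter(self._stream)

events = []
wsgi_input = ContinueOnFirstRead(io.BytesIO(b'payload'),
                                 lambda: events.append('100 Continue'))
events_before = list(events)
data = wsgi_input.read(3)
rest = wsgi_input.read()
```

If the application never reads the body, the interim response is never sent.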
Other HTTP Features
In general, servers and gateways should "play dumb" and allow the application complete control over its output. They should only make changes that do not alter the effective semantics of the application's response. It is always possible for the application developer to add middleware components to supply additional features, so server/gateway developers should be conservative in their implementation. In a sense, a server should consider itself to be like an HTTP "gateway server", with the application being an HTTP "origin server". (See RFC 2616, section 1.3, for the definition of these terms.)
However, because WSGI servers and applications do not communicate via HTTP, what RFC 2616 calls "hop-by-hop" headers do not apply to WSGI internal communications. WSGI applications must not generate any "hop-by-hop" headers [4], attempt to use HTTP features that would require them to generate such headers, or rely on the content of any incoming "hop-by-hop" headers in the environ dictionary. WSGI servers must handle any supported inbound "hop-by-hop" headers on their own, such as by decoding any inbound Transfer-Encoding, including chunked encoding if applicable.
Applying these principles to a variety of HTTP features, it should be clear that a server may handle cache validation via the If-None-Match and If-Modified-Since request headers and the Last-Modified and ETag response headers. However, it is not required to do this, and the application should perform its own cache validation if it wants to support that feature, since the server/gateway is not required to do such validation.
Similarly, a server may re-encode or transport-encode an application's response, but the application should use a suitable content encoding on its own, and must not apply a transport encoding. A server may transmit byte ranges of the application's response if requested by the client, and the application doesn't natively support byte ranges. Again, however, the application should perform this function on its own if desired.
Note that these restrictions on applications do not necessarily mean that every application must reimplement every HTTP feature; many HTTP features can be partially or fully implemented by middleware components, thus freeing both server and application authors from implementing the same features over and over again.
Thread Support
Thread support, or lack thereof, is also server-dependent. Servers that can run multiple requests in parallel, should also provide the option of running an application in a single-threaded fashion, so that applications or frameworks that are not thread-safe may still be used with that server.
Implementation/Application Notes
Server Extension APIs
Some server authors may wish to expose more advanced APIs, that application or framework authors can use for specialized purposes. For example, a gateway based on mod_python might wish to expose part of the Apache API as a WSGI extension.
In the simplest case, this requires nothing more than defining an environ variable, such as mod_python.some_api. But, in many cases, the possible presence of middleware can make this difficult. For example, an API that offers access to the same HTTP headers that are found in environ variables, might return different data if environ has been modified by middleware.
In general, any extension API that duplicates, supplants, or bypasses some portion of WSGI functionality runs the risk of being incompatible with middleware components. Server/gateway developers should not assume that nobody will use middleware, because some framework developers specifically intend to organize or reorganize their frameworks to function almost entirely as middleware of various kinds.
So, to provide maximum compatibility, servers and gateways that provide extension APIs that replace some WSGI functionality, must design those APIs so that they are invoked using the portion of the API that they replace. For example, an extension API to access HTTP request headers must require the application to pass in its current environ, so that the server/gateway may verify that HTTP headers accessible via the API have not been altered by middleware. If the extension API cannot guarantee that it will always agree with environ about the contents of HTTP headers, it must refuse service to the application, e.g. by raising an error, returning None instead of a header collection, or whatever is appropriate to the API.
Similarly, if an extension API provides an alternate means of writing response data or headers, it should require the start_response callable to be passed in, before the application can obtain the extended service. If the object passed in is not the same one that the server/gateway originally supplied to the application, it cannot guarantee correct operation and must refuse to provide the extended service to the application.
These guidelines also apply to middleware that adds information such as parsed cookies, form variables, sessions, and the like to environ. Specifically, such middleware should provide these features as functions which operate on environ, rather than simply stuffing values into environ. This helps ensure that information is calculated from environ after any middleware has done any URL rewrites or other environ modifications.
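As an illustration of the "functions which operate on environ" guideline, cookie parsing could be offered as a function instead of a precomputed environ value. The sketch below assumes Python 3 and its standard http.cookies module; get_cookies is a hypothetical name.

```python
from http.cookies import SimpleCookie

def get_cookies(environ):
    """Parse cookies from the environ as it is at the time of the call."""
    cookie = SimpleCookie()
    cookie.load(environ.get('HTTP_COOKIE', ''))
    return {name: morsel.value for name, morsel in cookie.items()}

environ = {'HTTP_COOKIE': 'session=abc123; theme=dark'}
before = get_cookies(environ)

# A later middleware rewrites the header; the function reflects the change,
# which a value stuffed into environ earlier would not.
environ['HTTP_COOKIE'] = 'theme=light'
after = get_cookies(environ)
```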
It is very important that these "safe extension" rules be followed by both server/gateway and middleware developers, in order to avoid a future in which middleware developers are forced to delete any and all extension APIs from environ to ensure that their mediation isn't being bypassed by applications using those extensions!
Application Configuration
This specification does not define how a server selects or obtains an application to invoke. These and other configuration options are highly server-specific matters. It is expected that server/gateway authors will document how to configure the server to execute a particular application object, and with what options (such as threading options).
Framework authors, on the other hand, should document how to create an application object that wraps their framework's functionality. The user, who has chosen both the server and the application framework, must connect the two together. However, since both the framework and the server now have a common interface, this should be merely a mechanical matter, rather than a significant engineering effort for each new server/framework pair.
Finally, some applications, frameworks, and middleware may wish to use the environ dictionary to receive simple string configuration options. Servers and gateways should support this by allowing an application's deployer to specify name-value pairs to be placed in environ. In the simplest case, this support can consist merely of copying all operating system-supplied environment variables from os.environ into the environ dictionary, since the deployer in principle can configure these externally to the server, or in the CGI case they may be able to be set via the server's configuration files.
Applications should try to keep such required variables to a minimum, since not all servers will support easy configuration of them. Of course, even in the worst case, persons deploying an application can create a script to supply the necessary configuration values:
from the_app import application

def new_app(environ, start_response):
    environ['the_app.configval1'] = 'something'
    return application(environ, start_response)
But, most existing applications and frameworks will probably only need a single configuration value from environ, to indicate the location of their application or framework-specific configuration file(s). (Of course, applications should cache such configuration, to avoid having to re-read it upon each invocation.)
URL Reconstruction
If an application wishes to reconstruct a request's complete URL, it may do so using the following algorithm, contributed by Ian Bicking:
from urllib import quote
url = environ['wsgi.url_scheme']+'://'

if environ.get('HTTP_HOST'):
    url += environ['HTTP_HOST']
else:
    url += environ['SERVER_NAME']

    if environ['wsgi.url_scheme'] == 'https':
        if environ['SERVER_PORT'] != '443':
            url += ':' + environ['SERVER_PORT']
    else:
        if environ['SERVER_PORT'] != '80':
            url += ':' + environ['SERVER_PORT']

url += quote(environ.get('SCRIPT_NAME', ''))
url += quote(environ.get('PATH_INFO', ''))
if environ.get('QUERY_STRING'):
    url += '?' + environ['QUERY_STRING']
Note that such a reconstructed URL may not be precisely the same URI as requested by the client. Server rewrite rules, for example, may have modified the client's originally requested URL to place it in a canonical form.
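For readers on Python 3, where quote lives in urllib.parse, the same algorithm can be wrapped in a function. This is a sketch under that assumption; the function name and the sample environ values are made up for the example.

```python
from urllib.parse import quote

def reconstruct_url(environ):
    """Rebuild the request URL from the environ, following the steps above."""
    scheme = environ['wsgi.url_scheme']
    url = scheme + '://'
    if environ.get('HTTP_HOST'):
        url += environ['HTTP_HOST']
    else:
        url += environ['SERVER_NAME']
        # The port is appended only when it is not the scheme's default.
        default_port = '443' if scheme == 'https' else '80'
        if environ['SERVER_PORT'] != default_port:
            url += ':' + environ['SERVER_PORT']
    url += quote(environ.get('SCRIPT_NAME', ''))
    url += quote(environ.get('PATH_INFO', ''))
    if environ.get('QUERY_STRING'):
        url += '?' + environ['QUERY_STRING']
    return url

sample_url = reconstruct_url({
    'wsgi.url_scheme': 'http',
    'SERVER_NAME': 'example.com',
    'SERVER_PORT': '8080',
    'SCRIPT_NAME': '/app',
    'PATH_INFO': '/a b',
    'QUERY_STRING': 'x=1',
})
```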
Supporting Older (<2.2) Versions of Python
Some servers, gateways, or applications may wish to support older (<2.2) versions of Python. This is especially important if Jython is a target platform, since as of this writing a production-ready version of Jython 2.2 is not yet available.
For servers and gateways, this is relatively straightforward: servers and gateways targeting pre-2.2 versions of Python must simply restrict themselves to using only a standard "for" loop to iterate over any iterable returned by an application. This is the only way to ensure source-level compatibility with both the pre-2.2 iterator protocol (discussed further below) and "today's" iterator protocol (see PEP 234).
(Note that this technique necessarily applies only to servers, gateways, or middleware that are written in Python. Discussion of how to use iterator protocol(s) correctly from other languages is outside the scope of this PEP.)
For applications, supporting pre-2.2 versions of Python is slightly more complex:
You may not return a file object and expect it to work as an iterable, since before Python 2.2, files were not iterable. (In general, you shouldn't do this anyway, because it will perform quite poorly most of the time!) Use wsgi.file_wrapper or an application-specific file wrapper class. (See Optional Platform-Specific File Handling for more on wsgi.file_wrapper, and an example class you can use to wrap a file as an iterable.)
If you return a custom iterable, it must implement the pre-2.2 iterator protocol. That is, provide a __getitem__ method that accepts an integer key, and raises IndexError when exhausted. (Note that built-in sequence types are also acceptable, since they also implement this protocol.)
Finally, middleware that wishes to support pre-2.2 versions of Python, and iterates over application return values or itself returns an iterable (or both), must follow the appropriate recommendations above.
(Note: It should go without saying that to support pre-2.2 versions of Python, any server, gateway, application, or middleware must also use only language features available in the target version, use 1 and 0 instead of True and False, etc.)
Optional Platform-Specific File Handling
Some operating environments provide special high-performance file-transmission facilities, such as the Unix sendfile() call. Servers and gateways may expose this functionality via an optional wsgi.file_wrapper key in the environ. An application may use this "file wrapper" to convert a file or file-like object into an iterable that it then returns, e.g.:
if 'wsgi.file_wrapper' in environ:
    return environ['wsgi.file_wrapper'](filelike, block_size)
else:
    return iter(lambda: filelike.read(block_size), '')
If the server or gateway supplies wsgi.file_wrapper, it must be a callable that accepts one required positional parameter, and one optional positional parameter. The first parameter is the file-like object to be sent, and the second parameter is an optional block size "suggestion" (which the server/gateway need not use). The callable must return an iterable object, and must not perform any data transmission until and unless the server/gateway actually receives the iterable as a return value from the application. (To do otherwise would prevent middleware from being able to interpret or override the response data.)
To be considered "file-like", the object supplied by the application must have a read() method that takes an optional size argument. It may have a close() method, and if so, the iterable returned by wsgi.file_wrapper must have a close() method that invokes the original file-like object's close() method. If the "file-like" object has any other methods or attributes with names matching those of Python built-in file objects (e.g. fileno()), the wsgi.file_wrapper may assume that these methods or attributes have the same semantics as those of a built-in file object.
The actual implementation of any platform-specific file handling must occur after the application returns, and the server or gateway checks to see if a wrapper object was returned. (Again, because of the presence of middleware, error handlers, and the like, it is not guaranteed that any wrapper created will actually be used.)
Apart from the handling of close(), the semantics of returning a file wrapper from the application should be the same as if the application had returned iter(filelike.read, ''). In other words, transmission should begin at the current position within the "file" at the time that transmission begins, and continue until the end is reached.
Of course, platform-specific file transmission APIs don't usually accept arbitrary "file-like" objects. Therefore, a wsgi.file_wrapper has to introspect the supplied object for things such as a fileno() (Unix-like OSes) or a java.nio.FileChannel (under Jython) in order to determine if the file-like object is suitable for use with the platform-specific API it supports.
Note that even if the object is not suitable for the platform API, the wsgi.file_wrapper must still return an iterable that wraps read() and close(), so that applications using file wrappers are portable across platforms. Here's a simple platform-agnostic file wrapper class, suitable for old (pre 2.2) and new Pythons alike:
class FileWrapper:
    def __init__(self, filelike, blksize=8192):
        self.filelike = filelike
        self.blksize = blksize
        if hasattr(filelike, 'close'):
            self.close = filelike.close
    def __getitem__(self, key):
        data = self.filelike.read(self.blksize)
        if data:
            return data
        raise IndexError
and here is a snippet from a server/gateway that uses it to provide access to a platform-specific API:
environ['wsgi.file_wrapper'] = FileWrapper
result = application(environ, start_response)
try:
    if isinstance(result, FileWrapper):
        # check if result.filelike is usable w/platform-specific
        # API, and if so, use that API to transmit the result.
        # If not, fall through to normal iterable handling
        # loop below.
        pass
    for data in result:
        # etc.
        pass
finally:
    if hasattr(result, 'close'):
        result.close()
Questions and Answers
Why must environ be a dictionary? What's wrong with using a subclass?
The rationale for requiring a dictionary is to maximize portability between servers. The alternative would be to define some subset of a dictionary's methods as being the standard and portable interface. In practice, however, most servers will probably find a dictionary adequate to their needs, and thus framework authors will come to expect the full set of dictionary features to be available, since they will be there more often than not. But, if some server chooses not to use a dictionary, then there will be interoperability problems despite that server's "conformance" to spec. Therefore, making a dictionary mandatory simplifies the specification and guarantees interoperability.
Note that this does not prevent server or framework developers from offering specialized services as custom variables inside the environ dictionary. This is the recommended approach for offering any such value-added services.
Why can you call write() and yield strings/return an iterable? Shouldn't we pick just one way?
If we supported only the iteration approach, then current frameworks that assume the availability of "push" suffer. But, if we only support pushing via write(), then server performance suffers for transmission of e.g. large files (if a worker thread can't begin work on a new request until all of the output has been sent). Thus, this compromise allows an application framework to support both approaches, as appropriate, but with only a little more burden to the server implementor than a push-only approach would require.
What's the close() for?
When writes are done during the execution of an application object, the application can ensure that resources are released using a try/finally block. But, if the application returns an iterable, any resources used will not be released until the iterable is garbage collected. The close() idiom allows an application to release critical resources at the end of a request, and it's forward-compatible with the support for try/finally in generators that's proposed by PEP 325.
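A small sketch of the close() idiom described here, assuming a hypothetical LoggedBody iterable and a stand-in serve() function that plays the role of the server loop:

```python
class LoggedBody:
    """Response iterable that records when the server releases it (illustrative)."""

    def __init__(self, chunks, events):
        self._chunks = iter(chunks)
        self._events = events

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._chunks)

    def close(self):
        # A real application would release files, locks, or connections here.
        self._events.append("closed")


def serve(result):
    """Stand-in for a server: consume the iterable, then always call close()."""
    sent = []
    try:
        for data in result:
            sent.append(data)
    finally:
        if hasattr(result, "close"):
            result.close()
    return sent


events = []
sent = serve(LoggedBody([b"a", b"b"], events))
```

Because close() runs in the finally clause, the resource is released at the end of the request even if transmission fails partway through.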
Why is this interface so low-level? I want feature X! (e.g. cookies, sessions, persistence, ...)
This isn't Yet Another Python Web Framework. It's just a way for frameworks to talk to web servers, and vice versa. If you want these features, you need to pick a web framework that provides the features you want. And if that framework lets you create a WSGI application, you should be able to run it in most WSGI-supporting servers. Also, some WSGI servers may offer additional services via objects provided in their environ dictionary; see the applicable server documentation for details. (Of course, applications that use such extensions will not be portable to other WSGI-based servers.)
Why use CGI variables instead of good old HTTP headers? And why mix them in with WSGI-defined variables?
Many existing web frameworks are built heavily upon the CGI spec, and existing web servers know how to generate CGI variables. In contrast, alternative ways of representing inbound HTTP information are fragmented and lack market share. Thus, using the CGI "standard" seems like a good way to leverage existing implementations. As for mixing them with WSGI variables, separating them would just require two dictionary arguments to be passed around, while providing no real benefits.
What about the status string? Can't we just use the number, passing in 200 instead of "200 OK"?
Doing this would complicate the server or gateway, by requiring them to have a table of numeric statuses and corresponding messages. By contrast, it is easy for an application or framework author to type the extra text to go with the specific response code they are using, and existing frameworks often already have a table containing the needed messages. So, on balance it seems better to make the application/framework responsible, rather than the server or gateway.
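On the application side, building the full status string is cheap. As a sketch, assuming a modern Python 3 where the standard library's http.HTTPStatus enum serves as the table of messages (the helper name status_line is made up for this example):

```python
from http import HTTPStatus


def status_line(code):
    """Return a WSGI status string such as '200 OK' for a numeric code."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


ok = status_line(200)
missing = status_line(404)
```

An application or framework can then pass the result straight to start_response().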
Why is wsgi.run_once not guaranteed to run the app only once?
Because it's merely a suggestion to the application that it should "rig for infrequent running". This is intended for application frameworks that have multiple modes of operation for caching, sessions, and so forth. In a "multiple run" mode, such frameworks may preload caches, and may not write e.g. logs or session data to disk after each request. In "single run" mode, such frameworks avoid preloading and flush all necessary writes after each request.
However, in order to test an application or framework to verify correct operation in the latter mode, it may be necessary (or at least expedient) to invoke it more than once. Therefore, an application should not assume that it will definitely not be run again, just because it is called with wsgi.run_once set to True.
Feature X (dictionaries, callables, etc.) are ugly for use in application code; why don't we use objects instead?
All of these implementation choices of WSGI are specifically intended to decouple features from one another; recombining these features into encapsulated objects makes it somewhat harder to write servers or gateways, and an order of magnitude harder to write middleware that replaces or modifies only small portions of the overall functionality.
In essence, middleware wants to have a "Chain of Responsibility" pattern, whereby it can act as a "handler" for some functions, while allowing others to remain unchanged. This is difficult to do with ordinary Python objects, if the interface is to remain extensible. For example, one must use __getattr__ or __getattribute__ overrides, to ensure that extensions (such as attributes defined by future WSGI versions) are passed through.
This type of code is notoriously difficult to get 100% correct, and few people will want to write it themselves. They will therefore copy other people's implementations, but fail to update them when the person they copied from corrects yet another corner case.
Further, this necessary boilerplate would be pure excise, a developer tax paid by middleware developers to support a slightly prettier API for application framework developers. But, application framework developers will typically only be updating one framework to support WSGI, and in a very limited part of their framework as a whole. It will likely be their first (and maybe their only) WSGI implementation, and thus they will likely implement with this specification ready to hand. Thus, the effort of making the API "prettier" with object attributes and suchlike would likely be wasted for this audience.
We encourage those who want a prettier (or otherwise improved) WSGI interface for use in direct web application programming (as opposed to web framework development) to develop APIs or frameworks that wrap WSGI for convenient use by application developers. In this way, WSGI can remain conveniently low-level for server and middleware authors, while not being "ugly" for application developers.
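To illustrate why plain callables and dictionaries keep middleware small, here is a hedged sketch of a middleware that touches only the response headers and passes everything else through unchanged. The names add_header, hello, and fake_start_response are assumptions for the example, not part of WSGI.

```python
def add_header(app, name, value):
    """Wrap a WSGI app so that one extra response header is always added."""
    def middleware(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Only the header list is modified; status and exc_info pass through.
            return start_response(status, list(headers) + [(name, value)], exc_info)
        # environ and the response iterable are forwarded untouched.
        return app(environ, patched_start_response)
    return middleware


def hello(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hi"]


captured = {}


def fake_start_response(status, headers, exc_info=None):
    # Minimal stand-in for a server's start_response, recording what it receives.
    captured["status"] = status
    captured["headers"] = headers
    return lambda data: None


body = add_header(hello, "X-Demo", "1")({}, fake_start_response)
```

The middleware never needs to know which keys are in environ or what the iterable contains, which is the decoupling the answer argues for.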
Proposed/Under Discussion
These items are currently being discussed on the Web-SIG and elsewhere, or are on the PEP author's "to-do" list:
Should wsgi.input be an iterator instead of a file? This would help for asynchronous applications and chunked-encoding input streams.
Optional extensions are being discussed for pausing iteration of an application's output until input is available or until a callback occurs.
Add a section about synchronous vs. asynchronous apps and servers, the relevant threading models, and issues/design goals in these areas.
Acknowledgements
Thanks go to the many folks on the Web-SIG mailing list whose thoughtful feedback made this revised draft possible. Especially:
Gregory "Grisha" Trubetskoy, author of mod_python, who beat up on the first draft as not offering any advantages over "plain old CGI", thus encouraging me to look for a better approach.
Ian Bicking, who helped nag me into properly specifying the multithreading and multiprocess options, as well as badgering me to provide a mechanism for servers to supply custom extension data to an application.
Tony Lownds, who came up with the concept of a start_response function that took the status and headers, returning a write function. His input also guided the design of the exception handling facilities, especially in the area of allowing for middleware that overrides application error messages.
Alan Kennedy, whose courageous attempts to implement WSGI-on-Jython (well before the spec was finalized) helped to shape the "supporting older versions of Python" section, as well as the optional wsgi.file_wrapper facility.
Mark Nottingham, who reviewed the spec extensively for issues with HTTP RFC compliance, especially with regard to HTTP/1.1 features that I didn't even know existed until he pointed them out.
References
[1] The Python Wiki "Web Programming" topic (http://www.python.org/cgi-bin/moinmoin/WebProgramming)
[2] The Common Gateway Interface Specification, v 1.1, 3rd Draft (http://ken.coar.org/cgi/draft-coar-cgi-v11-03.txt)
[3] "Chunked Transfer Coding" -- HTTP/1.1, section 3.6.1 (http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1)
[4] "End-to-end and Hop-by-hop Headers" -- HTTP/1.1, Section 13.5.1 (http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.5.1)
[5] mod_ssl Reference, "Environment Variables" (http://www.modssl.org/docs/2.8/ssl_reference.html#ToC25)
Copyright
This document has been placed in the public domain.


@@ -0,0 +1,195 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-13T22:34:33+08:00
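====== A First Look at WSGI ======
Created Thursday 13 October 2011
http://linluxiang.iteye.com/blog/799163
===== Preface =====
This article does not walk through the WSGI specification in detail, nor does it contain a complete implementation of it; the description is even mixed with my own views on WSGI. For all official WSGI definitions, see http://www.python.org/dev/peps/pep-3333/.
===== What is WSGI =====
The official name of WSGI is the Python Web Server Gateway Interface. As the name suggests, it is a gateway, and the job of a gateway is to translate between protocols.
In other words, WSGI is like a bridge, with the web server on one side and the user's application on the other. The bridge does very little by itself, though, and sometimes needs other bridges to help get the work done.
The terms used in this article are defined below.
wsgi app, also called the application: a WSGI application.
wsgi container, also called the container: this part is often called a handler, but I think "handler" is easily confused with the app, so I call it the container.
wsgi_middleware, also called middleware: a special kind of program that meddles between the container and the application.
A picture is worth a thousand words, so here is the WSGI architecture as I understand it.
{{~/sync/notes/zim/python/WSGI/wsgi初探/1.jpg}}
As the diagram shows, the server, the container, and the application are tangled up with one another. The rest of this article untangles those relationships.
===== WSGI application =====
A WSGI application is simply a **callable object**. As the simplest possible example, suppose the following application exists: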
def application(environ, start_response):
    status = '200 OK'
    output = 'World!'
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(12))]  # len('Hello ') + len('World!')
    write = start_response(status, response_headers)
    write('Hello ')
    return [output]
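This WSGI application is so simple it borders on crude, yet it is a fully functional WSGI application. It does leave a lot of open questions: what is environ? What is start_response? Why can content be returned both through write and through return?
It helps to guess at the answers first. Thinking of CGI, environ is probably a set of environment variables that **describe the HTTP request (sent by the client)**, such as the method. start_response probably takes the **HTTP response headers (which the application returns to the client)** and returns a write function, which sends the __HTTP response body__ to the client. The return value naturally carries the HTTP response body as well. But then how does write differ from returning a value? Could it be that the caller simply calls write on the application's return value by default? And why is the return value a __list__? There must be a __step that iterates over the application's result and outputs it__. Does that mean iterators and generators are implicitly supported?
Wait, the application's result? __Since an application is a function, some object must be calling it, and we can guess that this object passes environ and start_response to the application and sends the application's return value to the client. What is that object? The WSGI container, of course.__
===== WSGI container =====
First, where the term comes from: "WSGI container" is a concept I made up, borrowed from the Java Servlet container. The two seem similar to me, so I took the name.
__The job of a WSGI container is to build an environment in which a WSGI application can run successfully__. Running successfully means passing in the right arguments, handling the returned result correctly, and sending that result back to the client.
So the container's workflow is roughly: obtain the request information from the web server through whatever communication mechanism the web server prescribes, __package it__ and pass it to the WSGI application, run the application, and return the response correctly.
In general a WSGI container has to __build on an existing web server technology__, such as CGI, FastCGI, or an embedded mode.
The following is a minimal WSGI container written on top of CGI. The official document does not say how a container must be implemented; it only lists the constraints to observe. See PEP 3333 for the details.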
#!/usr/bin/python
# encoding: utf8
import sys
import os

# Make the environ argument
environ = {}
environ['REQUEST_METHOD'] = os.environ['REQUEST_METHOD']
environ['SCRIPT_NAME'] = os.environ['SCRIPT_NAME']
environ['PATH_INFO'] = os.environ.get('PATH_INFO', '')
environ['QUERY_STRING'] = os.environ.get('QUERY_STRING', '')
# A CGI server may leave these two unset when the request has no body
environ['CONTENT_TYPE'] = os.environ.get('CONTENT_TYPE', '')
environ['CONTENT_LENGTH'] = os.environ.get('CONTENT_LENGTH', '')
environ['SERVER_NAME'] = os.environ['SERVER_NAME']
environ['SERVER_PORT'] = os.environ['SERVER_PORT']
environ['SERVER_PROTOCOL'] = os.environ['SERVER_PROTOCOL']
environ['wsgi.version'] = (1, 0)
environ['wsgi.url_scheme'] = 'http'
environ['wsgi.input'] = sys.stdin
environ['wsgi.errors'] = sys.stderr
environ['wsgi.multithread'] = False
environ['wsgi.multiprocess'] = True
environ['wsgi.run_once'] = True

# Make the start_response argument
# Note: WSGI requires that the response headers not be sent
# until there is body content to send.
sent_header = False
res_status = None
res_headers = None

def write(body):
    global sent_header
    if not sent_header:
        # CGI expects the status as a "Status:" header, followed by
        # the other headers and a blank line
        sys.stdout.write('Status: ' + res_status + '\r\n')
        for k, v in res_headers:
            sys.stdout.write(k + ': ' + v + '\r\n')
        sys.stdout.write('\r\n')
        sent_header = True
    sys.stdout.write(body)

def start_response(status, response_headers):
    global res_status
    global res_headers
    res_status = status
    res_headers = response_headers
    return write

# here is the application
def application(environ, start_response):
    status = '200 OK'
    output = 'World!'
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(12))]  # len('Hello ') + len('World!')
    write = start_response(status, response_headers)
    write('Hello ')
    return [output]

# here run the application
result = application(environ, start_response)
try:
    for value in result:
        write(value)
finally:
    # release the application's resources if it asks for that
    if hasattr(result, 'close'):
        result.close()
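See? Implementing a WSGI container is not hard.
However, the container's design exposes what I consider a major problem in how WSGI applications are designed: why offer two ways of returning data? There is only one write function, yet it can be called from inside the application and also by the container as it sends the application's return value. If I were designing it, I would drop start_response entirely and use the interface application(environ): take a single argument, and let the return value __be status, response_headers, and a list of strings__. The actual transmission would be completely hidden, and the user would only need to read data from environ and process it.
Happily, a quick search suggests that the application design in the web3 standard is similar to my idea. I hope the web3 protocol catches on soon.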
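The single-argument design the author proposes can be layered on top of WSGI with a small adapter. This is only a sketch of the author's idea under stated assumptions: the names to_wsgi and hello are invented for the example, the return order (status, headers, body) follows the author's wording, and nothing here is claimed to match the web3 proposal itself.

```python
def to_wsgi(simple_app):
    """Adapt application(environ) -> (status, headers, body) to the WSGI interface."""
    def wsgi_app(environ, start_response):
        status, headers, body = simple_app(environ)
        start_response(status, headers)   # the adapter hides start_response
        return body                       # body is an iterable of byte strings
    return wsgi_app


def hello(environ):
    # The application only reads environ and returns a value; it never writes.
    body = [b"Hello ", b"World!"]
    length = str(sum(len(part) for part in body))
    headers = [("Content-Type", "text/plain"), ("Content-Length", length)]
    return "200 OK", headers, body


seen = {}


def fake_start_response(status, headers, exc_info=None):
    # Stand-in for the container's start_response.
    seen["status"] = status
    seen["headers"] = dict(headers)


result = b"".join(to_wsgi(hello)({}, fake_start_response))
```

The trade-off is that such an application cannot push data early through write(), which is exactly the feature the author would remove.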
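====== Middleware ======
Middleware is a special kind of program that can meddle between the container and the application. Anyone familiar with Python decorators will notice that it is essentially the same thing as a decorator.
Let's implement a simple routing middleware.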
class Router(object):
    def __init__(self):
        self.path_info = {}

    def route(self, environ, start_response):
        # look up the application registered for the requested path
        application = self.path_info[environ['PATH_INFO']]
        return application(environ, start_response)

    def __call__(self, path):
        def wrapper(application):
            self.path_info[path] = application
            return application  # keep the decorated name bound to the app
        return wrapper
That is a very simple routing middleware. Change the application part of the WSGI container code above to the following:
router = Router()

#here is the application
@router('/hello')
def hello(environ, start_response):
    status = '200 OK'
    output = 'Hello'
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    write = start_response(status, response_headers)
    return [output]

@router('/world')
def world(environ, start_response):
    status = '200 OK'
    output = 'World!'
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    write = start_response(status, response_headers)
    return [output]

#here run the application
result = router.route(environ, start_response)
for value in result:
    write(value)
With this, **the container automatically finds and runs the app that matches the requested path**.
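The router as written fails with a KeyError for any unregistered path. A hedged Python 3 sketch of the same routing idea with a 404 fallback follows; the helper call() and the use of wsgiref.util.setup_testing_defaults to fake a request are choices made for this illustration, not part of the original post.

```python
from wsgiref.util import setup_testing_defaults


class Router:
    """Path-based dispatcher with a 404 fallback (illustrative)."""

    def __init__(self):
        self.routes = {}

    def __call__(self, path):
        def register(app):
            self.routes[path] = app
            return app
        return register

    def route(self, environ, start_response):
        app = self.routes.get(environ.get("PATH_INFO", ""))
        if app is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not Found"]
        return app(environ, start_response)


router = Router()


@router("/hello")
def hello(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]


def call(path):
    """Run one fake request through the router and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)   # fills in a minimal valid WSGI environ
    environ["PATH_INFO"] = path
    seen = []
    body = router.route(environ, lambda status, headers, exc_info=None: seen.append(status))
    return seen[0], b"".join(body)
```

Calling call("/hello") exercises the registered app, while any other path goes through the fallback branch.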
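====== Going further ======
The more I write, the more this looks like a framework. Building a framework in Python really is simple.
Looking at it from another angle: if each application is treated as a unit of computation and middleware is used to manage I/O and computing resources, WSGI could be used to assemble a distributed system.
All right, that's all.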

Binary file not shown.

@@ -0,0 +1,411 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-13T22:13:34+08:00
====== Topic: Handling web requests in Python 3.0 - wrapping WSGI ======
Created Thursday 13 October 2011
http://www.iteye.com/topic/396244
# -*- coding: utf-8 -*-
# Written for Python 3.0; the cgi module used here was removed in Python 3.13.
import socketserver, re, cgi, io, urllib.parse
from wsgiref.simple_server import WSGIServer


class AppException(Exception):
    pass


class Request(object):
    """Holds the client's request information."""

    def __init__(self, env):
        self.env = env
        self.winput = env["wsgi.input"]
        self.method = env["REQUEST_METHOD"]  # request method (GET or POST)
        self.__attrs = {}
        self.attributes = {}
        self.encoding = "UTF-8"

    def __getattr__(self, attr):
        # Parse the request parameters lazily, on first access to .params
        if attr == "params" and "params" not in self.__attrs:
            fp = None
            if self.method == "POST":
                content = self.winput.read(int(self.env.get("CONTENT_LENGTH", "0")))
                #fp = io.StringIO(content.decode(self.encoding))
                fp = io.StringIO(urllib.parse.unquote(content.decode("ISO-8859-1"), encoding=self.encoding))
            self.fs = cgi.FieldStorage(fp=fp, environ=self.env, keep_blank_values=1)  # create the FieldStorage
            self.params = {}
            for key in self.fs.keys():
                self.params[key] = self.fs[key].value
            self.__attrs["params"] = self.params
        return self.__attrs[attr]
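class Response(object):
    """Sends the response to the client."""

    def __init__(self, start_response, write=None):
        self.encoding = "UTF-8"
        self.start_response = start_response
        self._write = write

    def write(self, string):
        """Write data to the response stream.
        @param string: the string to write to the stream
        """
        if self._write is None:
            self._write = self.start_response("200 OK", [("Content-type", "text/html;charset=" + self.encoding)])
        # The ISO-8859-1 round trip presumably targets the str-based wsgiref of
        # Python 3.0; under PEP 3333, write() takes bytes (string.encode(self.encoding)).
        self._write(string.encode(self.encoding).decode("ISO-8859-1"))

    def redirect(self, url):
        """Redirect to another URL."""
        if self._write is not None:
            raise AppException("Data has already been written to the response stream; cannot redirect.")
        self.start_response("302 Found", [("Location", url)])


# ThreadingMixIn must come first so that its process_request() takes effect
class ThreadingWSGIServer(socketserver.ThreadingMixIn, WSGIServer):
    """A WSGI server class that handles requests with multiple threads."""
    pass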
class WSGIApplication(object):
    """WSGI server program."""

    def __init__(self, urls=None):
        self.urls = urls  # URL mapping

    def getHandlerByUrl(self, url):
        """Return the handler for the URL, or None if no handler is found."""
        url = url.replace("//", "/")  # guard against URL parsing errors caused by typos
        urlArr = url.split('/')
        for setUrl in self.urls.keys():
            setUrlArr = setUrl.split("/")
            #print(setUrl.replace("*",r'\w*'))
            if len(setUrlArr) == len(urlArr):
                for i in range(len(urlArr)):
                    if (i == len(urlArr) - 1 and
                        (setUrlArr[i] == '*' or setUrlArr[i] == urlArr[i] or
                         ('*' in setUrlArr[i] and re.search(setUrlArr[i].replace("*", r'\w*'), urlArr[i])))):
                        return self.urls[setUrl]
                    if setUrlArr[i] == '*' or setUrlArr[i] == ' ':
                        continue
                    if setUrlArr[i] != urlArr[i]:
                        break

    def make_app(self):
        """Build the WSGI application callable."""
        def wsgi_app(env, start_response):
            #print(";\n".join([k+"="+str(v) for k, v in env.items()]))
            url = env["PATH_INFO"]  # URL of the current request
            handlerCls = self.getHandlerByUrl(url)
            if handlerCls is None:
                # no handler is defined for this URL
                start_response("404 Not Found", [("Content-type", "text/html;charset=utf-8")])
                return ["Error URL"]  # a list, so the server does not iterate character by character
            if not hasattr(handlerCls, "doGET") and not hasattr(handlerCls, "doPOST"):
                # mapping error
                start_response("500 Internal Server Error", [("Content-type", "text/html;charset=utf-8")])
                return ["Error Mapping"]
            request = Request(env)
            response = Response(start_response)
            try:
                handler = handlerCls(request, response)
            except TypeError as e:
                handler = handlerCls()
            methodName = "do" + request.method
            returnValue = None
            try:
                returnValue = getattr(handler, methodName)(request, response)
            except TypeError as e:
                returnValue = getattr(handler, methodName)()
            if returnValue is None:
                returnValue = []
            return returnValue
        return wsgi_app

    def make_server(self, serverIp='', port=8080, test=False):
        """Create a default server.
        @param test: whether to run only a single test request
        """
        from wsgiref.simple_server import make_server  # load the module
        httpd = make_server(serverIp, port, self.make_app(), server_class=ThreadingWSGIServer)
        if test:  # test only
            httpd.handle_request()  # handle a single request
        else:
            httpd.serve_forever()  # handle requests until stopped
        return True
def main():
    app = WSGIApplication(urls={"/a/*": TestHandler, "/a/b/*.do": TestHandler})
    app.make_server(test=True)


class TestHandler(object):
    def __init__(self):
        pass

    def doGET(self, request=None, response=None):
        request.encoding = 'UTF-8'
        response.write("Hello")

    def doPOST(self, request=None, response=None):
        #request.encoding='UTF-8'
        #response.write(request.params["name"])
        response.redirect("/a/x")


if __name__ == "__main__":
    main()
    #input()
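Continued from the previous post: Handling web requests in Python 3.0 - wrapping WSGI further
This version adds cookie handling, session support, and access to request, response, etc. from thread-local scope. Sessions cannot be persisted yet.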
http://www.iteye.com/topic/397437
# -*- coding: utf-8 -*-
import socketserver, re, cgi, io, urllib.parse
from wsgiref.simple_server import WSGIServer
import threading, time, urllib, guid  # guid is not a standard-library module; presumably the author's own ID helper
from http.cookies import SimpleCookie

ctx = context = threading.local()


class AppException(Exception):
    pass


class SessionPool(object):
    """Where sessions are stored."""
    sessionIdKey = "psessionid"

    def __init__(self, session_store_time=30):
        """Initialize the session pool.
        @param session_store_time: session lifetime, in minutes
        """
        self.session_store_time = session_store_time
        self.sessions = {}

    def getSession(self, key):
        """Get a session from the pool."""
        if key in self.sessions:
            session = self.sessions[key]
            if session.isTimeOut():
                self.removeSession(session.sessionId)
            else:
                return session
        return None

    def createSession(self):
        """Create a new session."""
        sessionId = self.newSessionId()
        session = Session(sessionId, self.session_store_time)
        self.sessions[sessionId] = session
        return session

    def removeSession(self, key):
        """Delete a session."""
        #self.sessions.remove(key)
        if key in self.sessions:
            del self.sessions[key]

    def newSessionId(self, ip=None):
        """Create a new session ID."""
        return guid.generate(ip)

    def getSessionByCookie(self, cookie, response=None, create=True):
        """Find the session from the cookie information."""
        sessionId = cookie.get(SessionPool.sessionIdKey, None)
        if sessionId is not None:
            sessionId = sessionId.value
            session = self.getSession(sessionId)
            if session is not None:
                session.lastAccessTime = time.time()
                return session
        if create:
            session = self.createSession()
            response.putCookie(SessionPool.sessionIdKey, session.sessionId)
            return session
        return None

    def saveSessions(self):
        pass


class Session(dict):
    """A client session."""

    def __init__(self, sid, store_time):
        self.sessionId = sid
        self.lastAccessTime = self.createTime = time.time()
        self.maxInactiveInterval = store_time  # session lifetime, in minutes

    def isTimeOut(self):
        """Return whether the session has timed out."""
        return time.time() - self.lastAccessTime > self.maxInactiveInterval * 60
class Request(object):
    """Holds the client's request information."""

    def __init__(self, env, sessions):
        self.env = env
        self.winput = env["wsgi.input"]
        self.method = env["REQUEST_METHOD"]  # request method (GET or POST)
        self.__attrs = {}
        self.attributes = {}
        self.encoding = "UTF-8"
        self.cookies = SimpleCookie(env.get("HTTP_COOKIE", ""))
        self.response = ctx.response
        self.sessionPool = sessions

    def __getattr__(self, attr):
        if attr == "params" and "params" not in self.__attrs:  # fetch the client's request parameters
            fp = None
            if self.method == "POST":  # parse the body for POST requests; otherwise treat the request as GET
                content = self.winput.read(int(self.env.get("CONTENT_LENGTH", "0")))
                #fp = io.StringIO(content.decode(self.encoding))
                fp = io.StringIO(urllib.parse.unquote(content.decode("ISO-8859-1"), encoding=self.encoding))
            self.fs = cgi.FieldStorage(fp=fp, environ=self.env, keep_blank_values=1)  # create the FieldStorage
            self.params = {}
            for key in self.fs.keys():
                self.params[key] = self.fs[key].value
            self.__attrs["params"] = self.params
        if attr == "session" and "session" not in self.__attrs:  # no session on this request yet, so create one
            self.session = self.sessionPool.getSessionByCookie(self.cookies, self.response)
            return self.session
        return self.__attrs[attr]


class Response(object):
    """Sends the response to the client."""

    def __init__(self, start_response, write=None):
        self.encoding = "UTF-8"
        self.start_response = start_response
        self._write = write
        self.cookies = None
        self.headers = {}

    def write(self, string):
        """Write data to the response stream.
        @param string: the string to write to the stream
        """
        if self._write is None:
            headers = [("Content-type", "text/html;charset=" + self.encoding)]
            if self.cookies is not None:
                headers.append(('Set-Cookie', self.cookies.output(header="")))
            for k, v in self.headers.items():
                headers.append((k, v))
            self._write = self.start_response("200 OK", headers)
        self._write(string.encode(self.encoding).decode("ISO-8859-1"))

    def redirect(self, url):
        """Redirect to another URL."""
        if self._write is not None:
            raise AppException("Data has already been written to the response stream; cannot redirect.")
        self.start_response("302 Found", [("Location", url)])

    def putCookie(self, key, value, expires=1000000, path='/'):
        """Add a cookie."""
        if self.cookies is None:
            self.cookies = SimpleCookie()
        self.cookies[key] = urllib.parse.quote(value)
        self.cookies[key]["expires"] = expires
        self.cookies[key]['path'] = path

    def addHeaders(self, key, value):  # the original omitted self
        self.headers[key] = value


# WSGIServer must come last, otherwise requests are not handled concurrently
class ThreadingWSGIServer(socketserver.ThreadingMixIn, WSGIServer):
    """A WSGI server class that handles requests with multiple threads."""
    pass
class WSGIApplication(object):
    """WSGI server program."""

    def __init__(self, urls=None):
        self.urls = urls  # URL mapping
        self.sessions = SessionPool(1)

    def getHandlerByUrl(self, url):
        """Return the handler for the URL, or None if no handler is found."""
        url = url.replace("//", "/")  # guard against URL parsing errors caused by typos
        urlArr = url.split('/')
        for setUrl in self.urls.keys():
            setUrlArr = setUrl.split("/")
            #print(setUrl.replace("*",r'\w*'))
            if len(setUrlArr) == len(urlArr):
                for i in range(len(urlArr)):
                    if (i == len(urlArr) - 1 and
                        (setUrlArr[i] == '*' or setUrlArr[i] == urlArr[i] or
                         ('*' in setUrlArr[i] and re.search(setUrlArr[i].replace("*", r'\w*'), urlArr[i])))):
                        return self.urls[setUrl]
                    if setUrlArr[i] == '*' or setUrlArr[i] == ' ':
                        continue
                    if setUrlArr[i] != urlArr[i]:
                        break

    def make_app(self):
        """Build the WSGI application callable."""
        def wsgi_app(env, start_response):
            print("start request....")
            #print(";\n".join([k+"="+str(v) for k, v in env.items()]))
            url = env["PATH_INFO"]  # URL of the current request
            handlerCls = self.getHandlerByUrl(url)
            if handlerCls is None:
                # no handler is defined for this URL
                start_response("404 Not Found", [("Content-type", "text/html;charset=utf-8")])
                return ["Error URL"]  # a list, so the server does not iterate character by character
            if not hasattr(handlerCls, "doGET") and not hasattr(handlerCls, "doPOST"):
                # mapping error
                start_response("500 Internal Server Error", [("Content-type", "text/html;charset=utf-8")])
                return ["Error Mapping"]
            response = Response(start_response)
            ctx.response = response
            request = Request(env, self.sessions)
            ctx.request = request  # put request and response into thread-local scope for easy access
            try:
                handler = handlerCls(request, response)
            except TypeError as e:
                handler = handlerCls()
            methodName = "do" + request.method
            returnValue = None
            try:
                returnValue = getattr(handler, methodName)(request, response)
            except TypeError as e:
                returnValue = getattr(handler, methodName)()
            if returnValue is None:
                returnValue = []
            print("end request....")
            return returnValue
        return wsgi_app

    def make_server(self, serverIp='', port=8080, test=False):
        """Create a default server.
        @param test: whether to run only a single test request
        """
        from wsgiref.simple_server import make_server  # load the module
        httpd = make_server(serverIp, port, self.make_app(), server_class=ThreadingWSGIServer)
        if test:  # test only
            httpd.handle_request()  # handle a single request
        else:
            httpd.serve_forever()  # handle requests until stopped
        return True
def main():
    app = WSGIApplication(urls={"/a/*": TestHandler, "/a/b/*.do": TestHandler})
    app.make_server(test=False, port=9000)


class TestHandler(object):
    def __init__(self):
        pass

    def doGET(self):
        ctx.request.encoding = 'UTF-8'
        session = ctx.request.session
        if "x" in ctx.request.params:
            session["x"] = ctx.request.params["x"]
        #time.sleep(3)
        # .get() avoids a KeyError on the first visit, before "x" has been stored
        ctx.response.write("Hello " + session.get("x", ""))

    def doPOST(self):
        #request.encoding='UTF-8'
        #response.write(request.params["name"])
        ctx.response.redirect("/a/x")


if __name__ == "__main__":
    main()
    #input()
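The cookie-to-session lookup in the listing above can be tried in isolation. The following is a compact sketch of that flow, not the author's code: TinySessionPool and its load() method are hypothetical names, uuid.uuid4() stands in for the non-standard guid module, and the now parameter exists only so that expiry can be exercised without waiting.

```python
import time
import uuid
from http.cookies import SimpleCookie

SESSION_KEY = "psessionid"  # same cookie name as in the listing above


class TinySessionPool:
    """In-memory session store keyed by a cookie value (illustrative)."""

    def __init__(self, ttl_seconds=1800):
        self.ttl_seconds = ttl_seconds
        self._sessions = {}  # session id -> (last access time, data dict)

    def load(self, cookie_header, now=None):
        """Return (session_id, data), creating a new session when none is valid."""
        now = time.time() if now is None else now
        morsel = SimpleCookie(cookie_header).get(SESSION_KEY)
        sid = morsel.value if morsel is not None else None
        entry = self._sessions.get(sid)
        if entry is None or now - entry[0] > self.ttl_seconds:
            # Unknown or expired: drop any stale entry and start a fresh session.
            self._sessions.pop(sid, None)
            sid, entry = uuid.uuid4().hex, (now, {})
        self._sessions[sid] = (now, entry[1])  # refresh the last access time
        return sid, entry[1]


pool = TinySessionPool(ttl_seconds=60)
sid1, data1 = pool.load("", now=0)              # no cookie yet: new session
data1["x"] = "1"
sid2, data2 = pool.load(SESSION_KEY + "=" + sid1, now=30)    # still valid
sid3, data3 = pool.load(SESSION_KEY + "=" + sid1, now=200)   # idle too long
```

A real response would also have to send the new session id back in a Set-Cookie header, as putCookie() does in the listing.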


@@ -0,0 +1,23 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-13T18:28:24+08:00
====== About WSGI ======
Created Thursday 13 October 2011
http://eishn.blog.163.com/blog/static/652318201011082044410/
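For WSGI, the main thing is to read PEP 333. In fact, once you have read the two sample code listings in it, you will understand it. I read the sample code and the requirements on the environment variables, then wrote the (eurasia) WSGI server; it was that simple.
One point that easily causes confusion is mixing up (1) the **WSGI server** and (2) a WSGI-based framework. WSGI is in fact split into two sides, the server and the framework (that is, the application), plus middleware of course. Strictly speaking, WSGI is only a protocol that specifies the interface connecting server and framework.
(1) A WSGI server exposes server functionality through the **WSGI interface**. For example, mod_wsgi is such a server: it offers Apache's functionality in the form of a WSGI interface.
(2) A WSGI framework is something like Django. Note, though, that purely WSGI frameworks are rare; WSGI-based frameworks usually ship their own WSGI server. Django and CherryPy, for example, both bundle a WSGI server, mainly for testing, and use a production WSGI server for deployment. Some WSGI frameworks, such as pylons and bfg, do not implement a WSGI server themselves and use paste as their WSGI server.
Paste is a popular WSGI server that comes with a lot of middleware. **flup** is another library that provides middleware.
Once the WSGI server and the application are clear, middleware follows naturally. Besides uses such as **session and cache**, I recently saw a bfg middleware dedicated to skinning a website. Many more uses of middleware are conceivable.
One more addition: how a framework like django runs on apache via __fastcgi (CGI is also a specification/protocol; it differs from WSGI, so a conversion is needed)__. This relies on tools such as flup.fcgi or fastcgi.py (eurasia also includes an implementation of fastcgi.py), which convert the fastcgi protocol into a WSGI interface (turning fastcgi into a WSGI server) for the framework to plug into. The overall architecture looks like this: django -> fcgi2 wsgiserver -> mod_fcgi -> apache.
I am not a WSGI fan, but there is no denying that WSGI matters a great deal to Python web development. Anyone who wants to design their own web framework without dealing with the socket layer and HTTP message parsing can start from WSGI. There is a consensus in the Python community that knocking together your own web framework is as natural as taking a sip of water. Perhaps every Python enthusiast goes through a phase of tinkering with frameworks.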


@@ -0,0 +1,41 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-13T17:18:36+08:00
====== Configuring mod_wsgi under Apache ======
Created Thursday 13 October 2011
http://healich.iteye.com/blog/727620
Configuring mod_wsgi under Apache
Apache http Server: http://httpd.apache.org/
modwsgi: http://code.google.com/p/modwsgi/, http://code.google.com/p/modwsgi/wiki/InstallationInstructions
WSGI: http://www.python.org/dev/peps/pep-0333/
After installing Apache, you also need to download mod_wsgi. mod_wsgi is the Apache extension that adds support for the Python WSGI protocol. The current version is 3.3, and Windows binaries are available for different Python versions.
First, make the Apache httpd server load the wsgi_module extension. Put the downloaded mod_wsgi.so in the modules directory under the Apache server installation directory, and add the following line to httpd.conf:
LoadModule wsgi_module modules/mod_wsgi.so
Use the **WSGIScriptAlias** directive to specify the startup script of the WSGI application. Add the following line to httpd.conf (the default DocumentRoot is used here):
WSGIScriptAlias /test "/path/to/docRoot/test.wsgi"
The test program is served under the **/test path**; the WSGI script file is **test.wsgi**:
def application(environ, start_response):
    status = '200 OK'
    output = 'Hello World!'
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]
After restarting the Apache server, the test program can be reached at http://localhost/test. If it displays "Hello World!", mod_wsgi has been installed successfully.
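The test script above is written for Python 2, where the body is a plain str. Under PEP 3333 (Python 3) the response body must consist of bytes, so a Python 3 variant of the same script might look like the sketch below. The small check() helper is an assumption added here to exercise the callable without Apache; it is not something mod_wsgi requires.

```python
def application(environ, start_response):
    """Python 3 variant of test.wsgi: the body is bytes, the headers stay str."""
    status = "200 OK"
    output = b"Hello World!"
    response_headers = [("Content-Type", "text/plain"),
                        ("Content-Length", str(len(output)))]
    start_response(status, response_headers)
    return [output]


def check():
    """Call the application directly with a stand-in start_response."""
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"] = status
        seen["headers"] = dict(headers)

    body = b"".join(application({}, start_response))
    return seen["status"], seen["headers"]["Content-Length"], body
```

Running check() locally is a quick way to catch str/bytes mistakes before restarting the web server.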


@@ -0,0 +1,100 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-13T21:50:17+08:00
====== Getting to Grips with Python's WSGI ======
Created Thursday 13 October 2011
http://www.iteye.com/topic/734050
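Over the past month, the thing I have dealt with most is Python's WSGI. WSGI is not a framework and not a module; it is only a **specification (protocol)** that defines some **interfaces**, yet it affects every aspect of Python web development. One definition of WSGI reads: WSGI is the Web Server Gateway Interface. It is a specification for **web servers and application servers to communicate with web applications** (though it can also be used for more than that). This article is not a detailed introduction to WSGI; I only want to talk about what I have learned around it.
As that definition says, the protocol defines a set of interfaces that **standardizes (or unifies) communication between the server side and the application side**. What kind of interfaces? Very simple ones, especially on the application side.
The application side only has to implement a Python object that has a __call__ method, **accepts** two arguments, and returns an iterable yielding zero or more strings. (I stress "Python object" only to distinguish it from a Java object: in Python, a function, a type, and so on are all objects; Python truly is "everything is an object". See the book 《Python源码分析》 for details.) Every programmer knows that parameter names can be chosen freely, and that holds here too, but by convention the first is named "environ" and the second "start_response". As for what the object actually does, **the application is free to do as it likes**.
The server side's job is not complicated either: for every incoming **request**, it calls once the object that the application side has "**registered**" (the object the protocol requires the application side to implement), and then returns the corresponding **response message**. That completes one exchange between server side and application side, and with it the handling of one user request. Naturally, since **the protocol requires the server side to pass two arguments when making the call**, it also specifies some details of those arguments. The first is a dictionary object holding everything obtained from the user's request and the server's environment variables; the protocol defines which values must be present and the variable names they go by. The second is a **callback function**; it hands the application side a write object used to produce the **response body**, and that object also has a __call__ method.
The protocol also mentions that **middleware** can be designed to sit between the server side and the application side and implement common functionality such as **session and routing**.
How is the protocol used in practice? Python's bundled **wsgiref module (a server implementation that supports the WSGI protocol)** has a simple example: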
from wsgiref.simple_server import make_server

def hello_world_app(environ, start_response):
    status = '200 OK'  # HTTP Status
    headers = [('Content-type', 'text/plain')]  # HTTP Headers
    start_response(status, headers)

    # The returned object is going to be printed
    return ["Hello World"]  # on Python 3 this must be bytes: [b"Hello World"]

httpd = make_server('', 8000, hello_world_app)  # start the server and register the app with it
print("Serving on port 8000...")

# Serve until process is killed
httpd.serve_forever()
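This example mostly shows how the application side is developed: it simply **implements a function that satisfies the specification**, so that when a browser sends a request to port 8000 on the local machine it receives the text "Hello World" in response. Simple as it is, the example shows very clearly how the interface between the application side and the server side is used.
You may have noticed that requests to every address on this port are now handled by the one "hello_world_app" function. You could parse the PATH information of the request and forward **different addresses** to different functions or classes. You may find the two parameters environ and start_response unintuitive; you could wrap them into request and response objects, as Java servlets do. You may feel that some **common functionality** can be factored out and done **outside the application logic proper**. In that case you are already thinking about how to build middleware or a web framework. All of this has in fact been done before, for example by Routes, WebOb, and Beaker. Of course you are free to build your own wheel; sometimes you only understand the existing mature tools properly after doing it once yourself. Most importantly, in the Python world none of this is hard to do.
Perhaps, like me, you wonder while writing an application how the server side works. The rough flow is easy to guess: the server opens a socket and waits for client connections; when a request arrives, the server reads the incoming data, does some initial packaging according to the HTTP protocol, and then calls the previously registered application, passing the request data in; once the response has been produced, it sends the data back out through the socket, and that's it. Fortunately Python code is concise and the simple server in the bundled wsgiref is simple too, so let's look at the implementation in more detail.
First, the class hierarchy. The actual class of this simple server is WSGIServer, which inherits from HTTPServer, which inherits from TCPServer, which in turn inherits from BaseServer. Working directly with the server classes are the RequestHandler classes; from the top:
WSGIRequestHandler -> BaseHTTPRequestHandler -> StreamRequestHandler -> BaseRequestHandler.
Not very complicated compared with Java, is it? How do they work? Let me explain briefly.
Let's start with BaseServer, the root base class of the servers. It has a comment that describes very clearly what each method it defines is for: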
Methods for the caller:
- __init__(server_address, RequestHandlerClass)
- serve_forever()
- handle_request() # if you do not use serve_forever()
- fileno() -> int # for select()
Methods that may be overridden:
- server_bind()
- server_activate()
- get_request() -> request, client_address
- verify_request(request, client_address)
- server_close()
- process_request(request, client_address)
- close_request(request)
- handle_error()
Methods for derived classes:
- finish_request(request, client_address)
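So a server class really consists of just these few methods.
Of the four methods that can be called from outside, the constructor obviously creates the instance, and the fourth probably relates to building asynchronous servers, so I skip it here. The code shows that the remaining two serve the same purpose, handling incoming requests: serve_forever() keeps handling requests in a loop for as long as the server process lives, while handle_request() handles one request and returns. In the version examined here, serve_forever() simply calls handle_request() in a loop. handle_request() spells out the whole flow from accepting a request to returning, and the code is very simple: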
def handle_request(self):
    """Handle one request, possibly blocking."""
    try:
        request, client_address = self.get_request()
    except socket.error:
        return
    if self.verify_request(request, client_address):
        try:
            self.process_request(request, client_address)
        except:
            self.handle_error(request, client_address)
            self.close_request(request)
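The difference between handle_request() and serve_forever() can be seen by serving exactly one request over the loopback interface. This is a sketch for a modern Python 3 under a few assumptions: QuietHandler and fetch_once are names invented here, port 0 asks the operating system for any free port, and the empty ProxyHandler only keeps a locally configured proxy from intercepting the request.

```python
import threading
import urllib.request
from wsgiref.simple_server import make_server, WSGIRequestHandler


class QuietHandler(WSGIRequestHandler):
    """Request handler that suppresses the per-request log line."""

    def log_message(self, format, *args):
        pass


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello World"]


def fetch_once():
    """Serve a single request with handle_request() and return the body received."""
    httpd = make_server("127.0.0.1", 0, app, handler_class=QuietHandler)
    port = httpd.server_address[1]
    # handle_request() returns after one request, so the thread ends by itself.
    worker = threading.Thread(target=httpd.handle_request)
    worker.start()
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://127.0.0.1:{port}/") as response:
        body = response.read()
    worker.join()
    httpd.server_close()
    return body
```

With serve_forever() in place of handle_request(), the worker thread would keep running until shutdown() is called on the server.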
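BaseServer defines these internally called methods, but most of their bodies are essentially empty and are left for **the concrete server classes to implement**. The BaseServer code also shows what the RequestHandler class is for: it does the actual parsing of the request. It is invoked by finish_request(), and finish_request() is evidently meant to be called from process_request().
TCPServer inherits from BaseServer and is where the socket setup we guessed at actually becomes concrete.
The same source file as these two classes contains two other important classes, ThreadingMixIn and ForkingMixIn. Each overrides process_request() and calls finish_request() in a new thread or a new process respectively. From the application's point of view this also explains why finish_request() is wrapped in process_request() rather than being called directly in the second try block of handle_request().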
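To see why the extra process_request() layer is useful, here is a toy version of the same structure. It is a sketch under loud assumptions: ToyServer and ToyThreadingMixIn are invented stand-ins that reuse the method names of the real classes but involve no sockets, and the request is just a string.

```python
import threading


class ToyServer(object):
    """Hypothetical stand-in for BaseServer: same method names, no sockets."""

    def __init__(self, handler):
        self.handler = handler
        self.log = []

    def handle_request(self, request):
        # In the real class the request comes from get_request().
        if self.verify_request(request):
            self.process_request(request)

    def verify_request(self, request):
        return True

    def process_request(self, request):
        # Default strategy: handle the request in the calling thread.
        self.finish_request(request)

    def finish_request(self, request):
        self.log.append(self.handler(request))


class ToyThreadingMixIn(object):
    """Overrides only process_request(); finish_request() is reused as is."""

    def process_request(self, request):
        worker = threading.Thread(target=self.finish_request, args=(request,))
        worker.start()
        self.last_worker = worker


class ThreadedToyServer(ToyThreadingMixIn, ToyServer):
    pass


server = ThreadedToyServer(lambda request: request.upper())
server.handle_request("ping")
server.last_worker.join()  # wait for the worker before reading the log
print(server.log)
```

The mix-in changes how a request is scheduled without touching what is done with it, which is the separation the two method names give you.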
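HTTPServer does very little: it just records the name of the socket server.
Next comes WSGIServer. It does two new things: it sets up some basic __environment variables__ and it accepts the __registration of an application__. The code shows that this is where the interface implemented on the application side gets registered with the server, and only one application can be registered, so serving several applications means forwarding calls through routing. This WSGIServer is also neither multi-threaded nor multi-process.
I will not analyse the RequestHandler classes that wrap the request content; interested readers can read the source, which is also simple. In the next post I plan to share what I learned about how the pylons framework runs.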

@@ -0,0 +1,281 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-05-22T09:23:45+08:00
====== basic knowledge ======
Created Sunday 22 May 2011
http://www.starming.com/index.php?action=plugin&v=wave&tpl=union&ac=viewgrouppost&gid=73&tid=4591
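===== Numbers =====
Python has four types of numbers: integers, long integers, floating-point numbers and complex numbers.
2 is an example of an integer.
Long integers are just bigger integers.
3.23 and 52.3E-4 are examples of floating-point numbers. The E notation indicates a power of 10; here 52.3E-4 means 52.3 * 10^-4.
(-5+4j) and (2.3-4.6j) are examples of complex numbers.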
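===== Strings =====
* Using single quotes (')
You can specify strings with single quotes, such as 'Quote me on this'. All whitespace, that is spaces and tabs, is preserved as is.
* Using double quotes (")
Strings in double quotes work exactly the same way as strings in single quotes, for example "What's your name?".
* Using triple quotes (''' or """)
With triple quotes you can specify a multi-line string. You can use single quotes and double quotes freely inside triple quotes.
===== Escape sequences =====
output, \'output\' - much the same as in Java.
"I am is a \
student!"
means the string continues on the next line; it is the same as "I am is a student!".
===== Raw strings =====
If you need to specify strings where no special processing such as escape sequences is applied, specify a raw string by prefixing the string with r or R, for example r"Newlines are indicated by \n".
===== Unicode strings =====
Unicode is a standard way of writing international text. If you want to write text in your native language, such as Hindi or Arabic, you need a Unicode-enabled editor. Similarly, Python lets you handle Unicode text; you only need to prefix the string with u or U, for example u"This is a Unicode string.".
Remember to use Unicode strings when you deal with text files, especially when you know the file contains text written in languages other than English.
===== Strings are immutable =====
This means that once you have created a string, you cannot change it. Although this might look like a bad thing, it really is not. We will see in later programs why this is not a limitation.
===== String literal concatenation =====
If you place two string literals side by side, Python concatenates them automatically. For example, 'What\'s ' 'your name?' is automatically converted to "What's your name?".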
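A quick check of raw strings, adjacent literals and line continuation, written as a small sketch for Python 3 (the variable names are only illustrative):

```python
# A raw string keeps the backslash and the letter n as two characters.
raw = r"Newlines are indicated by \n"
cooked = "Newlines are indicated by \n"

# Adjacent string literals are joined at compile time.
joined = 'What\'s ' 'your name?'

# A backslash at the end of a physical line continues the string literal.
continued = "I am is a \
student!"

print(len(raw) - len(cooked))
print(joined)
print(continued)
```

The raw version is one character longer than the cooked one, because the escape sequence in the cooked string collapses into a single newline character.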
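===== Naming identifiers =====
Variables are examples of identifiers. Identifiers are names given to identify something. When naming identifiers, follow these rules:
* The first character of an identifier must be a letter of the alphabet (upper or lower case) or an underscore ('_').
* The rest of the identifier name can consist of letters (upper or lower case), underscores ('_') or digits (0-9).
* Identifier names are case-sensitive. For example, myname and myName are not the same identifier. Note the lowercase n in the former and the uppercase N in the latter.
* Examples of valid identifier names are i, __my_name, name_23 and a1b2_c3.
* Examples of invalid identifier names are 2things, this is spaced out and my-name.
===== Objects =====
Remember, Python refers to **anything used in a program as an object**. This is meant in the generic sense. Instead of saying "the something", we say "the object".
Python is __strongly object-oriented__ in the sense that everything is an object, including numbers, strings and even functions.
Example: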
print "Hello World!你好!" "我是张天瑞"
i = 5
print i
i = i + 1
print i
s = '''This is a multi-line string.
This is the second line.'''
print s
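To use a variable, just assign it a value. No declaration or data type definition is needed.
===== Logical and physical lines =====
A physical line is what you see when you write the program. A logical line is what Python sees as a single statement. Python implicitly assumes that each physical line corresponds to one logical line.
An example of a logical line is a statement like print 'Hello World'; if it is on a line by itself (as you see it in an editor), it also corresponds to a physical line.
By default Python encourages **one statement per line**, which makes code more readable.
If you want to __put more than one logical line on a single physical line__, you have to say so explicitly with a semicolon (;), which marks the end of a logical line or statement.
===== Indentation =====
Whitespace is important in Python. More precisely, whitespace at the beginning of the line is important. This is called indentation. Leading whitespace (spaces and tabs) at the beginning of a logical line determines its __indentation level__, which in turn determines the grouping of statements.
This means that statements which go together must have the same indentation. Each such set of statements is called a block. We will see examples of how blocks matter in later chapters.
One thing to remember is that wrong indentation gives rise to errors.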
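===== Operators =====
Each entry gives the operator, its name, an explanation and examples.
+ (plus): adds two objects. 3 + 5 gives 8. 'a' + 'b' gives 'ab'.
- (minus): gives a negative number or subtracts one number from another. -5.2 gives a negative number. 50 - 24 gives 26.
* (multiply): multiplies two numbers or returns a string repeated that many times. 2 * 3 gives 6. 'la' * 3 gives 'lalala'.
** (power): returns x to the power of y. 3 ** 4 gives 81 (that is, 3 * 3 * 3 * 3).
/ (divide): divides x by y. 4/3 gives 1 (division of integers gives an integer). 4.0/3 or 4/3.0 gives 1.3333333333333333.
// (floor division): returns the integer part of the quotient. 4 // 3.0 gives 1.0.
% (modulo): returns the remainder of the division. 8 % 3 gives 2. -25.5 % 2.25 gives 1.5.
<< (left shift): shifts the bits of a number to the left by the given number of places (every number is represented in memory as bits, or binary digits, that is 0 and 1). 2 << 2 gives 8; 2 is 10 in bits.
>> (right shift): shifts the bits of a number to the right by the given number of places. 11 >> 1 gives 5; 11 is 1011 in bits, which shifted right by 1 bit gives 101, the decimal 5.
& (__bitwise AND__): bitwise AND of the numbers. 5 & 3 gives 1.
| (bitwise OR): bitwise OR of the numbers. 5 | 3 gives 7.
^ (bitwise XOR): bitwise XOR of the numbers. 5 ^ 3 gives 6.
~ (__bitwise invert__): the bitwise inversion of x is -(x+1). ~5 gives -6.
< (less than): returns whether x is less than y. All comparison operators return 1 for true and 0 for false, which are equivalent to the special variables True and False respectively. Note the capitalization of these names. 5 < 3 gives 0 (False) and 3 < 5 gives 1 (True). Comparisons can be chained arbitrarily: 3 < 5 < 7 gives True.
> (greater than): returns whether x is greater than y. 5 > 3 returns True. If both operands are numbers, they are first converted to a common type. Otherwise it always returns False.
<= (less than or equal to): returns whether x is less than or equal to y. x = 3; y = 6; x <= y returns True.
>= (greater than or equal to): returns whether x is greater than or equal to y. x = 4; y = 3; x >= y returns True.
== (equal to): compares whether the objects are equal. x = 2; y = 2; x == y returns True. x = 'str'; y = 'stR'; x == y returns False. x = 'str'; y = 'str'; x == y returns True.
!= (not equal to): compares whether two objects are not equal. x = 2; y = 3; x != y returns True.
not (boolean NOT): if x is True, it returns False. If x is False, it returns True. x = True; not x returns False.
and (boolean AND): x and y returns False if x is False, otherwise it returns the evaluation of y. x = False; y = True; x and y returns False because x is False. In this case Python does not evaluate y, since it knows the value of the expression must be False (because x is False). This is called short-circuit evaluation.
or (boolean OR): if x is True, it returns True, otherwise it returns the evaluation of y. x = True; y = False; x or y returns True. Short-circuit evaluation applies here as well.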
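Short-circuit evaluation is easy to observe by recording which operands are actually evaluated. The sketch below assumes Python 3 (the table above describes Python 2, where / on two integers truncates, so division is left out here); noisy is a helper invented for this example.

```python
def noisy(value, calls):
    """Record that this operand was evaluated, then return it."""
    calls.append(value)
    return value


calls = []
result_and = noisy(False, calls) and noisy(True, calls)  # right side skipped
result_or = noisy(True, calls) or noisy(False, calls)    # right side skipped
print(result_and, result_or, calls)

# A few entries from the table, checked directly.
print(4 // 3, 4 // 3.0)
print(-25.5 % 2.25)
print(2 << 2, 11 >> 1)
print(5 & 3, 5 | 3, 5 ^ 3, ~5)
print(3 < 5 < 7)
```

Only two calls are recorded, one per expression, which shows that the second operand was never evaluated in either case.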
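===== The if statement =====
if xxx:
    ... ...;
elif xxx:
    ... ...;
else:
    ... ...;
===== The while statement =====
while True:
    ... ...;
else:
    ... ...;
True and False are called the __Boolean type__. You can think of them as equivalent to the values 1 and 0 respectively.
===== The for statement =====
for i in range(1, 5):
    print i;
else:
    print 'The loop is over'; # always runs, unless a break __exits the loop early__
The range function generates this sequence of numbers; the loop prints 1 2 3 4.
The default step is 1, and you can set your own: for i in range(1, 5, 2) prints with a step of 2.
for i in [1, 2, 3, 4]:
    print i;
achieves the same result.
The continue and break statements are much the same as in Java.
===== Functions =====
def showList(a, b):
    for i in range(a, b):
        print i;
    else:
        print 'The loop is over';
showList(1, 5)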
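===== Using the global statement =====
If you want to assign a value to a name defined outside the function, you have to tell Python that the name is not local but global. We do this with the global statement. Without the global statement it is impossible to assign a value to a variable defined outside a function.
You can use the values of variables defined outside the function (assuming there is no variable with the same name inside the function). However, this is not encouraged and should be avoided, because it becomes unclear to the reader where the variable is defined. Using the global statement makes it clear that the variable is defined in an outer block.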
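def say(message, times = 1): # default values can be given in advance, but only for parameters at the end of the parameter list
    print message * times
say('Hello') # prints once
say('World', 5) # prints five times
Arguments are normally assigned by position; parameters that have a default value in the function definition can be left out of the call.
To assign to a specific parameter, pass it by name, for example c = 2 in the call; this assigns parameter c, while a and b take their default values.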
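The rules for positional, default and keyword arguments can be tried out with a three-parameter function. This is a sketch for Python 3; func and its defaults are invented for illustration.

```python
def func(a, b=5, c=10):
    """Return a description of the arguments that were received."""
    return "a=%s b=%s c=%s" % (a, b, c)


print(func(3, 7))          # a and b by position, c keeps its default
print(func(25, c=24))      # c by name, b keeps its default
print(func(c=50, a=100))   # keyword arguments may come in any order
```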
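===== Using return =====
return can return a value, or simply exit the function.
Note that a return statement without a value is equivalent to __return None__. None is a special type in Python that represents nothingness. For example, if a variable has the value None, it indicates that it has no value.
Unless you write your own return statement, every function implicitly ends with a return None statement. You can see this by running print someFunction(), where someFunction does not use a return statement, such as:
def someFunction():
    pass
===== The pass statement =====
In Python, pass indicates an empty block of statements.
===== Using DocStrings =====
Python has a nifty feature called documentation strings, usually referred to by the shorter name docstrings. DocStrings are an important tool because they help document the program better and make it easier to understand, so you should use them whenever you can. You can even get the docstring back from a function while the program is running!
def say():
    'This is the docstring'
    print 'Say something';
print say.__doc__;
The statements above print only 'This is the docstring'.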
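===== Modules =====
You have seen how to reuse code in your program by defining functions once. What if you want to **reuse a number of functions** in other programs? As you might have guessed, the answer is modules. __A module is basically a file containing all the functions and variables you have defined__. To reuse a module in other programs, its filename must have a .py extension.
===== Byte-compiled .pyc files =====
Importing a module is a relatively costly affair, so Python uses some tricks to make it faster. One way is to create __byte-compiled__ files with the extension .pyc. A byte-compiled file is related to the intermediate form that Python transforms the program into (remember the introduction on how Python works?). The .pyc file is useful the next time you import the module from a different program: it is much faster, since part of the processing required to import a module is already done. These byte-compiled files are also __platform-independent__. So now you know what those .pyc files really are.
===== The from..import statement =====
If you want to import the argv variable directly into your program, to avoid typing sys. every time you use it, you can use the from sys import argv statement. If you want to import all the names used in the sys module, you can use the from sys import * statement. This works for any module. In general, __you should avoid from..import and use the import statement instead__, since it keeps your program more readable and avoids name clashes.
You can run what is defined in helloModule.py directly:
import helloModule;
helloModule.sayHello();
print helloModule.version;
# or
from helloModule import sayHello, version;
sayHello();
print version;
The dir() function
You can use the built-in dir function to list the identifiers that a module defines. The identifiers are the functions, classes and variables.
When you supply a module name to dir(), it returns the list of names defined in that module. With no argument, it returns the list of names defined in the current module.
dir(sys)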
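Data structures
List
A list is a data structure that holds an ordered collection of items, that is, you can store a sequence of items in a list. This is easy to imagine if you think of a shopping list of things to buy, except that on your shopping list you probably have each item on a separate line, whereas in Python you put commas between them.
The list of items should be enclosed in square brackets so that Python understands you are specifying a list. Once you have created a list, you can add, remove or search for items in it. Since we can add and remove items, we say that a list is a mutable data type, that is, this type can be altered.
shoplist = ['东瓜', '南瓜', '西瓜', '瓜瓜', '阿瓜'];
len(shoplist)
shoplist[2];
shoplist.append('张天瑞');
del shoplist[4];
shoplist.sort(); # or sort(reverse=True)
Tuple
Tuples are just like lists except that they are immutable like strings, that is, you cannot modify tuples. Tuples are defined by specifying items separated by commas within a pair of parentheses. Tuples are usually used in cases where a statement or a user-defined function can safely assume that the collection of values, that is the tuple of values used, will not change.
zoo = ('wolf', 'elephant', 'penguin')
new_zoo = ('monkey', 'dolphin', zoo)
len(new_zoo)
new_zoo[2]
new_zoo[2][2] # 'penguin'
Tuples with 0 or 1 items: an empty tuple is constructed by an empty pair of parentheses, such as myempty = (). A tuple with a single item needs a comma after the item, such as singleton = (2,), so that Python can tell it apart from a parenthesized expression. Note that tuples are also used with the print statement: print '%s is %d years old' % (name, age)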
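The trailing comma is what makes a one-item tuple. A short sketch for Python 3; the name and age values are placeholders, not data from the notes.

```python
empty = ()
singleton = (2,)     # the comma makes this a tuple
not_a_tuple = (2)    # just the integer 2 in parentheses

print(type(empty).__name__, type(singleton).__name__, type(not_a_tuple).__name__)

# A tuple on the right of % supplies one value per format specifier.
name, age = 'Swaroop', 22   # placeholder values
print('%s is %d years old' % (name, age))
```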
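Dictionary
A dictionary is like an address book where you can find the address or contact details of a person by knowing only his or her name, that is, we associate keys (names) with values (details). Note that the key must be unique, just as you cannot find the correct information if two people have exactly the same name.
Note that you can use only immutable objects (like strings) for the keys of a dictionary, but you can use either immutable or mutable objects for the values. This basically means you should use only simple objects for keys.
Pairs of keys and values are specified in a dictionary with the notation d = {key1 : value1, key2 : value2 }. Notice that each key is separated from its value by a colon, the pairs are separated by commas, and all of this is enclosed in a pair of curly braces.
Remember that key/value pairs in a dictionary are not ordered. If you want a particular order, you have to sort them yourself before using them.
Dictionaries are instances/objects of the dict class.
ab = { 'Swaroop' : 'swaroopch@byteofpython.info',
       'Larry' : 'larry@wall.org',
       'Matsumoto' : 'matz@ruby-lang.org',
       'Spammer' : 'spammer@hotmail.com'
     }
ab['Guido'] = 'guido@python.org'
del ab['Spammer']
for name, address in ab.items():
    print 'Contact %s at %s' % (name, address)
if 'Guido' in ab: # OR ab.has_key('Guido')
    print "\nGuido's address is %s" % ab['Guido']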

@@ -0,0 +1,7 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-21T21:15:48+08:00
====== django ======
Created Friday 21 October 2011

@@ -0,0 +1,7 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-27T15:15:15+08:00
====== Developing Reusable Django Apps ======
Created Thursday 27 October 2011

@@ -0,0 +1,195 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-22T19:56:25+08:00
====== Django + MySQL installation and configuration in detail (Linux) [updated for 1.3] ======
Created Saturday 22 October 2011
http://dmyz.org/archives/110
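===== Preface =====
Django is an open-source web application framework written in Python, first released under the BSD license in July 2005. Django's main goal is to make it easy to build complex, database-driven websites. Django follows the MVC design pattern and emphasizes the reusability and "pluggability" of components, agile development and the DRY principle (Don't Repeat Yourself).
Python is used throughout Django, even for configuration files and data models. This article walks through installing and configuring Django in a Linux + MySQL environment, covering installation, running and adding an application, and ends with a Django application that reads articles from MySQL and displays them.
===== Install =====
First download Django:
wget www.djangoproject.com/download/1.3/tarball/
This gives Django-1.3.tar.gz. Unpack and install it:
tar xzvf Django-1.3.tar.gz
cd Django-1.3
sudo python setup.py install
If you are told setuptools is missing, download and install setuptools as well (installing it beforehand is recommended, because MySQL for Python needs it too).
After installation, Django copies **django-admin.py** to /usr/local/bin; this script pulls in **Django's management module**.
===== Setup =====
Creating a Django project is very simple: use the startproject command with a project name:
**django-admin.py startproject mysite**
Django generates a folder named mysite in the current directory, the **project folder**, containing the following files (.pyc files appear only after the first run; a freshly created project may contain only the .py files):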
urls.py
settings.pyc
settings.py
manage.py
__init__.pyc
__init__.py
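__init__.py/__init__.pyc: may be empty; it only **marks this folder as an importable package** and is not touched during installation and configuration.
settings.py/settings.pyc: the configuration file, holding Django settings, chiefly the database settings and the list of loaded modules.
manage.py: a **command-line tool** for interacting with Django.
After creating the project, enter the project folder and start **Django's built-in development web server**:
python manage.py runserver
Django checks the configuration for errors automatically and starts if everything is fine: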
Validating models…
0 errors found
Django version 1.2.3, using settings mysite.settings
Development server is running at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
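Visit http://127.0.0.1:8000; if the page displays, Django is working. At this point only the local machine can reach it. To allow access from outside, or to change the default port 8000, run:
python manage.py runserver 0.0.0.0:8080
This changes the port to 8080 and lets other hosts reach Django on this machine by its IP address.
Now add MySQL support. Edit the configuration file (settings.py) and find, at around line 12: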
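DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',  # use the MySQL backend
        'NAME': 'dmyz',      # MySQL database name
        'USER': 'root',      # MySQL user; empty means the current Linux user
        'PASSWORD': '',      # MySQL password
        'HOST': '',          # empty means localhost
        'PORT': '',          # empty means port 3306
    }
}
Run runserver again (normally Django **reloads settings.py automatically**) and the MySQL database can be used.
Because Django talks to MySQL through Python, **MySQL for Python** must be installed first. On Ubuntu the installation may also fail with EnvironmentError: mysql_config not found. The MySQL installed through apt-get **does not include the development package**, so the mysql_config file cannot be found. Install it with:
sudo apt-get install libmysqld-dev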
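===== URL =====
The URL configuration file works much like **a table of contents**: Django uses it to look up the object that handles a URL, and URL addresses are specified with regular expressions. The urls.py file in the mysite directory is the default starting point of the URL configuration; this can be changed through the __ROOT_URLCONF__ value in settings.py. Edit urls.py directly:
urlpatterns = patterns('',
    (r'^$', 'mysite.hello.index'),
)
r'^$': a regular expression that matches the site root;
'mysite.hello.index': points to the index function in the hello module of the mysite project.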
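To get a feel for how such patterns select a view, the matching can be imitated with the standard re module. This is only a toy under stated assumptions, not Django's resolver: resolve and URL_PATTERNS are invented names, the view names are plain labels, and the second pattern is the r'^article/' rule added later in this article.

```python
import re

# (compiled pattern, label of the view it points to)
URL_PATTERNS = [
    (re.compile(r'^$'), 'mysite.hello.index'),
    (re.compile(r'^article/'), 'mysite.article.views.latest_article'),
]


def resolve(path):
    """Toy resolver: drop the leading slash, return the first matching label."""
    if path.startswith('/'):
        path = path[1:]
    for pattern, view_name in URL_PATTERNS:
        if pattern.search(path):
            return view_name
    return None


print(resolve('/'))
print(resolve('/article/'))
print(resolve('/nope/'))
```

The empty string left after stripping the slash from / is what r'^$' matches, which is why that pattern stands for the site root.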
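The rest is easy. Create a hello.py file in the mysite folder and write an index function in it:
#hello.py
from django.http import HttpResponse
def index(request):
    return HttpResponse('hello, world')
Refresh the home page and you will see "hello, world".
Another approach: the hello module only helps to explain Django's structure; needing that much code for a home page is not very pythonic, so in production the home page is usually written like this:
#urls.py
urlpatterns = patterns('',
    url(r'^$', 'django.views.generic.simple.direct_to_template', {'template':'index.html'}),
)
Django finds and loads index.html **from the template directory** automatically; only the one file, urls.py, has to be changed.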
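===== Application =====
The "hello world" example in the previous section only demonstrated URL configuration and hardly used Django at all. __As a web framework, Django aims to separate the MVC layers and to take care of common operations itself, so that developers can concentrate on the core application.__ The last step of this article is therefore to write an application named article that reads the author, title and content of articles from the MySQL database.
First create the application:
python manage.py startapp article
An article folder is added to the project folder, containing:
models.py
views.py
__init__.py
models.py: the model file. It describes a **database table** with a Python class, which lets you create, retrieve, update and delete records with simple Python code instead of writing one SQL statement after another.
views.py: the view file, which ties models and templates together.
Next write the model file (article/models.py), which handles the database operations: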
from django.db import models
# Create your models here.
class Article(models.Model):
    title = models.CharField(max_length=50)
    author = models.CharField(max_length=50)
    content = models.CharField(max_length=200)
Now edit the configuration file (settings.py) to tell Django that this __application is part of the project__. Open the file, find the INSTALLED_APPS tuple near the end and add article to it:
INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django_openid_auth',
    #……
    'mysite.article',  # add the app
)
You can first run the manage.py sql article command as a test. If it prints the generated SQL statements, the model is set up correctly and can be initialized and installed:
python manage.py syncdb
Django creates a table named **article_article** automatically. Because INSTALLED_APPS includes __django.contrib.auth__, syncdb also asks for an administrator account and password, which are used to log in to Django's admin tool.
You just installed Django's auth system, which means you don't have any superusers defined.
Would you like to create one now? (yes/no): yes
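With the simple model in place, set up the view. Edit the view file (article/views.py):
# article/views.py
from django.shortcuts import render_to_response
from models import Article
def latest_article(request):
    article_list = Article.objects.order_by('-id')
    return render_to_response('article/article.html',{'article_list':article_list})
Line 2: imports Django's render_to_response() function, which loads a template, fills in the content and returns the resulting page.
Line 3: imports the Article class from the model file written earlier.
Lines 4~6: define a latest_article function, which uses the Article class to fetch data from the database in descending id order, then calls the template file and passes the variable to it.
The template file used in the code above is located at the __template path__ from the settings file plus the path given in views.py. So if you get a TemplateDoesNotExist error for that path, you have most likely forgotten to set the template path.
Edit the settings file (settings.py) and set a template path in **TEMPLATE_DIRS**; here the template directory is simply the project folder (mysite). Note the trailing comma, which makes the value a tuple rather than a plain string:
TEMPLATE_DIRS = (
    "/var/www/mysite",
)
The template loaded at run time is then /var/www/mysite/article/article.html. If you have used other frameworks or template engines, the article.html below will be easy to follow: in the template file Django uses tags to control where the variables passed in are displayed: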
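{% for article in article_list %}
    Author:{{ article.author }}
    Title:{{ article.title }}
    Content:{{ article.content }}
{% endfor %}
Finally, edit the URL configuration file so that article/ points to the latest_article function defined in the view (views.py):
(r'^article/', 'mysite.article.views.latest_article'),
That completes the configuration. When you visit http://127.0.0.1:8000/article, Django reads the contents of the database and displays them on the page.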
===== Epilogue =====
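Some of the settings in this article are not suitable for production. URL configuration, for instance, is usually done __with include__ for easier reuse, whereas here the patterns are specified directly. The article is meant as an introduction and a quick way to get configured. For a more systematic way to learn Django, the official Django documentation comes first and the Django Book second; at the time this article was updated, most of the Chinese edition of the Django Book 2.0 had been translated, and it is also good learning material.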

@@ -0,0 +1,107 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-27T15:30:19+08:00
====== Django tips- laying out an application ======
Created Thursday 27 October 2011
http://www.b-list.org/weblog/2006/sep/10/django-tips-laying-out-application/
Continuing the theme of dealing with common questions from the Django mailing lists and IRC channel, today we'll look at how to organize the various bits of a Django-based project or application.
===== Projects versus applications =====
This is really more of a separate (though related) question, but understanding the distinction Django draws between a “project” and an “application” is a big part of good code layout. Roughly speaking, this is what the two terms mean:
* An application tries to provide a single, relatively self-contained set of related functions. An application is allowed to define a set of models (though it doesn't have to) and to define and register custom template tags and filters (though, again, it doesn't have to).
* A project is a collection of applications, installed into the **same** database, and all using the same settings file. In a sense, the defining aspect of a project is that it supplies a settings file which specifies the database to use, the applications to install, and other bits of **configuration**. A project may correspond to a single web site, but doesn't have to; multiple projects can run on the same site. The project is also responsible for the **root URL** configuration, though in most cases it's useful to just have that consist of calls to include which pull in URL configurations from individual applications.
Views, custom manipulators, custom context processors and most other things Django lets you create can all be defined **either** at the level of the project or of the application, and where you do that should depend on what's **most effective** for you; in general, though, they're best placed **inside an application** (this increases their portability across projects).
===== Default project-level file layout =====
When you run **django-admin.py startproject**, Django will automatically create a new directory containing four files:
* __init__.py, which will be empty. This file is required to tell Python that the directory is a Python module and can be imported (and imported from).
* manage.py, which provides a number of convenience functions for working with the project.
* settings.py, which will be the project's settings file.
* urls.py, which will be the project's **root** URL configuration.
Generally you don't need to modify this layout, and for compatibility and consistency it's probably **best if you don't**. If you do want to change things, though, here's what it's safe to do:
* You can put your settings in a file that's not called settings.py; Django finds out where your settings are by looking at the environment variable **DJANGO_SETTINGS_MODULE**, not by looking for a file with a specific name.
* You can put your root URL configuration somewhere that's not called urls.py; Django looks at the **ROOT_URLCONF** setting in the settings.py file to find out where your URL configuration lives.
manage.py and __init__.py should be left alone.
===== Default application-level file layout =====
When you run **manage.py startapp** in the root of the project directory, Django creates a sub-directory of your project directory, and creates the following files:
* __init__.py, which serves the same purpose as at the project level.
* models.py, which should hold the application's model classes.
* views.py, which is for any custom views the application wants to provide.
The __init__.py and models.py files (or, if you want to split up your models across multiple files, a directory called __models__ which can act as a Python module) are **required**; without __init__.py, Python won't be able to import from the application, and Django is **hard-wired** to expect models in a file or module called models. The views.py file, however, is **optional** and you can delete it if you won't be providing any views, or rename it if you want to call it something else (though for sake of **consistency** it's probably best not to rename it).
===== Extra special stuff =====
There are four “**special**” locations inside your application which can be used in order to take advantage of specific features, so if you want to use these features you don't have a whole lot of choice in how you set them up:
* To define custom template tags or filters, you must create a sub-directory in the application's directory called **templatetags**, and it **must** contain a file named __init__.py so that it can be imported as a Python module.
* To define **unit tests** which will **automatically** be noticed by Django's testing framework, put them in a module called **tests** (which can be either a file named tests.py or a directory called tests). The testing framework will also find any **doctests** in that module, but the preferred place for those is, of course, the docstrings of the classes or functions they're designed to test.
* To provide custom SQL which will be executed immediately after your application is **installed**, create a sub-directory called __sql__ inside the application's directory; the file names should be the same as the names of the models whose tables they'll operate on; for example, if you have an app named weblog containing a model named Entry, then the file __sql/entry.sql__ inside the app's directory can be used to modify or insert data into the entries table as soon as it's been created.
* To provide custom Python functions which will run when the application is installed, put them in a file named __management.py__, and use Django's internal dispatcher to connect your functions to the **post_syncdb** signal.
That last one deserves a bit more explanation, so let's look at it in detail.
Internally, Django uses a package called **PyDispatcher** to enable its various bits to communicate cleanly with each other. Basically, the dispatcher works like this:
* Various parts of Django, as well as other applications, define plain objects called “signals”.
* Code which wants other things to be notified of something happening tells the dispatcher to send a particular signal.
* Code which wants to be notified when something happens uses the dispatcher's connect method to listen for a particular signal.
For example, if you wanted to set up a function which would execute any time a new application is installed, you could create a file in your application called management.py, and put this code in it:
from django.dispatch import dispatcher
from django.db.models import signals
def my_syncdb_func():
    # put your code here...
    pass
dispatcher.connect(my_syncdb_func, signal=signals.post_syncdb)
Several of the applications bundled with Django use this trick to do various things:
* django.contrib.sites listens to find out when it's been installed, and creates the default “example.com” site object, which is needed for the admin to function.
* django.contrib.contenttypes listens for any new apps being installed, and creates new ContentType instances for all the models being installed.
* django.contrib.auth listens on two fronts: when the auth app is installed, it prompts you to create a superuser, and when any new app is installed, it creates permissions for that app's models.
This works because the syncdb function in manage.py imports the management files from all the installed and soon-to-be-installed apps in your project; that makes sure any app which needs to can take advantage of the dispatcher.
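The connect-and-send idea can be boiled down to a few lines of plain Python. To be clear, this is a toy illustration and not PyDispatcher or Django's dispatcher: the Signal class, post_install and create_default_site are all made-up names.

```python
class Signal(object):
    """Hypothetical minimal signal: a list of receivers plus connect/send."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        # Register a callable to be notified when the signal is sent.
        self.receivers.append(receiver)

    def send(self, **kwargs):
        # Call every receiver and collect what each one returns.
        return [receiver(**kwargs) for receiver in self.receivers]


post_install = Signal()  # stand-in name, not Django's post_syncdb


def create_default_site(app_name, **kwargs):
    return "default site created after installing %s" % app_name


post_install.connect(create_default_site)
print(post_install.send(app_name="sites"))
```

The sender never needs to know which receivers exist, which is what lets bundled apps hook into installation without the installer importing them by name.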
===== Other useful conventions =====
Of course, that doesn't cover all the things you might want to do within Django, so naturally people end up wondering how they should organize the rest of their code. Generally there's no need to require standardized layouts or locations for certain functions, but it can be helpful to adhere to a few conventions. So to provide an example, here's how I generally lay out any additional bits I need in things that I'm working on.
At the project level, I generally don't add a whole lot; it's rare that I need to do something that doesn't make more sense as part of an application. However, if you're going to be running multiple projects which all use some or all of the same applications, it can be useful to organize your settings carefully. Jacob has, once or twice, pointed out the trick that we use at the Journal-World, and I think it's fairly useful:
//we have our code base in one directory structure, and settings files in another, with a “default” settings file at the top level of the settings tree. Settings files for individual sites import the default settings and override anything they need to, or add any extra settings they require. Because Django doesn't require settings files to live in the same directory tree as the applications their projects use, this is extremely easy and extremely useful to do.//
At the application level, I usually drop in **a few more** files depending on exactly what the application is going to be using:
* If the application defines any custom manipulators, I put them in a file called __forms.py __instead of in the views file.
* If there are multiple custom managers in the app, I put them in a file called __managers.py__ instead of the models file.
* If I'm defining any custom context processors, I put them in a file called __context_processors.py__.
* If I'm setting up any custom dispatcher signals, they go in a file called __signals.py__.
* If the application is setting up any syndication feeds, the feed classes go in a file called __feeds.py__. Similarly, sitemap classes go in __sitemaps.py__.
* Middleware classes go in a file called __middleware.py__.
* Any miscellaneous code which doesn't clearly go anywhere else goes in a file or module called __utils__.
I don't always use all of that functionality in an app, but if I do it's nice to have a convention for it, so that I only need to remember to do from appname import forms or from appname import feeds.
===== And one more thing… =====
This is something I'm still in the process of developing, but I also like to have an easy way to test the **dependencies** my applications have, and to make sure everything is **configured** properly. The easiest way to do this, in my experience, is to put some code in the application's __init__.py which checks for everything the application will need. I wrote up a simple example in a post on the Django developers list, and I'm still working on improving that; there are functions tucked away in django.core.management which will let you test not only whether an application can be imported and is listed in the INSTALLED_APPS setting, but whether it's actually been installed into the database.
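One possible shape for such a check, using only the standard library, is sketched below. It is not the example from the mailing-list post: missing_dependencies and the REQUIRED list are hypothetical, and the sketch covers only the "can it be imported" part, not the INSTALLED_APPS or database checks.

```python
import importlib


def missing_dependencies(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Hypothetical requirement list: one stdlib module, one that should not exist.
REQUIRED = ["json", "surely_not_installed_module_xyz"]
print(missing_dependencies(REQUIRED))
```

An application's __init__.py could call a function like this at import time and raise a descriptive error if the returned list is not empty.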
===== How do you do it? =====
I think I've covered everything I know of, and every trick I use, for laying out a Django project or application cleanly; if you see something here you like, feel free to start using it. And if I've overlooked something cool that you know of, please post it in a comment and let the world know about it :)

@@ -0,0 +1,373 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-23T10:17:36+08:00
====== Django Best Practices - Chinese edition (2009-06-17) ======
Created Sunday 23 October 2011
http://yangyubo.com/django-best-practices/
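===== Translator's (yospaly) preface =====
Django Best Practices (django-best-practices) is a fork of django-reusable-app-docs. It extends the ideas of the original project into a set of "best practice" rules for Django web development, going beyond what the official documentation discusses.
The document consists of a series of guidelines on creating Django projects that are maintainable, clean, well structured, reusable and easy to deploy. For beginners it is a rare guide: if you do not know where to start, following it will quickly put you on a sound footing. For experienced Django users it is also a useful reference from which to derive their own best practices.
Unfortunately I come from a science and engineering background and my skills are limited; the translation is plain and only just conveys what the original says. A small part I have not understood yet (noted in the text) :)
The original is split across multiple pages, but each rule is short, and spreading them over many pages makes reading harder. I therefore merged them into a single page and changed nothing else.
Both the original and the translation are written in reStructuredText and can be converted to HTML and other formats with Sphinx.
====== Django Best Practices ======
This is a living document (updated continually) about developing and deploying with the **Django web framework**. These guidelines should not be regarded as the absolutely correct or the only way to use Django; rather, these best practices are experience accumulated over years of using the framework.
This project is a fork of the django-reusable-app-docs project, an excellent project started by Brian Rosner and Eric Holscher that collects best practices for developing and maintaining reusable Django apps.
===== Code style =====
In general, code should be clean, concise and readable. The Zen of Python (PEP 20) is the authoritative introduction to Python coding practice.
* Follow the Style Guide for Python Code (PEP 8) as closely as is reasonable.
* Follow the Django coding style.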
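Django applications (apps)
How do I distribute my app?
Django apps should use the standard Python Package Index, also known as PyPI or the Cheese Shop. I wrote a tutorial on how to package an app and upload it to PyPI easily.
If you upload your app to PyPI, it is recommended to prefix the project name with "django-".
(yospaly: I do not understand the following; pointers from experts are welcome.) Also note that when, below, we refer to the default place for something as a file, you can also make a directory of that same name, as per normal Python.
Documentation
Put it in a docs directory at the same level as the APP directory (your app does have a parent directory, doesn't it?)
It may include templates for users to refer to
What is a reusable app?
A Django app that can be dropped into a project easily and provides one clearly defined piece of functionality. Apps should be focused and follow the Unix philosophy of "do one thing and do it well". For more information, see James Bennett's Djangocon talk.
Application modules
(yospaly: replace every uppercase word below, such as APP or MODEL, with the real app or model name in your project.)
Admin
Not required
Goes in the APP/admin.py file
The admin class for MODEL is named MODELAdmin
Context processors
Go in the APP/context_processors.py file
Feeds
Go in the APP/feeds.py file
Forms
Go in the APP/forms.py file
Managers
Go in the APP/managers.py file
Middleware
Goes in the APP/middleware.py file
Should do as little work as possible
Models
Go in APP/models (a .py file or a directory)
Follow Django's model conventions
Templates
Go in APP/templates/APP/template.html
To standardize Django template block names as far as possible, I suggest generally using the following block names.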
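{% block title %}
This block defines the title of the page. Your base.html template will most likely define the site name (even when using the Sites framework) outside this tag, so that it appears on every page.
{% block extra_head %}
I find this a very useful block, and many people already use it in some form. Pages often need to add things to the HTML document head, such as RSS feeds, JavaScript, CSS and anything else that belongs there. You can, and probably will, define other dedicated blocks (such as the title block above) for other parts of the head.
{% block body %}
This tag wraps the entire body section of the page. It lets pages in your app replace the whole page content, not just the main content. This is uncommon, but it is a very handy tag when you need it. As you may have noticed, I try to keep tag names consistent with HTML tag names wherever possible.
{% block menu %}
Your menu (navigation bar) belongs in this block. It is for site-level navigation, not per-page navigation menus.
{% block content %}
This block holds the main content of the page, which may differ on every page. It does not contain site navigation, headers, footers or anything else that belongs to the base template.
Other possible blocks
{% block content_title %}
Specifies the "title" of the content block, for example the title of a blog. It can also hold navigation within the content (translator's note: such as an outline) or similar things, roughly anything on the page that is not the main content. I am not sure whether this block should live inside the content tag, nor whether a main_content block is needed to go with the content tag suggested above.
{% block header %} {% block footer %}
Header and footer text areas that any page may need to change.
{% block body_id %} {% block body_class %}
Set the class or id attribute of the HTML body tag. Very useful for styling and other attributes.
{% block [section]_menu %} {% block page_menu %}
The counterpart to the menu block suggested earlier, used for navigating within a section or a page.
Template tags
Go in APP/templatetags/APP_tags.py
Recommended template tag syntax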
as (Context Var): This is used to set a variable in the context of the page
for (object or app.model): This is used to designate an object for an action to be taken on.
limit (num): This is used to limit a result to a certain number of results.
exclude (object or pk): The same as for, but is used to exclude things of that type.
Tests
Go in APP/tests (a .py file or a directory)
Fixtures go in APP/fixtures/fixture.json
Usually it is enough to override Django's testcase
URLs
Go in APP/urls (a .py file or a directory)
Set the name attribute so the URL can be reversed; name it in the form APP_MODEL_VIEW, for example blog_post_detail or blog_post_list.
Views
Go in APP/views (a .py file or a directory)
Can be any Python callable.
View arguments should have sensible defaults and be easy to customize:
Example:
def register(request, success_url=None,
             form_class=RegistrationForm,
             template_name='registration/registration_form.html',
             extra_context=None):
Django projects
Recommended layout
example.com/
    README
    settings.py
    urls.py
    docs/
        - This will hold the documentation for your project
    static/
        - In production this will be the root of your MEDIA_URL
        css/
        js/
        images/
    tests/
        - Project level tests (Each app should also have tests)
    uploads/
        - content imgs, etc
    templates/
        - This area is used to override templates from your reusable apps
        flatpages/
        comments/
        example/
        app1/
        app2/
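What is a Django project?
A project in Django is a simple structure containing a settings file, URL configuration and a collection of Django apps. These may be ones you wrote yourself or third-party code included in your project.
Project modules
Settings
Go in the [PROJECT]/settings.py file
Use relative paths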
import os
DIRNAME = os.path.dirname(__file__)
MEDIA_ROOT = os.path.join(DIRNAME, 'static')
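Put environment-specific settings in a local_settings.py file and import it at the end of settings.py.
try:
    from local_settings import *
except ImportError:
    pass
URLs
Go in the PROJECT/urls.py file
Should contain minimal logic; in most cases it only acts as a pointer to the URL configurations of your individual apps.
Deployment
Bootstrapping a project environment
Filesystem layout
Note
This document is heavily biased towards Unix-style filesystems; using it on other operating systems requires some extra changes.
Virtualenv is a must for Python projects. It provides a way to isolate different Python environments. Typically we deploy production sites under /opt/webapps/<site_name> and development sites under ~/webapps/<site_name>. Each project has its own virtualenv, which also serves as the root directory for all code related to the project. We use pip to add the necessary packages to the virtualenv.
The bootstrap process looks like this: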
cd /opt/webapps
virtualenv mysite.com
cd mysite.com
pip install -E . -r path/to/requirements.txt
source bin/activate
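Tip
For convenience you can create a symlink to your Django project in the root of your virtualenv. The name of the symlink does not matter, because your project is already on the Python path. By giving the symlink the same name in all your projects, you can use some handy bash functions to save time.
Packaging
One key to successful deployment is keeping the software in your development environment as close as possible to the software in the deployment environment. Pip offers a simple, repeatable way to deploy Python projects consistently on any system. Any app that needs third-party libraries should include a pip requirements file named requirements.txt. Projects are responsible for aggregating the requirements files of all their apps and adding further requirements as needed.
What to include in your requirements file
Our rule of thumb is: any application that does not come with your operating system by default. The only packages we leave out of our requirements files are PIL, database drivers and other packages pip cannot install. These omitted requirements are documented in the project's README file.
Servers
Note
The deployment architecture depends heavily on the site's traffic. The setup described below has worked best for us in most cases.
We deploy Django on Linux with a PostgreSQL database backend, with Nginx as the frontend proxy and Apache with mod_wsgi behind it.
Nginx
Nginx is an excellent frontend server: fast, rock solid and light on resources. A typical Nginx site configuration looks like this: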
# Apache server
upstream django {
    server domain.com:9000;
}
# Redirect all requests on the www subdomain to the root domain
server {
    listen 80;
    server_name www.domain.com;
    rewrite ^/(.*) http://domain.com/$1 permanent;
}
# Serve static files and redirect any other request to Apache
server {
    listen 80;
    server_name domain.com;
    root /var/www/domain.com/;
    access_log /var/log/nginx/domain.com.access.log;
    proxy_redirect off;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    if (!-f $request_filename) {
        proxy_pass http://django;
    }
}
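What does it do?
The first block tells Nginx where to find the server hosting the Django site. The second block redirects every request for www.domain.com to domain.com, so each resource is reachable under exactly one URL. The last block does all the work: it tells Nginx to check whether the requested file exists under /var/www/domain.com. If it does, Nginx serves that file; otherwise it forwards the request to the Django site.
Warning
Translator's note (yospaly)
The Apache-related parts below were left untranslated. We strongly recommend an Nginx/Lighttpd + SCGI/FastCGI/HTTP setup and suggest avoiding the more cumbersome Apache + mod_wsgi where possible.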
SSL
Another benefit to running a frontend server is lightweight SSL proxying. Rather than having two Django instances running for SSL and non-SSL access, we can have Nginx act as the gatekeeper, redirecting all requests back to a single non-SSL Apache instance listening on the localhost. Here's what that would look like:
server {
    listen 67.207.128.83:443; #replace with your own ip address
    server_name domain.com;
    root /var/www/domain.com/;
    access_log /var/log/nginx/domain.com.access.log;
    ssl on;
    ssl_certificate /etc/nginx/ssl/certs/domain.com.crt;
    ssl_certificate_key /etc/nginx/ssl/private/domain.com.key;
    ssl_prefer_server_ciphers on;
    proxy_redirect off;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Protocol https;
    if (!-f $request_filename) {
        proxy_pass http://django;
    }
}
You can include this code at the bottom of your non-SSL configuration file.
Tip
For SSL-aware Django sites like Satchmo, you'll need to "trick" the site into thinking incoming requests are coming in via SSL, but this is simple enough to do with a small addition to the WSGI script we discuss below.
Apache
We run the Apache2 Worker MPM with mod_wsgi in daemon mode. The default settings for the MPM Worker module should be sufficient for most environments, although those with a shortage of RAM may want to look into reducing the number of servers spawned. Since Nginx will be listening for HTTP(S) requests, you'll need to bind Apache to a different port. While you're at it, you can tell it to only respond to the localhost. To do so, you'll want to edit the Listen directive:
Listen 127.0.0.1:9000
With Apache up and running, you'll need an Apache configuration and WSGI script for each site. A typical Apache configuration for an individual site looks like this:
<VirtualHost *:9000>
    ServerName domain.com
    ServerAdmin webmaster@domain.com
    ErrorLog /var/log/apache2/domain.com.log
    WSGIDaemonProcess domain display-name=%{GROUP} maximum-requests=10000
    WSGIProcessGroup domain
    WSGIScriptAlias / /opt/webapps/domain.com/apache/django.wsgi
    <Directory /opt/webapps/domain.com/apache>
        Order deny,allow
        Allow from all
    </Directory>
</VirtualHost>
Tip
In a perfect world, your app would never leak memory and you could leave out the maximum-requests directive. In our experience, setting it to a high number is a good way to keep Apache's memory usage in check.
Warning
This will default to a single process with 15 threads. Django is not “officially” thread safe and some external libraries (notably a couple required for django.contrib.gis) are known to not be thread safe. If needed the threads and processes arguments can be adjusted accordingly.
The configuration links to the WSGI script within the project directory. The script is just a few lines of Python that set up our environment properly.
import os, sys
import site
# put virtualenv on pythonpath
site.addsitedir('/opt/webapps/domain.com/lib/python2.5/site-packages')
# redirect print statements to apache log
sys.stdout = sys.stderr
os.environ['DJANGO_SETTINGS_MODULE'] = 'domain.settings'
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
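The SSL tip above mentions a "small addition" to this WSGI script but does not show it. One possible shape for it, as a minimal sketch: a WSGI wrapper that marks the request as HTTPS when the X-Forwarded-Protocol header set by Nginx says so. The names trust_forwarded_protocol and demo_app are hypothetical and do not come from the original article.

```python
def trust_forwarded_protocol(app, header="HTTP_X_FORWARDED_PROTOCOL"):
    """Wrap a WSGI app so it sees HTTPS when the frontend proxy says so."""

    def wrapper(environ, start_response):
        # WSGI exposes the request header X-Forwarded-Protocol as
        # HTTP_X_FORWARDED_PROTOCOL in the environ dictionary.
        if environ.get(header, "").lower() == "https":
            environ["wsgi.url_scheme"] = "https"
        return app(environ, start_response)

    return wrapper


def demo_app(environ, start_response):
    """Tiny stand-in application that reports the scheme it was given."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["wsgi.url_scheme"].encode("ascii")]


wrapped = trust_forwarded_protocol(demo_app)
```

In the script above, the last line would then presumably become application = trust_forwarded_protocol(django.core.handlers.wsgi.WSGIHandler()). This is only reasonable when Apache listens on localhost, as configured earlier, since otherwise any client could send the header itself.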


@@ -0,0 +1,131 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-23T20:17:13+08:00
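====== Image upload in Django with the Uploadify component ======
Created Sunday 23 October 2011
http://2goo.info/blog/panjj/Django/2011/08/04/527
Uploadify is a neat file-upload component: it shows upload progress and can upload many files of various types in a batch. It has plenty of other strengths too; see the documentation on the official site for details. It is one of the better components of its kind. I wanted to use it with Django. Combining the two is fairly simple, but a few details are worth writing down for later reuse.
This note covers only image uploads. Uploading other files works the same way and is actually simpler; only the Python back end differs. On the front end, Uploadify treats everything alike, since an image is just a special kind of file.
Implementing image upload in Django with Uploadify takes two major steps.
Step one: reference the JavaScript libraries and stylesheets (CSS) that Uploadify needs on the front end.
Uploadify depends on the jQuery library plus a few scripts and stylesheets of its own. Once Django's static files are set up so that Django **serves them correctly**, you are halfway there. For static file configuration, see the earlier post "Django静态文件的配置" (Configuring static files in Django).
We keep all static files in the **site_media** folder in the project root. Download the Uploadify 2.1.4 component from the official site http://www.uploadify.com/ and put it under **plugin** inside site_media, in a folder named as you like (uploadify_214 here). Then create another folder, **upload (under site_media)**, to hold the uploaded images.
Front-end stylesheet and script references: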
<link href="/site_media/plugin/uploadify_214/uploadify.css" type="text/css" rel="stylesheet" />
<script type="text/javascript" src="/site_media/js/jquery.js"></script>
<script type="text/javascript" src="/site_media/plugin/uploadify_214/swfobject.js"></script>
<script type="text/javascript" src="/site_media/plugin/uploadify_214/jquery.uploadify.v2.1.4.min.js"></script>
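The paths of the referenced files matter; they are determined by your static file configuration. Reference Uploadify's stylesheet first, then the jQuery library, then Uploadify's own scripts swfobject.js and jquery.uploadify.v2.1.4.min.js.
The Uploadify initialization code goes in the template that contains the upload feature: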
<script type="text/javascript">
$(document).ready(function() {
    $('#file_upload').uploadify({
        'uploader' : '/site_media/plugin/uploadify_214/uploadify.swf',
        'script' : '{%url uploadify_script%}',
        'cancelImg' : '/site_media/plugin/uploadify_214/cancel.png',
        'folder' : '/upload',
        'auto' : false, // do not start uploading as soon as files are selected
        'multi': true, // allow several files to be uploaded
        'queueSizeLimit': 20, // allow up to 20 files in the queue
        'removeCompleted': false, // keep finished items in the queue
        'sizeLimit': 10240000, // maximum file size, in bytes (roughly 10 MB)
        'fileExt': '*.jpg;*.gif;*.png', // restrict uploads to common image formats
        'fileDesc': 'Image Files',
        'onInit': function () {},
        'onError' : function (event, ID, fileObj, errorObj) {
            $('#id_span_msg').html("Upload failed, error: " + errorObj.type + " " + errorObj.info);
        },
        'onSelect': function (e, queueId, fileObj) {
            $('#id_span_msg').html("");
        },
        'onAllComplete': function (event, data) {
            if (data.filesUploaded >= 1) {
                $('#id_span_msg').html("Upload succeeded!");
            }
        }
    });
});
</script>
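A few key parameters in the initialization script deserve explanation:
uploader is the compiled Flash file the component needs; it wraps Uploadify's core processing code.
script is the URL of the back-end upload handler, which we have to write ourselves, as described later.
folder is the upload directory. We do not plan to use it here, so any placeholder value will do.
Front-end HTML: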
<h1>Uploading with the Uploadify component</h1>
<div class="demo-box">
    <input id="file_upload" type="file" name="Filedata">
    <div id="file_uploadQueue" class="uploadifyQueue"></div>
    <p><a href="javascript:$('#file_upload').uploadifyUpload()">Upload images</a>
    <a href="javascript:$('#file_upload').uploadifyClearQueue()">Cancel upload</a>
    </p>
    <p><span id="id_span_msg"></span></p>
</div>
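Step two: write the back-end image upload method.
Wiring the upload code to Uploadify right from the start may not be wise: when something goes wrong, it is hard to tell whether the bug is in the back-end code or in the Uploadify configuration. It is better to first test the back-end upload code with a traditional form upload, debug it, and only then combine it with Uploadify, so it is clear where any problem lies.
So we first write a generic upload function, _upload, and test it with a traditional upload: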
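import os
import uuid

from django.conf import settings
from PIL import ImageFile


def _upload(file):
    '''Save an uploaded image as a JPEG; return True on success.'''
    if file:
        path = os.path.join(settings.MEDIA_ROOT, 'upload')
        file_name = str(uuid.uuid1()) + ".jpg"
        path_file = os.path.join(path, file_name)
        # Feed the upload to PIL chunk by chunk, entirely in memory.
        parser = ImageFile.Parser()
        for chunk in file.chunks():
            parser.feed(chunk)
        img = parser.close()
        try:
            # JPEG cannot store palette (P) images, so convert to RGB first.
            if img.mode != "RGB":
                img = img.convert("RGB")
            img.save(path_file, 'jpeg', quality=100)
        except Exception:
            return False
        return True
    return False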
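This function takes an uploaded file object, processes it in memory and saves the image. It is only a few lines, so a brief explanation will do: build the filesystem path where the image will be stored, read the in-memory image data into the temporary variable img, check the image mode and convert to RGB if needed, then save as JPEG. It returns True on success and False on failure.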
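The path-building step can be sketched on its own with the standard library. This is an illustrative variant, not the article's code: build_upload_path and ALLOWED_EXTENSIONS are hypothetical names, and the extension check is an added assumption that mirrors the client-side fileExt filter, which a client can bypass.

```python
import os
import uuid

ALLOWED_EXTENSIONS = {".jpg", ".gif", ".png"}  # mirrors the fileExt option


def build_upload_path(upload_dir, original_name):
    """Return a unique .jpg path in upload_dir, or None for a disallowed type."""
    ext = os.path.splitext(original_name)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return None
    # uuid4 is random; uuid1 (used in _upload) is derived from host and time.
    return os.path.join(upload_dir, uuid.uuid4().hex + ".jpg")
```

The name always ends in .jpg because _upload re-encodes every image as JPEG, whatever format was uploaded.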
Once this function passes its tests and can save images, the last step is the function Uploadify needs, uploadify_script:
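from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt


@csrf_exempt
def uploadify_script(request):
    response = HttpResponse()
    response['Content-Type'] = "text/javascript"
    ret = "0"  # no file received
    file = request.FILES.get("Filedata", None)
    if file:
        if _upload(file):
            ret = "1"  # image saved
        else:
            ret = "2"  # file received but could not be saved
    response.write(ret)
    return response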
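Uploadify calls uploadify_script by submitting the file input's data to it in a POST request. The function responds with the content type "text/javascript", writing the character 1 on success and some other character otherwise. The file input on the page is named Filedata; Django reads the uploaded image through file = request.FILES.get("Filedata",None) and, if it is not empty, hands it to the generic _upload function to be saved.
That completes the process. Points to watch out for along the way:
1. IO Error comes up often. If _upload and uploadify_script have been tested and are correct, the cause is most likely the Uploadify initialization script on the front end: check that the key Uploadify parameters resolve correctly, and that the static file configuration actually works.
2. Forbidden (403) is raised by Django. Django 1.3 ships with CSRF protection, so we need to handle it by giving uploadify_script the @csrf_exempt decorator. Remember this one: it is crucial and can cost a lot of time.
3. cannot write mode P as JPEG is an error in the back-end upload code. It occurs when the uploaded image is not in RGB mode (for example a palette-based GIF or PNG); the image must be converted to RGB before saving, as mentioned above.
Enough talk. As usual, an example is provided; the source code makes everything clear. Local address: http://127.0.0.1:8000/uploadify/.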


@@ -0,0 +1,77 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-23T20:52:38+08:00
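====== dj-scaffold: a standardized Django project layout ======
Created Sunday 23 October 2011
http://haoluobo.com/2011/08/dj-scaffold/
Django, unlike Rails, does not prescribe a directory layout for projects, so many people are puzzled about how a Django project should be organized. For that reason I created a new open-source project, dj-scaffold (a scaffold for Django). It automatically generates a standardized Django project and app directory structure and also provides a virtual environment.
Project page: https://github.com/vicalloy/dj-scaffold
===== Installation =====
It has been published on PyPI, so you can install it with pip or easy_install.
pip install dj-scaffold
or
easy_install dj-scaffold
===== Usage =====
dj-scaffold mainly provides two commands: dj-scaffold.py and lbstartapp.
==== dj-scaffold.py ====
This script replaces Django's startproject command. Usage:
**dj-scaffold.py projectname **
Running the command creates the project projectname. The project's scripts directory contains the scripts create_env.py and env.rc.
* create_env.py: running this script __initializes and creates the Python virtual environment automatically__. The new virtual environment lives in the env directory.
* env.rc: this script activates the Python virtual environment (source env.rc). It also sets up a shortcut for python manage.py, __$mg__, which you can call from any directory to run Django commands. For example, $mg runserver starts the development server.
The resulting directory structure is as follows:
Note: there are too many files, so some less important ones are omitted.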
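dj-scaffold.py projectname
|+docs/ #project documentation
|+env/ #Python virtual environment (generated by the script)
|~requirements/ #third-party dependencies
| `-requirements.pip #pip requirements file
|~scripts/ #system-related scripts
| |-create_env.py #**creates the Python virtual environment (the env directory)**
| `-env.rc #__enters__ the Python virtual environment and provides $mg, a shortcut for python manage.py that works from any directory. Use the deactivate command to leave the virtual environment.
|~sites/ #the Django __project directory__. The settings file adds some defaults, such as SQLite as the default database and the project's template and static file directories.
| |+media/ #project static files (user uploads)
| |+static/ #project static files (CSS, JS, etc.)
| `+templates/ #__project templates__
|+tools/ #third-party tools the project depends on, such as the virtual environment bootstrap script
`~wsgi/ #WSGI files for deploying the project
  `-dj_scaffold.wsgi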
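==== lbstartapp ====
lbstartapp is provided as a Django management command extension. Once dj_scaffold is added to INSTALLED_APPS, the command becomes available. It generates a standard app; compared with Django's built-in startapp, lbstartapp also creates the less commonly used default app directories. The resulting directory structure:
|+management/ #management commands
|+static/ #static files
|+templates/ #templates
|+templatetags/ #template tags
|-__init__.py
|-admin.py #model registration for the admin site
|-forms.py
|-models.py
|-settings.py #the app's own settings file
|-tests.py
|-urls.py #URL configuration
`-views.py
NOTE
Most of the project's code comes from https://github.com/lincolnloop/django-startproject
A similar project: https://github.com/mozilla/playdoh . I think that project is decent, but what I wrote myself suits my own habits better.
"No magic" is one of Django's philosophies. Accordingly, Django gives users few default behaviours and wants everything to be explicit and visible to the user. That is not a big problem in itself, but in my view "no magic" does not mean having no conventions at all. Django really lacks some necessary conventions.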


@@ -0,0 +1,86 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-23T19:33:37+08:00
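====== Common tools and practices for setting up a Django environment ======
Created Sunday 23 October 2011
http://2goo.info/blog/panjj/Django/2011/06/05/521
I bought a cheap VPS to deploy a site developed with Django. Drawing on the wealth of information on the Internet and my own needs, I worked out a setup and recorded the most commonly used tools and commands. I chose Debian for the server because it suits me best. Debian ships with a mainstream version of Python; the first things to install are easy_install and pip, since these two tools make it easy to install the libraries Python needs.
===== Installation: =====
apt-get install python-pip
pip install -U setuptools   # setuptools provides easy_install
pip install -U virtualenv
We only need to install the **basic libraries** system-wide, such as pip, easy_install and virtualenv above. Libraries that are tied to a Django project, and that differ from project to project, are **installed with virtualenv (this isolates the project's Python environment from the system one; packages installed inside a virtualenv do not leak into the system)**, for example: Django, psycopg2, MySQLdb, simplejson, python-openid, flup, html5lib. This has many benefits. These libraries are updated constantly; each project installs the specific versions it needs **without affecting the global Python environment**, and keeps them together in one place. pip can also write the **list of libraries** in a virtual environment to a text file and later install everything from that file in one step.
Let's start setting up the environment. We create a virtual environment for Django: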
virtualenv --no-site-packages --distribute twogoo
cd twogoo
source bin/activate
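We are now inside the virtual environment. Next, install **the project's libraries** with pip or easy_install, for example:
pip install django
pip install psycopg2
pip install flup
...
We are currently in the **project environment folder** twogoo. Now create the **project code folder** myproject:
django-admin.py startproject myproject
flup is already installed. To start FastCGI, launch it from inside the virtual environment on the desired port, 9090 or any other:
python myproject/manage.py runfcgi method=threaded host=127.0.0.1 port=9090
With FastCGI running, if we change the code and want to restart, simply running the command above again has no effect. We must first stop the process listening on port 9090 and then start it again:
python myproject/manage.py runfcgi method=threaded host=127.0.0.1 port=9090
To find the PID of the process on the port and stop it:
netstat -anp | grep 9090   # (port number)
This lists the PID. If the PID is 8920, for example, we kill it:
kill 8920
As mentioned, pip can **install a virtual environment in one step**. First export the environment's **library list (everything installed beyond the Python standard library)**:
pip freeze > req.txt
This generates the file req.txt, containing the library names and version numbers in the following format: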
Django==1.3
Markdown==2.0.3
PIL==1.1.7
South==0.7.3
distribute==0.6.15
django-debug-toolbar==0.8.5
flup==1.0.3.dev-20110405
html5lib==0.90
psycopg2==2.4.1
python-openid==2.2.5
simplejson==2.1.6
wsgiref==0.1.2
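The name==version format above is simple enough to read back programmatically, for example to compare two environments. A minimal sketch, assuming only exact pins matter; parse_pins is a hypothetical helper and not part of pip:

```python
def parse_pins(text):
    """Parse 'name==version' lines, as written by pip freeze, into a dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, version = line.partition("==")
        if sep:  # ignore anything that is not an exact pin
            pins[name.strip()] = version.strip()
    return pins


sample = """Django==1.3
South==0.7.3
psycopg2==2.4.1
"""
pins = parse_pins(sample)
```

Comparing parse_pins output for two req.txt files gives a quick view of which libraries differ between two virtual environments.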
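Next we **create a virtual environment for a project called wow from req.txt**:
cd ../
virtualenv --no-site-packages --distribute wow
wow/bin/pip install -r twogoo/req.txt
This creates a wow folder whose environment is identical to twogoo's.
To leave the virtual environment, use: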
deactivate


@@ -0,0 +1,169 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-23T20:42:04+08:00
====== Python virtualenv ======
Created Sunday 23 October 2011
http://www.rainsts.net/article.asp?id=1004
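virtualenv works like a sandbox: by isolating package directories and system environment variables, it provides multiple **relatively independent virtual environments**. This avoids the version-dependency problems that come with too many third-party libraries. Each virtual environment can also be distributed simply by packaging it up, which makes deployment much easier.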
$ sudo easy_install virtualenv
Now we can create a virtual environment.
$ virtualenv test1
New python executable in test1/bin/python
Installing setuptools............done.
We can see that the basic runtime environment has been installed in the virtual environment's directory.
$ ls test1/bin
activate activate_this.py easy_install easy_install-2.6 pip python
$ ls test1/include/
python2.6
$ ls test1/lib
python2.6
$ ls test1/lib/python2.6/
_abcoll.py copy_reg.pyc linecache.py os.pyc sre_compile.py stat.py
_abcoll.pyc distutils linecache.pyc posixpath.py sre_compile.pyc stat.pyc
abc.py encodings locale.py posixpath.pyc sre_constants.py types.py
abc.pyc fnmatch.py locale.pyc re.py sre_constants.pyc types.pyc
codecs.py fnmatch.pyc ntpath.py re.pyc sre_parse.py UserDict.py
codecs.pyc genericpath.py ntpath.pyc site-packages sre_parse.pyc UserDict.pyc
config genericpath.pyc orig-prefix.txt site.py sre.py warnings.py
copy_reg.py lib-dynload os.py site.pyc sre.pyc warnings.pyc
Enter the test1 directory and **activate the virtual environment.**
$ cd test1
test1$ source bin/activate
(test1)test1$ which python
/home/yuhen/projects/test1/bin/python
(test1)test1$ which easy_install
/home/yuhen/projects/test1/bin/easy_install
(test1)test1$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['',
'/home/yuhen/projects/test1/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg',
'/home/yuhen/projects/test1/lib/python2.6/site-packages/pip-0.7.2-py2.6.egg',
'/home/yuhen/projects/test1/lib/python2.6',
'/home/yuhen/projects/test1/lib/python2.6/plat-linux2',
'/home/yuhen/projects/test1/lib/python2.6/lib-tk',
'/home/yuhen/projects/test1/lib/python2.6/lib-old',
'/home/yuhen/projects/test1/lib/python2.6/lib-dynload',
'/usr/lib/python2.6',
'/usr/lib64/python2.6',
'/usr/lib/python2.6/plat-linux2',
'/usr/lib/python2.6/lib-tk',
'/usr/lib64/python2.6/lib-tk',
'/home/yuhen/projects/test1/lib/python2.6/site-packages',
'/usr/local/lib/python2.6/dist-packages/virtualenv-1.4.9-py2.6.egg',
'/usr/local/lib/python2.6/dist-packages/simplejson-2.1.1-py2.6-linux-x86_64.egg',
'/usr/local/lib/python2.6/site-packages',
'/usr/local/lib/python2.6/dist-packages',
'/usr/lib/python2.6/dist-packages',
'/usr/lib/pymodules/python2.6',
'/usr/lib/pymodules/python2.6/gtk-2.0',
'/usr/lib/python2.6/dist-packages/wx-2.8-gtk2-unicode']
>>>
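As you can see, after activating with "source bin/activate", the command prompt gains a ** "(test1)" prefix**, and python and easy_install now default to the programs in the virtual environment's bin directory. sys.path shows that the virtual environment's library directories have been added to the search path.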
(test1)$ easy_install MySQL-python
Searching for MySQL-python
Reading http://pypi.python.org/simple/MySQL-python/
Reading http://sourceforge.net/projects/mysql-python
Best match: MySQL-python 1.2.3c1
Downloading http://sourceforge.net/.../1.2.3c1/MySQL-python-1.2.3c1.tar.gz/download
Processing download
Running MySQL-python-1.2.3c1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-Em0wfb/MySQL-python-1.2.3c1/egg-dist-tmp-vzoJ2t
In file included from _mysql.c:36:
/usr/include/mysql/my_config.h:1050:1: warning: "HAVE_WCSCOLL" redefined
In file included from /usr/include/python2.6/Python.h:8,
from pymemcompat.h:10,
from _mysql.c:29:
/usr/include/python2.6/pyconfig.h:808:1: warning: this is the location of the previous definition
zip_safe flag not set; analyzing archive contents...
Adding MySQL-python 1.2.3c1 to easy-install.pth file
Installed /home/yuhen/projects/python/test1/lib/python2.6/site-packages/MySQL_python-1.2.3c1-py2.6-linux-x86_64.egg
Processing dependencies for MySQL-python
Finished processing dependencies for MySQL-python
(test1)$ ls -l lib/python2.6/site-packages/
total 444
-rw-r--r-- 1 yuhen yuhen 283 2010-06-02 09:46 easy-install.pth
-rw-r--r-- 1 yuhen yuhen 106325 2010-06-02 09:46 MySQL_python-1.2.3c1-py2.6-linux-x86_64.egg
drwxr-xr-x 4 yuhen yuhen 4096 2010-06-02 09:45 pip-0.7.2-py2.6.egg
-rw-r--r-- 1 yuhen yuhen 333447 2010-06-01 23:58 setuptools-0.6c11-py2.6.egg
-rw-r--r-- 1 yuhen yuhen 30 2010-06-02 09:45 setuptools.pth
(test1)$ cat lib/python2.6/site-packages/setuptools.pth
./setuptools-0.6c11-py2.6.egg
(test1)$ python
>>> import MySQLdb
>>> MySQLdb.version_info
(1, 2, 3, 'gamma', 1)
>>>
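MySQL-python was installed into the virtual environment, and the egg's search path was correctly added to easy-install.pth.
Finally, we can **leave the virtual environment** with the "deactivate" command.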
(test1)test1$ deactivate
When creating a virtual environment, we can add the "--no-site-packages" option to tell the virtual environment not to access the global site-packages.
$ virtualenv --no-site-packages test2
New python executable in test2/bin/python
Installing setuptools............done.
$ cd test2
test2$ source bin/activate
(test2)test2$ python
>>> import sys
>>> sys.path
['',
'/home/yuhen/projects/python/test2/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg',
'/home/yuhen/projects/python/test2/lib/python2.6/site-packages/pip-0.7.2-py2.6.egg',
'/home/yuhen/projects/python/test2/lib/python2.6',
'/home/yuhen/projects/python/test2/lib/python2.6/plat-linux2',
'/home/yuhen/projects/python/test2/lib/python2.6/lib-tk',
'/home/yuhen/projects/python/test2/lib/python2.6/lib-old',
'/home/yuhen/projects/python/test2/lib/python2.6/lib-dynload',
'/usr/lib/python2.6',
'/usr/lib64/python2.6',
'/usr/lib/python2.6/plat-linux2',
'/usr/lib/python2.6/lib-tk',
'/usr/lib64/python2.6/lib-tk',
'/home/yuhen/projects/python/test2/lib/python2.6/site-packages']
>>>
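Apart from the base Python library paths, the search path no longer contains the global site-packages. So even on a system with a large number of development packages installed, we can carve out a clean test environment.
To tell whether a virtual environment was created with --no-site-packages, besides inspecting sys.path you can also check for the file "lib/python2.6/no-global-site-packages.txt".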
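The marker-file check can be scripted. A minimal sketch, assuming the legacy virtualenv layout shown in this note (lib/pythonX.Y/ under the environment root); is_isolated is a hypothetical helper name, and the demonstration runs against a throwaway directory rather than a real environment:

```python
import glob
import os
import tempfile


def is_isolated(env_dir):
    """True if a legacy virtualenv at env_dir has the --no-site-packages marker."""
    pattern = os.path.join(env_dir, "lib", "python*", "no-global-site-packages.txt")
    return bool(glob.glob(pattern))


# Demonstration on a temporary directory that imitates the layout above.
demo_env = tempfile.mkdtemp()
lib_dir = os.path.join(demo_env, "lib", "python2.6")
os.makedirs(lib_dir)
before = is_isolated(demo_env)
open(os.path.join(lib_dir, "no-global-site-packages.txt"), "w").close()
after = is_isolated(demo_env)
```

The glob on python* avoids hard-coding the interpreter version that appears in the path.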


@@ -0,0 +1,73 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-23T21:40:42+08:00
====== Releasing LBForum, a Django forum system (open source, with a demo) ======
Created Sunday 23 October 2011
http://haoluobo.com/2010/02/%E5%8F%91%E5%B8%83%E4%B8%80%E4%B8%AAdjango%E7%9A%84%E8%AE%BA%E5%9D%9B%E7%B3%BB%E7%BB%9Flbforum%EF%BC%88%E5%BC%80%E6%BA%90%E3%80%81%E5%B8%A6%E6%BC%94%E7%A4%BA%EF%BC%89/c
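Introduction
LBForum is a forum system developed with Django. Demo: http://vik.haoluobo.com/lbforum/
Project page: http://github.com/vicalloy/LBForum
The interface is borrowed from FluxBB (an open-source PHP forum, http://fluxbb.org/ ).
There are quite a few forums written in Django, but none that is really pleasant to use.
Most Django forums are standalone apps, and many lack templates. Even an experienced Django user like me finds them troublesome to get running, let alone ordinary users.
LBForum focuses on ease of deployment and ease of use; its feature set is still fairly simple for now.
LBForum has been delivered as a complete site from the start, so using it as a base project for further development is easy.
At the same time, LBForum's development follows Django's reusable-app principles as far as possible, so integrating it into another project as a standalone app should not be too hard either.
Main features
The features are still fairly simple, and some small problems remain to be fixed.
Forum categories and boards
Posting and replying
BBCode support
Sticky topics
Forum administration through the Django admin
Running LBForum with the development server
First, download the code. LBForum is hosted on GitHub: http://github.com/vicalloy/LBForum . If you do not have git installed, you can download the code with the "download source" button at the top right of the page.
Run \scripts\create_lbforum_env.py to initialize LBForum's Python virtual environment. The script automatically creates a Python virtual environment, installs the dependencies with easy_install and unpacks some bundled dependencies into the appropriate directories.
The SVN version of Django is used, so SVN must be installed on the machine or the script will fail. If the script fails because of an SVN problem, run lbforum_env.bat to enter the LBForum environment and install the SVN version of Django by hand.
Once the environment is initialized, run lbforum_env.bat to enter the LBForum environment
Run %mg% syncdb to initialize the database
Run %mg% runserver to start the Django development server
Go to the admin and create forum categories and boards
Go to a board and post
LBForum directory structure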
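|+lbforum_env/ #Python virtual environment LBForum runs in, created by create_lbforum_env.py
|+requirements/ #third-party libraries and apps used by LBForum; this directory is added to the Python path at run time
|~scripts/ #project scripts
| |-create_lbforum_env.py #initializes the Python virtual environment and installs easy_install and the Django dependencies
| |-helper.py #helper functions for the other scripts
| `-lbforum_env.bat* #enters LBForum's virtual environment and provides %mg%, a shortcut for LBForum's manage.py, e.g. %mg% syncdb to initialize the database
|~sites/ #site configuration, templates and static files
| `~default/ #default site
| |+static/ #static resources such as CSS
| |+templates/ #Django template directory
| |+templates_plus/ #Django template directory for templates overridden by the user
| `-……
|~src/ #Django apps
| |+account/ #account-related app. Real sites usually customize the user centre, so this app will probably need changes in practice.
| |+djangohelper/ #Django helper functions
| |+lbforum/ #LBForum's main app; all forum features live here
| |+lbregistration/ #LBForum extension of the registration app; mainly removes e-mail address verification
| |+onlineuser/ #app showing online users; reusable Django app, usable without LBForum
| `+simpleavatar/ #avatar app; reusable Django app, usable without LBForum (depends on djangohelper)
|+tools/ #project tools; currently only a virtualenv script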
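Notes:
i18n is planned for later, so only an English interface is provided for now.
Django displays form errors after each field, whereas FluxBB shows all errors above the form. The templates are not fully adjusted yet, so errors are currently shown the FluxBB way and the display is somewhat off.
The BBCode input box was meant to resize automatically, but that did not work out well either, so its size is fixed for now.
Documentation... is hard to write -_-. It is incomplete at the moment (the project ships with no documentation at all) and will be added later.
The application directory structure is mainly modelled on Pinax.
Part of the simpleavatar module's code comes from django-avatar.
Apart from those installed online with easy_install, dependencies are bundled with the project as zip packages wherever possible, to make installing them less painful.
The remote deployment script is planned to use Fabric, but Fabric itself is fairly troublesome to install, so this has not been done yet.
The project was originally hosted on Google Code, but GitHub feels more capable, so it was moved there.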


@@ -0,0 +1,56 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-23T21:07:20+08:00
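====== Organizing Django's settings files sensibly ======
Created Sunday 23 October 2011
http://haoluobo.com/2011/10/django-settings/
Django lacks the necessary conventions for laying out a project's directory structure, so people organize projects in all sorts of ways, and it is hard to say whose approach is better. I published a project, dj-scaffold, based on my own habits.
A few days ago I "advertised" dj-scaffold on reddit (http://redd.it/kw5d4). Unexpectedly, the reception was poor and it was nearly voted into negative territory; some people even called the project worthless. Negative feedback is unpleasant, but the constructive parts deserve attention, and the parts that are purely personal preference can simply be filtered out.
On how to organize the settings files, coderanger suggested following the approach in The Best (and Worst) of Django. Its main point is that the configuration of both the development and production environments should be kept under version control in a VCS. Following that approach, I adjusted the settings module. Note: the code is at https://github.com/vicalloy/dj-scaffold/tree/master/dj_scaffold/conf/prj/sites/settings
The drawbacks of local_settings
To separate a project's default configuration from local configuration, the most common approach is to add a local_settings.py file and import it at the end of the settings file.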
try:
    from local_settings import *
except ImportError:
    pass
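The resulting problem is that local_settings.py cannot be put under version control; if the deployment environment's configuration is ever lost, it is hard to recover.
Solution
The suggested solution to this problem is as follows
A sensible way to organize the settings files
|~settings/
| |-__init__.py
| |-base.py #default configuration
| |-dev.py #development environment configuration
| |-local.sample #local overrides, imported at the end of dev and production
| |-pre.sample #selects whether the production or development configuration is in use
| `-production.py #production environment configuration
Usage
DJANGO_SETTINGS_MODULE
Django's admin script provides a settings option for specifying the settings module to use
django-admin.py shell --settings=settings.dev
In a WSGI script, the settings module to use can be set directly
deploy.wsgi
os.environ['DJANGO_SETTINGS_MODULE'] = 'settings.production'
Simplifying the options
Of course, having to pass the settings option every time you use django-admin.py is annoying, so the recommended approach is to set the configuration you want to use in pre.py.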
SETTINGS = 'production' #dev
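The note does not show how the settings package consumes the SETTINGS value from pre.py. One way it could work, as a sketch under that assumption (load_settings is a hypothetical helper, and this is not necessarily what dj-scaffold does):

```python
import importlib


def load_settings(package, name, namespace):
    """Copy every UPPERCASE name from the module package.name into namespace."""
    module = importlib.import_module("%s.%s" % (package, name))
    for key in dir(module):
        if key.isupper():  # Django only treats uppercase names as settings
            namespace[key] = getattr(module, key)
    return namespace
```

A hypothetical settings/__init__.py could then read SETTINGS from pre.py and call load_settings(__name__, SETTINGS, globals()), so that DJANGO_SETTINGS_MODULE only ever needs to point at the settings package itself.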


@@ -0,0 +1,56 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-23T20:36:22+08:00
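====== Configuring static files in Django ======
Created Sunday 23 October 2011
http://2goo.info/blog/panjj/Django/2010/10/07/131
I have long been looking into how Django static file configuration differs between the production deployment environment and the development environment. Files such as a site's CSS and JS, and other files unrelated to program code, I treat as static files for now. How do we configure things so that a Django site can serve static files? In short, two files matter: settings.py and urls.py.
First, import the os module in the settings file:
import os
Then define a constant for the project's **root directory**:
PROJECT_PATH = os.path.abspath(os.path.dirname(__file__))
Next, set **MEDIA_ROOT** to:
MEDIA_ROOT= os.path.join(PROJECT_PATH,'static')
(Note that our static files live in the static folder under the root directory. If your folder is named differently, change the argument to join accordingly.)
That completes settings.py. urls.py is configured as follows: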
from django.conf import settings
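url(r'^static/(?P<path>.*)$', 'django.views.static.serve', {'document_root': settings.MEDIA_ROOT}, name="media"),
The static in ^static/(?P<path>.*)$ can be any name you like; following the Django best practices is recommended. The URL configuration matters: with a small slip, URLs often fail to resolve to the real static files. I used to prefer media, as in ^media/(?P<path>.*)$, but the static files never resolved. I fiddled with MEDIA_ROOT to no avail, then changed media to static and it worked at once. Note that it need not be static; anything other than media should do. Strange. Perhaps Django uses media somewhere else, for example in the Django admin.
Finally, how to use static files in templates:
<script type="text/javascript" src="/static/js/config.js"></script>
<link rel="stylesheet" type="text/css" href="/static/css/contents.css"/>
<img src="/static/images/logo.jpg" alt=""/>
When using them, note that the path must begin with /
This configuration resolves correctly in the development environment. In the **production deployment environment**, only these settings need to change:
MEDIA_URL
ADMIN_MEDIA_PREFIX
Change them to the actual domain name:
MEDIA_URL='http://www.XXX.com/static/'
ADMIN_MEDIA_PREFIX='http://www.XXX.com/static/admin/'
The admin at the end of ADMIN_MEDIA_PREFIX may differ in your setup; we **copy** Django's admin **static files** into a folder named admin (static/admin).
Addendum: in practice we found that if ADMIN_MEDIA_PREFIX uses the suffix media or static, for example ADMIN_MEDIA_PREFIX='http://www.XXX.com/media/' or ADMIN_MEDIA_PREFIX='http://www.XXX.com/static/',
then urls.py should use a different prefix, for example:
url(r'^site_media/(?P<path>.*)$','django.views.static.serve',{'document_root':settings.MEDIA_ROOT },name="site_media"),
We chose site_media as the prefix rather than media or static to **avoid clashing with the admin's static path**, which would prevent our application's static files from being served properly.
(End)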


@@ -0,0 +1,49 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-21T22:29:39+08:00
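====== Notes on releasing a Django project to production ======
Created Friday 21 October 2011
This setup uses apache2 + mod_python; installation is skipped here. You also need to install Django and MySQLdb on the machine you deploy to. I use Ubuntu Server, which already had apache2, mod_python and MySQLdb after installation, so installing Django was all that remained.
The configuration is quite simple: edit /etc/apache2/httpd.conf and add the following snippet.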
Alias /site_media /home/denny/zoomino/website/zoomino_media
<Location "/">
    SetHandler python-program
    PythonHandler django.core.handlers.modpython
    SetEnv DJANGO_SETTINGS_MODULE zoomino.settings
    # PythonOption django.root /
    PythonDebug On
    PythonPath "sys.path +['/home/denny']"
</Location>
<Location "/site_media">
    SetHandler None
</Location>
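The project lives under /home/denny. Note that PythonPath is not set to /home/denny/zoomino but to the **parent directory** of the project dir, that is, the directory from which you ran django-admin.py startproject. If you want your template dir located automatically during development, see this article: http://dengyin2000.iteye.com/blog/323391.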
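The relationship between the project directory, the PythonPath entry and the DJANGO_SETTINGS_MODULE value can be made explicit with a few lines of Python. This is only an illustration of the rule described above; mod_python_settings is a hypothetical helper name.

```python
import posixpath


def mod_python_settings(project_dir):
    """Return (directory for PythonPath, DJANGO_SETTINGS_MODULE value)."""
    project_dir = posixpath.normpath(project_dir)
    parent = posixpath.dirname(project_dir)     # this goes on sys.path
    package = posixpath.basename(project_dir)   # importable package name
    return parent, package + ".settings"


python_path, settings_module = mod_python_settings("/home/denny/zoomino/")
```

The parent directory has to be on the path because zoomino.settings is imported as a package-qualified name, and Python looks for the zoomino package inside the directories on sys.path.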
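Django can serve static files, but its built-in web server is poor, so in production they should be handled by something like Apache.
The line Alias /site_media /home/denny/zoomino/website/zoomino_media defines __the path of the resource files and the URL they are mapped to__.
We then map the Django project to the root URL "/". Because Django is mapped to the root, the last block is essential; without it the resource files would be handled by Django, which is bound to fail.
OK. Finally, set DEBUG to False in settings.py and restart Apache.
For handling static files during development, see http://docs.djangoproject.com/en/dev/howto/static-files/#howto-static-files . Combined with my article http://dengyin2000.iteye.com/blog/323391 for locating your static files, it works perfectly.
References: http://docs.djangoproject.com/en/dev/topics/install/#database-installation
http://docs.djangoproject.com/en/dev/intro/install/#intro-install
http://docs.djangoproject.com/en/dev/howto/deployment/modpython/#howto-deployment-modpython
Installing Apache mod_python: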
http://www.howtoforge.com/embedding-python-in-apache2-with-mod_python-debian-ubuntu-fedora-centos-mandriva-opensuse


@@ -0,0 +1,116 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-22T17:27:37+08:00
====== How to serve static files ======
Created Saturday 22 October 2011
https://docs.djangoproject.com/en/1.2/howto/static-files/
**Django itself doesn't serve static (media) files, such as images, style sheets, or video. It leaves that job to whichever Web server you choose.**
The reasoning here is that standard Web servers, such as Apache, lighttpd and Cherokee, are much more fine-tuned at serving static files than a Web application framework.
With that said, **Django does support static files during development**. You can use the__ django.views.static.serve() __view to serve media files.
== See also ==
If you just need to serve the admin media from a nonstandard location, see the **--adminmedia parameter to runserver**.
===== The big, fat disclaimer =====
Using this method is inefficient and insecure. Do not use this in a production setting. Use this** only for development.**
For information on serving static files in an Apache production environment, see the Django mod_python documentation.
===== How to do it =====
Here's the formal definition of the serve() view:
**def serve(request, path, document_root, show_indexes=False)**
To use it, just put this in your URLconf:
**(r'^site_media/(?P<path>.*)$', 'django.views.static.serve',**
** {'document_root': '/path/to/media'}),**
...where__ site_media__ is the URL where your media will be rooted, and __/path/to/media__ is the filesystem root for your media. This will call the serve() view, passing in the path from the URLconf and the (required) __document_root __parameter.
Given the above URLconf:
The file /path/to/media/foo.jpg will be made available at the URL /site_media/foo.jpg.
The file /path/to/media/css/mystyles.css will be made available at the URL /site_media/css/mystyles.css.
The file /path/bar.jpg will not be accessible, because it doesn't fall under the document root.
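The containment rule in these three cases can be sketched in plain Python. This is not Django's implementation of serve(), only a minimal illustration of the rule that a request path must stay under the document root; resolve_static is a hypothetical name.

```python
import posixpath


def resolve_static(document_root, url_path):
    """Map the path captured by the URLconf to a file under document_root.

    Returns None when the resulting path would escape the document root.
    """
    root = posixpath.normpath(document_root)
    # Joining and normalising collapses "..", so escapes become visible.
    candidate = posixpath.normpath(posixpath.join(root, url_path))
    prefix = root.rstrip("/") + "/"
    if candidate != root and not candidate.startswith(prefix):
        return None
    return candidate
```

With document_root set to /path/to/media, the paths foo.jpg and css/mystyles.css resolve to files under the root, while ../bar.jpg resolves to nothing because it points outside it.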
Of course, it's not compulsory to use a fixed string for the 'document_root' value. You might wish to make that an entry in your settings file and use the setting value there. That will allow you and other developers working on the code to easily change the value as required. For example, if we have a line in __settings.py__ that says:
**STATIC_DOC_ROOT = '/path/to/media'**
...we could write the above URLconf entry as:
**from django.conf import settings**
**...**
**(r'^site_media/(?P<path>.*)$', 'django.views.static.serve',**
** {'document_root': settings.STATIC_DOC_ROOT}),**
Be careful not to use the same path as your __ADMIN_MEDIA_PREFIX__ (which defaults to /media/) as this will overwrite your URLconf entry.
===== Directory listings =====
Optionally, you can pass the __show_indexes__ parameter to the serve() view. This is False by default. If it's True, Django will **display file listings** for directories.
For example:
(r'^site_media/(?P<path>.*)$', 'django.views.static.serve',
{'document_root': '/path/to/media', 'show_indexes': True}),
You can customize the index view by creating a template called** static/directory_index.html**. That template gets two objects in its context:
** directory -- the directory name (a string)**
** file_list -- a list of file names (as strings) in the directory**
Here's the default static/directory_index.html template:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Language" content="en-us" />
<meta name="robots" content="NONE,NOARCHIVE" />
<title>Index of {{ directory }}</title>
</head>
<body>
<h1>Index of {{ directory }}</h1>
<ul>
{% for f in file_list %}
<li><a href="{{ f }}">{{ f }}</a></li>
{% endfor %}
</ul>
</body>
</html>
Changed in Django 1.0.3: Prior to Django 1.0.3, there was a bug in the view that provided directory listings. The template that was loaded had to be called static/directory_listing (with no .html extension). For backwards compatibility with earlier versions, Django will still load templates with the older (no extension) name, but it will prefer the directory_index.html version.
===== Limiting use to DEBUG=True =====
Because URLconfs are just plain Python modules, you can use Python logic to make the static-media view available only in development mode. This is a handy trick to make sure the static-serving view doesn't slip into a production setting by mistake.
Do this by wrapping an if DEBUG statement around the **django.views.static.serve() **inclusion. Here's a full example URLconf:
from django.conf.urls.defaults import *
from django.conf import settings
urlpatterns = patterns('',
    (r'^articles/2003/$', 'news.views.special_case_2003'),
    (r'^articles/(?P<year>\d{4})/$', 'news.views.year_archive'),
    (r'^articles/(?P<year>\d{4})/(?P<month>\d{2})/$', 'news.views.month_archive'),
    (r'^articles/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d+)/$', 'news.views.article_detail'),
)
if settings.DEBUG:
    urlpatterns += patterns('',
        (r'^site_media/(?P<path>.*)$', 'django.views.static.serve', {'document_root': '/path/to/media'}),
    )
This code is straightforward. It imports the settings and checks the value of the DEBUG setting. If it evaluates to True, then site_media will be associated with the django.views.static.serve view. If not, then the view won't be made available.
Of course, the catch here is that you'll have to remember to set DEBUG=False in your production settings file. But you should be doing that anyway.


@@ -0,0 +1,232 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-21T21:15:54+08:00
====== How to use Django with Apache and mod python ======
Created Friday 21 October 2011
https://docs.djangoproject.com/en/dev/howto/deployment/modpython/
===== Warning =====
Support for mod_python has been deprecated, and will be removed in Django 1.5. If you are configuring a new deployment, you are strongly encouraged to consider using __mod_wsgi __or any of the other supported backends.
The mod_python module for Apache can be used to deploy Django to a production server, although it has been mostly superseded by the simpler mod_wsgi deployment option.
mod_python is similar to (and inspired by) **mod_perl **: It **embeds Python within Apache** and loads Python code into memory when the server starts. Code stays in memory throughout the life of an Apache process, which leads to significant performance gains over other server arrangements.
Django requires Apache 2.x and mod_python 3.x, and you should use Apache's prefork MPM, as opposed to the worker MPM.
===== Basic configuration =====
To configure Django with** mod_python**, first make sure you have Apache installed, with the mod_python module activated.
Then edit your** httpd.conf** file and add the following:
<Location "/mysite/">
SetHandler python-program
PythonHandler django.core.handlers.modpython
SetEnv DJANGO_SETTINGS_MODULE mysite.settings
# The django.root value must not end with a trailing slash.
PythonOption django.root **/mysite**
PythonDebug On
</Location>
...and replace **mysite.settings **with the Python import path to your Django project's settings file.
This tells Apache: "Use mod_python for any URL at or under '/mysite/', using the Django mod_python handler." It passes the value of **DJANGO_SETTINGS_MODULE** so mod_python knows which settings to use.
Because mod_python //does not know we are serving this site from underneath the//__ /mysite/ __//prefix//, this value needs to be passed through to the mod_python handler in Django, via the //PythonOption django.root// ... line. The value set on that line (the last item) should __match the string given in the <Location ...> directive__. The effect of this is that Django will __automatically strip the /mysite string__ from the front of any URLs __before matching them against your URLconf patterns__. If you later move your site to live under /mysite2, you will not have to change anything except the django.root option in the config file.
When using django.root you should make sure that what's left, after the prefix has been removed, __begins with a slash__. Your URLconf patterns that are expecting an initial slash will then work correctly. In the above example, since we want to send things like /mysite/admin/ to /admin/, we need to remove the string /mysite from the beginning, so that is the django.root value. It would be **an error to use /mysite/ **(with a trailing slash) in this case.
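The effect of a trailing slash in django.root can be seen with a small stand-alone sketch. This is only an illustrative model of the prefix stripping described above, not Django's handler code; the function name `strip_root` is an assumption made for the example.

```python
def strip_root(request_uri, django_root):
    """Return the path left for URLconf matching once the prefix is removed.

    Illustrative model only; strip_root is a hypothetical helper.
    """
    if django_root and request_uri.startswith(django_root):
        return request_uri[len(django_root):]
    return request_uri


# Correct: no trailing slash on the root, so the remainder starts with "/".
print(strip_root("/mysite/admin/", "/mysite"))    # /admin/

# Incorrect: a trailing slash on the root eats the leading slash.
print(strip_root("/mysite/admin/", "/mysite/"))   # admin/
```

With the second value, URLconf patterns that expect an initial slash would no longer match, which is why the trailing slash is treated as an error.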
Note that we're using the <Location> directive, not the <Directory> directive. The latter is used for pointing at places on your filesystem, whereas **<Location> points at places in the URL structure **of a Web site. <Directory> would be meaningless here.
Also, if your Django project is not on the default__ PYTHONPATH__ for your computer, you'll have to tell mod_python where your project can be found:
<Location "/mysite/">
SetHandler python-program
PythonHandler django.core.handlers.modpython
SetEnv DJANGO_SETTINGS_MODULE mysite.settings
PythonOption django.root /mysite
PythonDebug On
# The path below should be the **parent directory** of mysite (that is, of the project directory).
__PythonPath__ "['/path/to/project'] + sys.path"
</Location>
The value you use for PythonPath should include the __parent directories __of all the modules you are going to import in your application. It should also include the **parent directory **of the DJANGO_SETTINGS_MODULE location. This is exactly the same situation as setting the Python path for interactive usage. Whenever you try to import something, Python will run through all the directories in sys.path in turn, from first to last, and try to import from each directory until one succeeds.
Make sure that your Python source files' permissions are set such that the Apache user (usually named apache or httpd on most systems) will have **read** access to the files.
An example might make this clearer. Suppose you have some applications under /usr/local/django-apps/ (for example, /usr/local/django-apps/weblog/ and so forth), your settings file is at /var/www/mysite/settings.py and you have specified DJANGO_SETTINGS_MODULE as in the above example. In this case, you would need to write your PythonPath directive as:
**PythonPath "['/usr/local/django-apps/', '/var/www'] + sys.path"**
With this path, import weblog and import mysite.settings will both work. If you had import blogroll in your code somewhere and blogroll lived under the weblog/ directory, you would also need to add /usr/local/django-apps/weblog/ to your PythonPath.
__Remember: the parent directories of anything you import directly must be on the Python path.__
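The rule can be tried outside Apache. The sketch below recreates the layout from the example in a temporary directory (the temporary paths stand in for /usr/local/django-apps/ and /var/www/, and the helper `make_package` is an assumption made for the example), then puts the two parent directories on sys.path the way the PythonPath directive would.

```python
import importlib
import os
import sys
import tempfile


def make_package(parent_dir, package, module=None):
    """Create parent_dir/package/__init__.py (and optionally module.py)."""
    package_dir = os.path.join(parent_dir, package)
    os.makedirs(package_dir)
    open(os.path.join(package_dir, "__init__.py"), "w").close()
    if module:
        with open(os.path.join(package_dir, module + ".py"), "w") as handle:
            handle.write("DEBUG = False\n")
    return package_dir


# Stand-ins for /usr/local/django-apps/ and /var/www/ from the example above.
base = tempfile.mkdtemp()
apps_dir = os.path.join(base, "django-apps")
www_dir = os.path.join(base, "www")
make_package(apps_dir, "weblog")
make_package(www_dir, "mysite", module="settings")

# Equivalent of: PythonPath "['/usr/local/django-apps/', '/var/www'] + sys.path"
sys.path = [apps_dir, www_dir] + sys.path
importlib.invalidate_caches()

weblog = importlib.import_module("weblog")
settings = importlib.import_module("mysite.settings")
print(weblog.__name__, settings.__name__)
```

Note that it is the parents (django-apps and www) that go on the path, not the weblog or mysite directories themselves.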
===== Note =====
If you're using Windows, we still recommend that you use **forward slashes **in the pathnames, even though Windows normally uses the backslash character as its native separator. Apache knows how to convert from the forward slash format to the native format, so this approach is portable and easier to read. (It avoids tricky problems with having to double-escape backslashes.)
This is valid even on a Windows system:
PythonPath "['c:/path/to/project'] + sys.path"
You can also add directives such as **PythonAutoReload Off **for performance. See the mod_python documentation for a full list of options.
Note that you should set **PythonDebug Off** on a production server. If you leave PythonDebug On, your users would see ugly (and revealing) Python tracebacks if something goes wrong within mod_python.
**Restart Apache**, and any request to /mysite/ or below will be served by Django.
__Note that without the django.root option, Django's URLconfs won't trim the "/mysite/" prefix -- they get passed the full URL.__
When deploying Django sites on mod_python, you'll need to__ restart Apache each time you make changes __to your Python code.
===== Multiple Django installations on the same Apache =====
It's entirely possible to run multiple Django installations on the same Apache instance. Just use** VirtualHost** for that, like so:
NameVirtualHost *
<VirtualHost *>
ServerName www.example.com
# ...
SetEnv DJANGO_SETTINGS_MODULE mysite.settings
</VirtualHost>
<VirtualHost *>
ServerName www2.example.com
# ...
SetEnv DJANGO_SETTINGS_MODULE mysite.other_settings
</VirtualHost>
If you need to put two Django installations within the same VirtualHost (or in different VirtualHost blocks that share the same server name), you'll need to take **a special precaution** to ensure __mod_python's cache doesn't mess things up__. Use the** PythonInterpreter** directive to give different <Location> directives separate interpreters:
<VirtualHost *>
ServerName www.example.com
# ...
<Location "/something">
SetEnv DJANGO_SETTINGS_MODULE mysite.settings
** PythonInterpreter mysite**
</Location>
<Location "/otherthing">
SetEnv DJANGO_SETTINGS_MODULE mysite.other_settings
**PythonInterpreter othersite**
</Location>
</VirtualHost>
The values of PythonInterpreter don't really matter, as long as they're **different** between the two Location blocks.
===== Running a development server with mod_python =====
If you use mod_python for your development server, you can avoid the hassle of having to restart the server each time you make code changes. Just set __MaxRequestsPerChild 1__ in your httpd.conf file to **force Apache to reload everything for each request**. But don't do that on a production server, or we'll revoke your Django privileges.
If you're the type of programmer who debugs using scattered print statements, note that __output to stdout__// will not appear in the Apache log and can even cause response errors//.
If you have the need to print debugging information in a mod_python setup, you have a few options. You can __print to stderr __explicitly, like so:
import sys
print >> sys.stderr, 'debug text'
sys.stderr.flush()
(Note that stderr is buffered, so calling flush is necessary if you wish debugging information to be displayed promptly.)
A more compact approach is to use an assertion:
assert False, 'debug text'
Another alternative is to add **debugging information **to the template of your page.
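The stderr option can be wrapped in a small helper so that the flush is never forgotten. This is a minimal sketch; the name `debug` and the "DEBUG:" prefix are assumptions made for the example. It uses `write` rather than the print statement so that the same code also runs on Python 3.

```python
import sys


def debug(message, stream=None):
    """Write one debugging line to stderr (not stdout) and flush immediately."""
    stream = stream if stream is not None else sys.stderr
    stream.write("DEBUG: %s\n" % message)
    stream.flush()


debug("entering view")
```

The optional `stream` argument is there only to make the helper easy to test; in a mod_python setup the default, sys.stderr, is the one that ends up in the Apache error log.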
===== Serving media files =====
Django doesn't serve media files itself; it leaves that job to whichever Web server you choose.
We recommend using a** separate Web server** -- i.e., one that's not also running Django -- for serving media. Here are some good choices:
* lighttpd
* Nginx
* TUX
* A stripped-down version of Apache
* Cherokee
If, however, you have no option but to serve media or static files on the same Apache VirtualHost as Django, here's how you can turn off mod_python for a** particular part** of the site:
<Location "/media">
SetHandler None
</Location>
Just change Location to the **root URL of your media files**. You can also use **<LocationMatch>** to match a regular expression.
This example sets up Django at the site root but explicitly disables Django for the **media** and **static** subdirectories and any URL that ends with .jpg, .gif or .png:
<Location "/">
SetHandler python-program
PythonHandler django.core.handlers.modpython
SetEnv DJANGO_SETTINGS_MODULE mysite.settings
</Location>
<Location "/media">
SetHandler None
</Location>
<Location "/static">
SetHandler None
</Location>
<LocationMatch "\.(jpg|gif|png)$">
SetHandler None
</LocationMatch>
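To check which URLs this configuration hands to Django, the matching rules can be approximated in a few lines of Python. This is a rough model under stated assumptions, not Apache's actual section-merging logic: `<Location "/media">` is taken to cover /media itself and everything below it, and the function names are hypothetical.

```python
import re

# Mirrors the <Location> and <LocationMatch> blocks in the example above.
STATIC_PREFIXES = ("/media", "/static")
STATIC_PATTERN = re.compile(r"\.(jpg|gif|png)$")


def under_location(path, prefix):
    """Approximate <Location prefix>: the prefix itself or anything below it."""
    return path == prefix or path.startswith(prefix + "/")


def handled_by_django(path):
    """Return False where the configuration sets 'SetHandler None'."""
    if any(under_location(path, prefix) for prefix in STATIC_PREFIXES):
        return False
    if STATIC_PATTERN.search(path):
        return False
    return True


for url in ("/", "/blog/2011/", "/media/logo.css", "/static", "/img/a.png"):
    print(url, handled_by_django(url))
```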
===== Serving the admin files =====
Note that the// Django development server// automagically serves the static files of the admin app, but this is not the case when you use any other server arrangement. You're responsible for setting up Apache, or whichever media server you're using, to serve the admin files.
The admin files live in (django/contrib/admin/static/admin) of the Django distribution.
We strongly recommend using** django.contrib.staticfiles** to handle the admin files, but here are two other approaches:
1) Create a symbolic link to the admin static files from within your document root.
2) Or, copy the admin static files so that they live within your Apache document root.
===== Using "eggs" with mod_python =====
If you installed Django from a Python egg or are using eggs in your Django project, some extra configuration is required. Create an extra file in your project (or somewhere else) that contains something like the following:
import os
os.environ['PYTHON_EGG_CACHE'] = '/some/directory'
Here, /some/directory is a directory that the Apache Web server process can** write to**. It will be used as the location for any **unpacking** of code the eggs need to do.
Then you have to tell mod_python to import this file before doing anything else. This is done using the__ PythonImport__ directive to mod_python. You need to ensure that you have specified the **PythonInterpreter** directive to mod_python as described above (you need to do this even if you aren't serving multiple installations in this case). Then add the PythonImport line in the main server configuration (i.e., outside the Location or VirtualHost sections). For example:
PythonInterpreter my_django
PythonImport /path/to/my/project/file.py my_django
Note that you can use an absolute path here (or a normal dotted import path), as described in the mod_python manual. We use an absolute path in the above example because if any Python path modifications are required to access your project, they will not have been done at the time the PythonImport line is processed.
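Because an unwritable cache directory is an easy mistake to make, the imported file could also verify the directory before setting the variable. The following is a sketch under that assumption; `configure_egg_cache` is a hypothetical helper, and the check reflects the permissions of whichever process runs it.

```python
import os


def configure_egg_cache(directory):
    """Point PYTHON_EGG_CACHE at directory, refusing unwritable locations."""
    if not os.path.isdir(directory):
        raise ValueError("not a directory: %s" % directory)
    if not os.access(directory, os.W_OK):
        raise ValueError("not writable by this process: %s" % directory)
    os.environ["PYTHON_EGG_CACHE"] = directory
    return directory


# In the imported file one would then call, for example:
# configure_egg_cache('/some/directory')
```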
===== Error handling =====
When you use Apache/mod_python, errors will be caught by Django -- in other words, they **won't propagate** to the Apache level and won't appear in the Apache error_log.
The exception for this is if something is really wonky in your** Django setup**. In that case, you'll see an __"Internal Server Error"__ page in your browser and the full Python traceback in your Apache error_log file. The error_log traceback is spread over multiple lines. (Yes, this is ugly and rather hard to read, but it's how mod_python does things.)
===== If you get a segmentation fault =====
If Apache causes a segmentation fault, there are two probable causes, neither of which has to do with Django itself.
* It may be because your Python code is importing the "pyexpat" module, which may conflict with the version embedded in Apache. For full information, see Expat Causing Apache Crash.
* It may be because you're running mod_python and mod_php in the same Apache instance, with MySQL as your database backend. In some cases, this causes a known mod_python issue due to version conflicts in PHP and the Python MySQL backend. There's full information in the mod_python FAQ entry.
If you continue to have problems setting up mod_python, a good thing to do is __get a barebones mod_python site working, without the Django framework. __This is an easy way to isolate mod_python-specific problems. Getting mod_python Working details this procedure.
The next step should be to edit your test code and add an import of any Django-specific code you're using -- your views, your models, your URLconf, your RSS configuration, etc. Put these imports in your test handler function and access your test URL in a browser. If this causes a crash, you've confirmed it's the importing of Django code that causes the problem. Gradually reduce the set of imports until it stops crashing, so as to find the specific module that causes the problem. Drop down further into modules and look into their imports, as necessary.
===== If you get a UnicodeEncodeError =====
If you're taking advantage of the internationalization features of Django (see Internationalization and localization) and you intend to allow users to **upload files**, you must ensure that the environment used to start Apache is configured to __accept non-ASCII file names.__ If your environment is not correctly configured, you will trigger UnicodeEncodeError exceptions when calling functions such as those in os.path on filenames that contain non-ASCII characters.
To avoid these problems, the environment used to start Apache should contain settings analogous to the following:
export LANG='en_US.UTF-8'
export LC_ALL='en_US.UTF-8'
Consult the documentation for your operating system for the appropriate syntax and location to put these configuration items; /etc/apache2/envvars is a common location on Unix platforms. Once you have added these statements to your environment, restart Apache.
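A quick way to see whether an environment would pass this check is to look at the locale variables in POSIX precedence order (LC_ALL, then LC_CTYPE, then LANG). The sketch below is only a heuristic based on the locale name; the function names are assumptions made for the example.

```python
import os


def effective_ctype_locale(environ):
    """Return the locale string that governs character encoding.

    POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    Empty values count as unset.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(name)
        if value:
            return value
    return "C"


def looks_utf8(environ):
    """Heuristic: does the effective locale name a UTF-8 codeset?"""
    locale_name = effective_ctype_locale(environ).split("@")[0].lower()
    return locale_name.endswith((".utf-8", ".utf8"))


print(looks_utf8(os.environ))
```

Run from the same environment that starts Apache, a False result suggests the two export lines above are not in effect.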


@@ -0,0 +1,95 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-22T15:31:56+08:00
====== How to use Django with Apache and mod wsgi ======
Created Saturday 22 October 2011
https://docs.djangoproject.com/en/dev/howto/deployment/modwsgi/#serving-files
Deploying Django with Apache and mod_wsgi is the **recommended way** to get Django into production.
mod_wsgi is an **Apache module** which can be used to host any Python application which supports the Python WSGI interface described in PEP 3333, including Django. Django will work with any version of Apache which supports mod_wsgi.
The official mod_wsgi documentation is fantastic; it's your source for all the details about how to use mod_wsgi. You'll probably want to start with the installation and configuration documentation.
===== Basic configuration =====
Once you've got mod_wsgi installed and activated, edit your httpd.conf file and add:
**WSGIScriptAlias / /path/to/mysite/apache/django.wsgi**
The first bit above is the url you want to be serving your application at (/ indicates the root url), and the second is the location of a "WSGI file" -- see below -- on your system, usually inside of your project. **This tells Apache to serve any request below the given URL using the WSGI application defined by that file.**
Next we'll need to actually create this WSGI application, so create the file mentioned in the second part of WSGIScriptAlias and add:
**import os**
**import sys**
**os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'**
**import django.core.handlers.wsgi**
**application = django.core.handlers.wsgi.WSGIHandler()**
If your project is not on your __PYTHONPATH__ by default you can add:
**path = '/path/to/parent/of/mysite'  # the directory that contains mysite**
**if path not in sys.path:**
**    sys.path.append(path)**
just below the import sys line to place your project on the path. Remember to replace 'mysite.settings' with your correct settings file, and '/path/to/parent/of/mysite' with the parent directory of your own project.
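What mod_wsgi needs from the WSGI file is a module-level callable named application that follows the WSGI interface. The sketch below uses a trivial stand-in application instead of Django's handler, and a hypothetical helper `call_once` that invokes it the way a server would, to show the shape of that interface.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A trivial WSGI app, standing in for Django's WSGIHandler instance."""
    body = ("Hello from %s" % environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_once(app, path):
    """Invoke a WSGI app the way a server would, returning (status, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


print(call_once(application, "/articles/"))
```

The same `call_once` idea can serve as a smoke test for a real WSGI file before pointing Apache at it, assuming the project and its settings are importable.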
===== Serving files =====
**Django **__doesn't serve files itself__**; it leaves that job to whichever Web server you choose.**
We recommend using a separate Web server -- i.e., one that's not also running Django -- for **serving media**. Here are some good choices:
* lighttpd
* Nginx
* TUX
* A stripped-down version of Apache
* Cherokee
If, however, you have no option but to serve media files on the same Apache VirtualHost as Django, you can set up Apache to __serve some URLs as static media, and others using the mod_wsgi interface to Django__.
This example sets up Django at the site root, but explicitly serves robots.txt, favicon.ico, any CSS file, and anything in the /static/ and /media/__ URL space as a static file__. All other URLs will be served using mod_wsgi:
**Alias /robots.txt /usr/local/wsgi/static/robots.txt**
**Alias /favicon.ico /usr/local/wsgi/static/favicon.ico**
**AliasMatch ^/([^/]*\.css) /usr/local/wsgi/static/styles/$1**
**Alias /media/ /usr/local/wsgi/media/**
**Alias /static/ /usr/local/wsgi/static/**
**<Directory /usr/local/wsgi/static>**
**Order deny,allow**
**Allow from all**
**</Directory>**
**<Directory /usr/local/wsgi/media>**
**Order deny,allow**
**Allow from all**
**</Directory>**
**WSGIScriptAlias / /usr/local/wsgi/scripts/django.wsgi**
**<Directory /usr/local/wsgi/scripts>**
**Order allow,deny**
**Allow from all**
**</Directory>**
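The URL-to-file mapping set up by the Alias and AliasMatch lines can be approximated as follows. This is a rough first-match model written for illustration, not Apache's exact algorithm; the `ALIASES` table and the `resolve` function are assumptions made for the example.

```python
import re

# (kind, URL pattern, filesystem target), in the same order as the config.
ALIASES = [
    ("alias", "/robots.txt", "/usr/local/wsgi/static/robots.txt"),
    ("alias", "/favicon.ico", "/usr/local/wsgi/static/favicon.ico"),
    ("match", r"^/([^/]*\.css)", "/usr/local/wsgi/static/styles/"),
    ("alias", "/media/", "/usr/local/wsgi/media/"),
    ("alias", "/static/", "/usr/local/wsgi/static/"),
]


def resolve(url_path):
    """Return the static file path for url_path, or None if mod_wsgi gets it."""
    for kind, pattern, target in ALIASES:
        if kind == "match":
            found = re.match(pattern, url_path)
            if found:
                return target + found.group(1)
        elif url_path.startswith(pattern):
            return target + url_path[len(pattern):]
    return None


for url in ("/robots.txt", "/main.css", "/media/img/a.png", "/articles/2011/"):
    print(url, "->", resolve(url))
```

Anything that resolves to None falls through to the WSGIScriptAlias line and is served by Django.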
===== Serving the admin files =====
Note that the Django **development server** automagically serves the static files of the **admin app**, but this is not the case when you use any other server arrangement. You're responsible for setting up Apache, or whichever media server you're using, to serve the admin files.
The admin files live in (**django/contrib/admin/media**) of the Django distribution.
We strongly recommend using __django.contrib.staticfiles__ to handle the admin files, but here are two other approaches:
* Create a symbolic link to the admin static files from within** your document root**.
* Or, copy the admin static files so that they live within your Apache document root.
===== Details =====
For more details, see the mod_wsgi documentation on Django integration, which explains the above in more detail, and walks through all the various options you've got when deploying under mod_wsgi.


@@ -0,0 +1,97 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-27T18:31:44+08:00
====== My Idea Of The Django Blogging App™ ======
Created Thursday 27 October 2011
http://www.muhuk.com/2010/04/my-idea-of-the-django-blogging-app%E2%84%A2/
I am not going to talk about yet another Django-based blogging engine in this post. There are a number of blogging apps which try to be like turn-key solutions, like a WordPress blog. I have skimmed through the code of many such apps, but haven't used one yet. Some of them are really high quality apps. What I have in mind is somewhat different though. I would like an app that would allow me to build a blog that satisfies my project's specific requirements.
Let me reiterate the last sentence. Having a Django-based blog just because Django is fashionable is a little dumb in my opinion. If Django-based X blogging engine suits you better than anything else, use it. Why not? But my personal choice of blogging engine is WordPress [1]. The value of a Django blogging app, for me, is in adding a blog to a Django project. And different projects might have different requirements. So my idea of a Django blogging app is one that is highly configurable and highly extendable.
On the other hand I don't need the convenience of clicking a checkbox on a polished UI. I can write a function. Or I don't necessarily need it to, say, provide a navigation menu. There are apps that do that. Even if there weren't, it shouldn't be the blog app's job. So I am not looking for an instant-blog. I have a Django app in my mind, nothing more.
===== What Should Be Left Out =====
Basically any feature that can be provided by another reusable app should be left out. Why should we re-implement something that is already done… and reviewed by others… and tested. Of course this doesn't necessarily mean providing no convenience functions.
* No admin. Because we already have one.
* No theming. For the love of Flying Spaghetti Monster, you don't need any theming other than what django.template offers. Pre-built themes are for turn-key solutions.
* No comments or contact forms. (See django.contrib.comments and django-contact-form.)
* No official markup format (or formats). This can be handled in the templates without difficulty. But, maybe, pluggable content filters are a good idea. I haven't made up my mind on this one entirely. It won't use any markup format by default, that is for sure.
===== What Should Be Included =====
Remember, every project has a different set of needed features for its blog. Some need categories, some need tags and some others need both. But it would end up as a disaster if we implemented each one of those features into a single app. Instead I think it should consist of many small apps that work together. But I wouldn't want to end up having a huge spaghetti of apps that all depend on one another, like Pinax does. A minimal amount of core apps [2] and then everything else should be optional. By optional I mean you don't have to install packages you won't need.
I think the components (apps) should be activated via adding to INSTALLED_APPS and configured with settings. I can't think of any parameter that needs to be changed dynamically, so why not use the established way of doing configuration in Django.
Two must-have features for such a blogging app are previews and scheduled publishing. It is possible that you sometimes write a post quickly and publish it immediately. But I suppose nobody will say they don't care about these two features.
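To make the two must-have features concrete, here is a framework-free sketch of how previews and scheduled publishing could fit together. Everything in it (the `Post` class, the `publish_at` field, the function names) is a hypothetical illustration and is not taken from any existing blogging app.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Post:
    title: str
    publish_at: Optional[datetime]  # None means "still a draft"


def is_public(post, now):
    """A post is public once its scheduled publication time has passed."""
    return post.publish_at is not None and post.publish_at <= now


def visible_posts(posts, now, preview=False):
    """Public posts for visitors; everything when an author previews."""
    if preview:
        return list(posts)
    return [post for post in posts if is_public(post, now)]
```

In a Django app the same rule would more likely live in a model manager or queryset filter, but the logic stays this small.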
Built-in feeds and sitemaps are also nice to have.
Multiple instances of this blogging app running on the same project? À la admin. I can't make up my mind on this one. Sure it would be a nice feature. But it could complicate the code. Perhaps too much for a not so common case.
What do you think about the general idea? Are there any other must-have features? Would you be willing to learn a new app when you are already comfortable with another blogging app?
1: Even though it's written in the abomination called PHP. But since there are plugins for everything I don't have to touch the code.
2: One sounds like a good number, if possible.
Tags: blogs, django, python, reusable
This entry was posted on Wednesday, April 28th, 2010 at 12:36 and is filed under Programming. You can follow any responses to this entry through the RSS 2.0 feed. Both comments and pings are currently closed.
7 Responses to “My Idea Of The Django Blogging App™”
Matthias Kestenholz says:
28.04.2010 at 13:48
Oh yeah, there are other must-have features for something which wants to call itself a blog:
Receiving and sending pings, trackbacks and pingbacks.
Without these, it's just another news app with commenting functionality. That's not what a blog is all about.
Atamert Ölçgen says:
28.04.2010 at 13:56
Good points Matthias, I agree 100%.
Fantomas42 says:
28.04.2010 at 19:28
I totally agree with you when you said : “Why should we re-implement something that is already done… and reviewed by others… and tested. ” And I understand very well why you have choose WordPress, it is really powerfull, but a little bit to ugly for me to maintain.
My conception of a weblog application is really close of your. So that why you should check the release of my weblog application, and give me some feed back.
http://github.com/Fantomas42/django-blog-zinnia
@Matthias Kestenholz Pingback and trackbacks are a part of a blogging system, but not the essential. And some pluggable applications does it very well, for example :
http://github.com/svetlyak40wt/django-pingback
Sorry for my french !
Atamert Ölçgen says:
28.04.2010 at 22:12
@Fantomas42, Your app looks good. But it's all-or-nothing this way. I would like to be able to choose a subset of available features. I'll take a closer look later, though.
Josh says:
28.04.2010 at 22:39
I agree with your ideas for the most part. I'm of the persuasion that a lot of external dependencies make things more complicated than they need to be. A few here and there are fine, I say.
I'm curious what your opinion of django-articles is (it's my pet project). I'm all for making it more configurable. If you make suggestions, I'll probably take them to heart and try to apply them to the codebase! I'm not suggesting that django-articles is the end-all be-all blog engine for Django, but it might well be close enough to what you're after… :)
origiNell says:
29.04.2010 at 13:25
“There are a number of blogging apps which try to be like turn-key solutions, like a WordPress blog.”
That's what bothered me too.. I wrote my own blogging engine which tries to be as modular as possible.. or at least I try to make everything optional via settings.. ;-) The only thing that's really hardcoded is a simple comment spam protection (which I might remove in favor of disqus) and markdown as markup of choice..
http://github.com/originell/simpleblog/tree
Julien Phalip says:
03.05.2010 at 13:06
I couldn't agree more! In that same spirit we've built a blog framework to make it super easy to create your own blog app and extend it to your needs: http://github.com/glamkit/glamkit-blogtools Take a look at it and see what you think ;)


@@ -0,0 +1,65 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-22T16:15:23+08:00
====== Serving Django Admin Static Media Files with Apache WSGI on Ubuntu ======
Created Saturday 22 October 2011
http://blog.videntity.com/?p=507
Here is a quick and simple configuration recipe to serve the admin static files in a production environment using mod_wsgi and Apache 2.
Modify your apache2 configuration file. By default that would be:
/etc/apache2/sites-available/default
Just add the line:
**Alias /media/ /var/www/media/**
..under your VirtualHost setting, like so:
**<VirtualHost *:80>**
** ServerAdmin webmaster@localhost**
** Alias /media/ /var/www/media/**
** DocumentRoot /var/www**
** <Directory />**
** Options FollowSymLinks**
** AllowOverride None**
** </Directory>**
** <Directory /var/www/>**
** Options Indexes FollowSymLinks MultiViews**
** AllowOverride None**
** Order allow,deny**
** allow from all**
** </Directory>**
**WSGIScriptAlias / /home/ubuntu/RESTCat/apache/django.wsgi**
**.**
**.**
**</VirtualHost>**
Note that the WSGIScriptAlias points to your **Django application configuration file**. I keep it alongside my project, in a folder called apache, in a file called django.wsgi.
Now add a symbolic link to point apache to the admin static content like so:
**sudo ln -s /usr/local/lib/python2.6/dist-packages/django/contrib/admin/media /var/www/media**
Now simply restart apache.
**sudo apache2ctl restart**
Your admin static files should now be served!
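The symbolic-link step can also be scripted. The sketch below shows an idempotent version of the `ln -s` command, demonstrated in a scratch directory rather than /var/www; the helper name `link_admin_media` is an assumption made for the example.

```python
import os
import tempfile


def link_admin_media(admin_media_dir, link_path):
    """Create link_path as a symlink to admin_media_dir (like `ln -s`)."""
    if os.path.islink(link_path):
        if os.path.realpath(link_path) == os.path.realpath(admin_media_dir):
            return link_path  # already correct, nothing to do
        raise OSError("%s already links elsewhere" % link_path)
    os.symlink(admin_media_dir, link_path)
    return link_path


# Demonstration in a scratch directory rather than /var/www.
scratch = tempfile.mkdtemp()
source = os.path.join(scratch, "admin-media")
os.makedirs(os.path.join(source, "css"))
link = link_admin_media(source, os.path.join(scratch, "media"))
print(os.path.isdir(os.path.join(link, "css")))
```

Calling the helper a second time with the same arguments leaves the existing link in place, which makes it safe to rerun from a deployment script.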


@@ -0,0 +1,48 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-21T22:28:40+08:00
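====== Django + Apache configuration files explained ======
Created Friday 21 October 2011
settings.py is the key configuration file of a Django project. A few of its important settings are explained below:
# The absolute "local" (filesystem) path where uploaded files are stored.
MEDIA_ROOT = '/home/lion/www/uploads/'
# The relative URL path of uploaded files. Django uses it to build the real URL of each uploaded file; without it, Django cannot locate uploaded files correctly.
MEDIA_URL = '/uploads/'
# Also a relative URL path, this one for the resource files of the Admin Site (Django's built-in admin pages). If it is set incorrectly, the Admin Site pages will not display properly.
ADMIN_MEDIA_PREFIX = '/media/'
Apache has no built-in support for Python, so the mod_python module must be installed.
On Debian, run the following command as root:
apt-get install libapache2-python2.x    (2.x is the Python version, for example 2.5)
httpd.conf is Apache's main configuration file.
Here is my httpd.conf:
# Use the mod_python interpreter and Django at the site root.
# The <Location> path is the URL path handled by the interpreter.
<Location "/">
SetHandler python-program
PythonHandler django.core.handlers.modpython
SetEnv DJANGO_SETTINGS_MODULE www.settings
# The line above deserves a closer look. A web application generated with Django normally lives in a single directory, and to Python that directory is a "module" (package). Keep this in mind!
# So if the application's directory is named www and its settings file is settings.py, the settings "module" is written in Python as www.settings.
# That is where the www.settings value above comes from.
# The author sets the next option to the directory that "contains" the web application, i.e. the parent of the web directory. Note, however, that the Django documentation describes django.root as the URL prefix given in <Location>, not a filesystem path.
PythonOption django.root /home/lion/
PythonDebug On
# Your web application module is usually not on Python's default search path, so the directory that "contains" it has to be added to the search path.
PythonPath "['/home/lion/'] + sys.path"
</Location>
# Serve the Admin Site resource files under /media/.
Alias /media/ /usr/share/python-support/python-django/django/contrib/admin/media/
# Serve the upload directory under /uploads/. For uploads to work, the user Apache runs as needs write permission on it. Apache usually runs as www-data or nobody.
Alias /uploads/ /home/lion/www/uploads/
# Do not use any interpreter (including Python) for /media/.
<Location "/media/">
SetHandler None
</Location>
# Do not use any interpreter for /uploads/ either; otherwise uploaded files cannot be accessed. Try it if you don't believe me.
<Location "/uploads/">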
SetHandler None
</Location>


@@ -0,0 +1,28 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-23T17:57:50+08:00
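====== staticfiles in Django 1.3 ======
Created Sunday 23 October 2011
http://haoluobo.com/2011/07/django1-3%E7%9A%84staticfiles/
Django 1.3 adds a new app for managing static resources: django.contrib.staticfiles. In earlier Django versions, managing static resources was always a problem.
Some apps ship with their own static resource files, and at deployment time you had to copy these files by hand from each app into a single static directory.
With staticfiles, you only need to run ./manage.py collectstatic to copy the static resources of all the apps in use into one directory.
staticfiles makes Django's static files easier to manage, but I feel its documentation is not very clear, and it left me somewhat confused the first time I used it.
Here is a brief introduction to the main staticfiles settings:
* STATIC_ROOT: the directory that static files are copied to when manage.py collectstatic runs. Do not put your project's own static files in this directory. It is only used when collectstatic runs. At first I assumed it played the same role as MEDIA_ROOT, and as a result static files could never be found in my development environment.
* STATIC_URL: the base URL of the static files; this is the value you reference from templates. Its meaning is similar to MEDIA_URL.
* STATICFILES_DIRS: locations of static files to manage in addition to each app's static directory, such as the project's shared static files. Its meaning is similar to TEMPLATE_DIRS.
* Static files under each app's static/ directory are found automatically by Django's development server, much like the templates directory under each app.
* Add the static file handling code to urls.py:
	from django.contrib.staticfiles.urls import staticfiles_urlpatterns
	# ... the rest of your URLconf goes here ...
	urlpatterns += staticfiles_urlpatterns()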
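The relationship between these settings is easier to see in a conceptual model of what collectstatic does. The sketch below is not Django's implementation; `collect_static` is a hypothetical function whose `extra_dirs` argument plays the role of STATICFILES_DIRS, `app_dirs` the installed apps, and `static_root` the STATIC_ROOT directory. It assumes that the first source providing a given relative path wins.

```python
import os
import shutil
import tempfile


def collect_static(app_dirs, extra_dirs, static_root):
    """Copy every file from the source directories into static_root.

    Sources are the extra directories plus each app's static/ subdirectory.
    The first source to provide a given relative path wins.
    """
    sources = list(extra_dirs)
    sources += [os.path.join(app, "static") for app in app_dirs]
    copied = []
    for source in sources:
        if not os.path.isdir(source):
            continue
        for folder, _subdirs, files in os.walk(source):
            for name in files:
                src = os.path.join(folder, name)
                relative = os.path.relpath(src, source)
                dest = os.path.join(static_root, relative)
                if os.path.exists(dest):
                    continue  # an earlier source already supplied this file
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                shutil.copy2(src, dest)
                copied.append(relative)
    return sorted(copied)
```

Seen this way, it is clear why the project's own files must not live in STATIC_ROOT: that directory is purely an output of the copy step, never one of its inputs.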


@@ -0,0 +1,37 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-22T22:18:40+08:00
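====== Django media settings ======
Created Saturday 22 October 2011
http://hi.baidu.com/uniqcmt/blog/item/e64a0c1e2d9dd41b40341770.html
Using a FileField or ImageField in a model takes the following steps:
1) In your settings file, define __MEDIA_ROOT__ as the full path to the directory where Django will store **uploaded files**. (For performance reasons, these files are not stored in the database.) Define __MEDIA_URL__ as the **public URL of that directory.** Make sure the directory is writable by the Web server's user account.
2) Add the FileField or ImageField to your model, and make sure to define the __upload_to__ option to tell Django which **subdirectory** of MEDIA_ROOT to use for uploaded files.
3) All that is stored in your database is the path to the file (relative to MEDIA_ROOT). You will most likely want to use the get_<fieldname>_url convenience function provided by Django. For example, if your ImageField is called mug_shot, you can get the absolute URL of the image in a template with {{ object.get_mug_shot_url }}. (In Django 1.0 and later, the same URL is available as {{ object.mug_shot.url }}.)
===== settings.py configuration =====
# Absolute path of the media files
# Absolute path to the directory that holds media.
# Example: "/home/media/media.lawrence.com/"
MEDIA_ROOT = 'D:/django/hwedding/images/'
# Relative URL path of the media files
# URL that handles the media served from MEDIA_ROOT.
# Example: "http://media.lawrence.com"
MEDIA_URL = '/smedia/'
# Path of the admin media assets
# URL prefix for admin media -- CSS, JavaScript and images. Make sure to use a
# trailing slash.
# Examples: "http://foo.com/media/", "/media/".
ADMIN_MEDIA_PREFIX = '/media/'
# urls.py mapping: map the URL smedia to settings.MEDIA_ROOT
(r'^smedia/(?P<path>.*)$', 'django.views.static.__serve__',{'document_root': settings.MEDIA_ROOT}),
# models.py: definition of the image upload field
image = models.ImageField('Introduction image',upload_to='./up')
This uploads files into the D:/django/hwedding/images/up directory; in the browser they are accessed under __/smedia/up/__.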
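How MEDIA_ROOT, MEDIA_URL and upload_to combine can be checked with a few lines of path arithmetic. This is a sketch of the mapping only, not Django's storage code; `upload_locations` and the file name photo.jpg are assumptions made for the example.

```python
import posixpath


def upload_locations(media_root, media_url, upload_to, filename):
    """Return (filesystem path, public URL) for an uploaded file."""
    relative = posixpath.normpath(posixpath.join(upload_to, filename))
    return (posixpath.join(media_root, relative),
            posixpath.join(media_url, relative))


print(upload_locations("D:/django/hwedding/images/", "/smedia/", "./up", "photo.jpg"))
```

The relative part (up/photo.jpg) is what ends up in the database; the two prefixes come from the settings.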


@@ -0,0 +1,156 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-22T22:41:18+08:00
====== linux下nginx+python+fastcgi部署总结(django版) ======
Created Saturday 22 October 2011
http://www.vimer.cn/2011/07/linux%E4%B8%8Bnginxpythonfastcgi%E9%83%A8%E7%BD%B2%E6%80%BB%E7%BB%93django%E7%89%88.html/comment-page-1
最近因为项目上的需要开始大量使用nginx因此也想趁机将以前常用的django+apache的架构换成**django+nginx+fastcgi**,此文是整个搭建的步骤,主要留作备忘,也希望对大家有所帮助。
注意虽然本文成功的搭建了django运行fastcgi的实例但是在实际运行中发现了很多问题比如程序执行异常进程在每次请求之后退出之类的。可能是我机器的问题也可能是程序本身bug大家如果用来搭建外网环境请务必多多测试。
===== 一. 编译nginx =====
在网上买了一本《实战nginx-取代Apache的高性能服务器》写的比较浅主要是些配置方面的东西不过却正是目前我所需要的。由于需要支持__https和rewrite__所以除了nginx的源码之外又下载了 openssl-0.9.8r.tar.gz 和 pcre-8.12.tar.gz把他们和nginx-1.0.4.tar.gz放到同一个目录。
为了方便编译,笔者写了一个脚本,代码如下:
#!/bin/bash
#=============================================================================
# Absolute path of the directory that contains this script
abs_path(){
    local path=$1
    local basename=$( basename "$path" )
    local dirname=$( dirname "$path" )
    cd "$dirname"
    # If the script is a symlink, follow it and resolve again
    if [ -h "$basename" ]; then
        path=$( readlink "$basename" )
        abs_path "$path"
    else
        pwd
    fi
}
#=============================================================================
# Source directories
src_base_dir=$( abs_path "$0" )
src_openssl_dir=$src_base_dir'/openssl-0.9.8r'
src_pcre_dir=$src_base_dir'/pcre-8.12'
src_nginx_dir=$src_base_dir'/nginx-1.0.4'
#=============================================================================
# Destination directories
dest_base_dir=$src_base_dir'/release'
dest_nginx_dir=$dest_base_dir'/nginx'
#=============================================================================
# Unpack every tar.gz (from the script directory, where the tarballs live)
cd "$src_base_dir"
find . -name "*.tar.gz" | xargs -IX tar zxvf X
#=============================================================================
# Build nginx
cd "$src_nginx_dir"
chmod u+x ./configure
./configure --with-http_stub_status_module --with-http_ssl_module --with-openssl="$src_openssl_dir" --with-pcre="$src_pcre_dir" --prefix="$dest_nginx_dir"
make && make install
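For readers more at home in Python, the path handling in the script can be sketched as follows. This is an illustrative equivalent, not part of the original build: `script_dir` and `source_dirs` are hypothetical names, and `os.path.realpath` resolves symlinks in every path component, which is slightly more thorough than the `readlink` loop in `abs_path`.

```python
import os


def script_dir(script_path):
    """Return the absolute directory of script_path, following symlinks."""
    return os.path.dirname(os.path.realpath(script_path))


def source_dirs(base_dir):
    """Build the source paths that the build script derives from base_dir."""
    return {
        "openssl": os.path.join(base_dir, "openssl-0.9.8r"),
        "pcre": os.path.join(base_dir, "pcre-8.12"),
        "nginx": os.path.join(base_dir, "nginx-1.0.4"),
    }


if __name__ == "__main__":
    base = script_dir(__file__)
    for name, path in sorted(source_dirs(base).items()):
        print(name, path)
```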
Once compilation finishes, nginx needs to be configured.
===== 2. Configuring nginx =====
Add the following inside the server block:
location / {
    #fastcgi_pass 127.0.0.1:9001;
    fastcgi_pass unix:django.sock;
    fastcgi_param PATH_INFO $fastcgi_script_name;
    fastcgi_param REQUEST_METHOD $request_method;
    fastcgi_param QUERY_STRING $query_string;
    fastcgi_param CONTENT_TYPE $content_type;
    fastcgi_param CONTENT_LENGTH $content_length;
    fastcgi_pass_header Authorization;
    fastcgi_intercept_errors off;
    fastcgi_param SERVER_PROTOCOL $server_protocol;
    fastcgi_param SERVER_PORT $server_port;
    fastcgi_param SERVER_NAME $server_name;
}
location **/admin_media/** {
    alias /usr/local/lib/python2.7/site-packages/django/contrib/admin/media/;
    break;
}
location **/site_media/** {
    alias /home/dantezhu/htdocs/ngx_django/media/;
    break;
}
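These three location blocks handle, respectively, communication with the Python process, the static assets for the Django admin, and the site's own static assets. Comparing them with the equivalent Apache configuration makes this easy to see: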
WSGIPythonEggs /tmp
<VirtualHost *>
    ServerName fuload.qq.com
    **WSGIScriptAlias / /home/dantezhu/htdocs/fuload/conf/setting.wsgi**
    <Directory />
        Options FollowSymLinks
        AllowOverride
        Order allow,deny
        Allow from all
    </Directory>
    <Directory "/home/dantezhu/htdocs/fuload/mysite">
        Order Deny,Allow
        Deny from all
    </Directory>
    **Alias /admin_media "/usr/local/lib/python2.7/site-packages/django/contrib/admin/media"**
    <Directory "/usr/local/lib/python2.7/site-packages/django/contrib/admin/media">
        Order allow,deny
        Options Indexes
        Allow from all
        IndexOptions FancyIndexing
    </Directory>
    #AliasMatch /site_media/(.*\.(css|gif|png|jpg|jpeg)) /home/dantezhu/htdocs/fuload/media/$1
    **Alias /site_media /home/dantezhu/htdocs/fuload/media/**
    <Directory "/home/dantezhu/htdocs/fuload/media/">
        Order allow,deny
        Options Indexes
        Allow from all
        IndexOptions FancyIndexing
    </Directory>
</VirtualHost>
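To make the routing in the nginx configuration concrete, here is a simplified Python model of it. It only covers plain prefix locations (nginx picks the longest matching prefix, and `alias` replaces the matched prefix with the alias path); regex and exact-match locations are ignored, request URIs are assumed to start with `/`, and the names `LOCATIONS` and `route` are made up for this sketch.

```python
# Prefix locations from the nginx configuration, mapped to their alias
# directories; None marks the location that is handed to FastCGI.
LOCATIONS = {
    "/": None,
    "/admin_media/": "/usr/local/lib/python2.7/site-packages/django/contrib/admin/media/",
    "/site_media/": "/home/dantezhu/htdocs/ngx_django/media/",
}


def route(uri):
    """Return ('static', file_path) or ('fastcgi', uri) for a request URI."""
    # Among prefix locations, the longest matching prefix wins.
    prefix = max((p for p in LOCATIONS if uri.startswith(p)), key=len)
    alias = LOCATIONS[prefix]
    if alias is None:
        return ("fastcgi", uri)
    # alias swaps the matched location prefix for the alias directory.
    return ("static", alias + uri[len(prefix):])


if __name__ == "__main__":
    for uri in ("/site_media/css/base.css", "/admin_media/js/core.js", "/blog/1/"):
        print(uri, "->", route(uri))
```

Anything that does not fall under one of the two media prefixes ends up in `location /` and is passed to the Django process, which matches the role `WSGIScriptAlias /` plays in the Apache version.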
===== 3. Installing the FastCGI dependency =====
Download and install flup from http://trac.saddi.com/flup ; FastCGI cannot start properly without it.
===== 4. Starting Django =====
We will skip creating the Django project and only list the start and stop commands.
Start:
#python manage.py runfcgi daemonize=true pidfile=`pwd`/django.pid host=127.0.0.1 port=9001 maxrequests=1 &
python manage.py runfcgi daemonize=true pidfile=`pwd`/django.pid socket=/home/dantezhu/nginx/sbin/django.sock maxrequests=1 &
Stop:
kill -9 `cat django.pid`
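The stop command boils down to "read the PID from the pidfile and signal that process". A minimal Python sketch of the same idea follows; `read_pid` and `stop_from_pidfile` are hypothetical helper names. Note one deliberate difference: the sketch defaults to SIGTERM, which gives the process a chance to shut down cleanly, whereas `kill -9` sends SIGKILL. Pass `signal.SIGKILL` to mirror the original command exactly.

```python
import os
import signal


def read_pid(pidfile):
    """Read the process ID that the server wrote to pidfile."""
    with open(pidfile) as handle:
        return int(handle.read().strip())


def stop_from_pidfile(pidfile, sig=signal.SIGTERM):
    """Send sig to the process recorded in pidfile and return its PID."""
    pid = read_pid(pidfile)
    os.kill(pid, sig)
    return pid


if __name__ == "__main__":
    # Assumes django.pid is in the current directory, as in the command above.
    print("signalled process", stop_from_pidfile("django.pid"))
```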
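===== 5. Starting nginx =====
Start:
./nginx -p /home/dantezhu/nginx/
Stop:
kill -QUIT `cat ../logs/nginx.pid`
Reload the configuration (test it with -t first, then send HUP):
./nginx -t -c `pwd`/../conf/nginx.conf
kill -HUP `cat ../logs/nginx.pid`
The Django admin interface now loads successfully.
OK, that's everything. All done!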


@@ -0,0 +1,110 @@
Content-Type: text/x-zim-wiki
Wiki-Format: zim 0.4
Creation-Date: 2011-10-23T20:48:21+08:00
====== Using CKEditor in Django ======
Created Sunday 23 October 2011
http://2goo.info/blog/panjj/Django/2010/11/15/139
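Rich-text editors are common in blogs. Today I tried using the well-known CKEditor in Django. This assumes static files are already configured correctly; see my earlier post 《Django静态文件的配置》 (Configuring Django Static Files), which I won't repeat here.
I. Prerequisites:
1. Get the third-party integration package django-ckeditor
Project repository: https://github.com/dwaiter/django-ckeditor
2. Install simplejson for Python
Package page: http://pypi.python.org/pypi/simplejson/
It installs like any other third-party Python library: download the package, extract it, and run python setup.py install in the package directory. The installer looks up any required libraries automatically, so you need a working network connection.
II. Setting up the CKEditor environment
You can either install django-ckeditor into your Python environment or use the downloaded django-ckeditor as an app of your own. I chose the latter.
Directory layout of the downloaded django-ckeditor: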
ckeditor
setup.py
...
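Copy the ckeditor directory from django-ckeditor straight into the project, then move that app's templates and media out of it. The resulting project layout:
app_test #an app I created specifically for testing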
ckeditor
static
--|ckeditor
----|...
--|css
----|...
templates
--|ckeditor
__init__.py
manage.py
settings.py
url.py
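In short, the downloaded ckeditor folder is treated as a brand-new app. If you know Django, this needs no further explanation; if you are a beginner like me, don't worry: there is a basic example later in this tutorial.
We add some CKEditor configuration at the end of settings.py: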
CKEDITOR_CONFIGS = {
    'default': {
        'toolbar':[
            ['Source','-','Save','NewPage','Preview','-','Templates'],
            ['Cut','Copy','Paste','PasteText','PasteFromWord','-','Print','SpellChecker','Scayt'],
            ['Undo','Redo','-','Find','Replace','-','SelectAll','RemoveFormat'],
            ['Form','Checkbox','Radio','TextField','Textarea','Select','Button', 'ImageButton','HiddenField'],
            ['Bold','Italic','Underline','Strike','-','Subscript','Superscript'],
            ['NumberedList','BulletedList','-','Outdent','Indent','Blockquote'],
            ['JustifyLeft','JustifyCenter','JustifyRight','JustifyBlock'],
            ['Link','Unlink','Anchor'],
            ['Image','Flash','Table','HorizontalRule','Smiley','SpecialChar','PageBreak'],
            ['Styles','Format','Font','FontSize'],
            ['TextColor','BGColor'],
            ['Maximize','ShowBlocks','-','About']
        ],
        'width': 650,
        'height': 200,
        'toolbarCanCollapse': False,
    },
    'simple_toolbar': {
        'toolbar': [
            [ 'Bold', 'Italic', 'Underline' ],
        ],
        'width': 650,
        'height': 50,
    },
}
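The first is the default configuration, which enables nearly all of CKEditor's features; the second is a minimal configuration that offers only a few very basic ones. For more detailed options, see the official CKEditor website.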
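How django-ckeditor consumes this setting internally is not covered in this post. As a hedged sketch, the function below assumes the obvious approach: look a configuration up by name and serialize it to JSON for the browser, which would also explain the simplejson prerequisite. `editor_options` is a hypothetical helper, and the dictionary is a trimmed-down stand-in for the real setting.

```python
import json

# Trimmed-down stand-in for the CKEDITOR_CONFIGS setting.
CKEDITOR_CONFIGS = {
    "default": {"width": 650, "height": 200, "toolbarCanCollapse": False},
    "simple_toolbar": {
        "toolbar": [["Bold", "Italic", "Underline"]],
        "width": 650,
        "height": 50,
    },
}


def editor_options(name="default", configs=CKEDITOR_CONFIGS):
    """Return the named configuration as a JSON string for the browser."""
    if name not in configs:
        raise KeyError("no CKEditor configuration named %r" % name)
    # sort_keys keeps the output stable; Python's False becomes JSON false.
    return json.dumps(configs[name], sort_keys=True)


if __name__ == "__main__":
    print(editor_options("simple_toolbar"))
    print(editor_options())
```

Failing loudly on an unknown name is a design choice of the sketch: a typo in a configuration name is easier to find as an error than as an editor that silently falls back to the wrong toolbar.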
III. Using CKEditor
For form controls we normally use Django's forms.
We define a form class in app_test:
#coding=utf-8
from django import forms
from ckedj.ckeditor.widgets import CKEditor  # ckedj is the project name

class BlogPostForm(forms.Form):
    title = forms.CharField()
    # This field will render as a CKEditor with the 'simple_toolbar' config.
    subtitle = forms.CharField(widget=CKEditor(ckeditor_config='simple_toolbar'))
    # This field will render as a CKEditor with the 'default' config.
    body = forms.CharField(widget=CKEditor())
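In this code, subtitle uses the simple configuration while body uses the default one.
In the HTML file, remember to include CKEditor's initialization script (this requires that static files are served correctly). The django-ckeditor documentation does not mention this, and it cost me a lot of time, so consider yourself warned. The initialization script ckeditor.js lives under ckeditor/ckeditor. Enough talk, here is the code: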
<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
<script type="text/javascript" src="/site_media/ckeditor/ckeditor/ckeditor.js"></script>
</head>
<body>
<form>
<table border="0" cellspacing="0">
{{form.as_table}}
</table>
</form>
</body>
</html>
With that, CKEditor works properly in Django.
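To see why the page has to load ckeditor.js, it helps to picture roughly what an editor widget puts into the form. The sketch below is not django-ckeditor's actual widget code; `render_editor` is a hypothetical function that assumes the widget emits a plain textarea followed by a call to `CKEDITOR.replace`, the CKEditor function that turns a textarea into an editor instance.

```python
import json
from html import escape


def render_editor(name, value="", config=None):
    """Return a textarea plus the script that turns it into a CKEditor."""
    element_id = "id_%s" % name  # Django's default id pattern for form fields
    textarea = '<textarea id="%s" name="%s">%s</textarea>' % (
        escape(element_id), escape(name), escape(value))
    # CKEDITOR is defined by ckeditor.js; without that script this call fails
    # in the browser and the field stays an ordinary textarea.
    script = '<script type="text/javascript">CKEDITOR.replace(%s, %s);</script>' % (
        json.dumps(element_id), json.dumps(config or {}))
    return textarea + "\n" + script


if __name__ == "__main__":
    print(render_editor("body", "<b>hi</b>", {"height": 50}))
```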

Some files were not shown because too many files have changed in this diff