Thursday, November 19, 2020

KDD CUP 99 Dataset: Feature Descriptions


2, tcp, smtp, SF, 1684, 363, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 104, 66, 0.63, 0.03, 0.01, 0.00, 0.00, 0.00, 0.00, 0.00, normal.

0, tcp, private, REJ, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 38, 1, 0.00, 0.00, 1.00, 1.00, 0.03, 0.55, 0.00, 208, 1, 0.00, 0.11, 0.18, 0.00, 0.01, 0.00, 0.42, 1.00, portsweep.

0, tcp, smtp, SF, 787, 329, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 76, 117, 0.49, 0.08, 0.01, 0.02, 0.00, 0.00, 0.00, 0.00, normal.
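
Each line above is one connection record: 41 comma-separated feature values followed by a class label. As a rough sketch of how such a line could be read, the Python below maps the values onto column names. The names are taken from the dataset's kddcup.names file; the helper `parse_record`, the `label` key and the choice to store numeric values as floats are assumptions made here for illustration.

```python
# Column names as listed in the dataset's kddcup.names file (41 features).
FEATURE_NAMES = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent",
    "hot", "num_failed_logins", "logged_in", "num_compromised", "root_shell",
    "su_attempted", "num_root", "num_file_creations", "num_shells",
    "num_access_files", "num_outbound_cmds", "is_host_login", "is_guest_login",
    "count", "srv_count", "serror_rate", "srv_serror_rate", "rerror_rate",
    "srv_rerror_rate", "same_srv_rate", "diff_srv_rate", "srv_diff_host_rate",
    "dst_host_count", "dst_host_srv_count", "dst_host_same_srv_rate",
    "dst_host_diff_srv_rate", "dst_host_same_src_port_rate",
    "dst_host_srv_diff_host_rate", "dst_host_serror_rate",
    "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate",
]

# The three symbolic (string-valued) features; everything else is numeric.
SYMBOLIC = {"protocol_type", "service", "flag"}


def parse_record(line):
    """Hypothetical helper: turn one raw record into a dict keyed by feature name."""
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != len(FEATURE_NAMES) + 1:
        raise ValueError("expected 41 features plus a label, got %d fields" % len(fields))
    *values, label = fields
    record = {}
    for name, raw in zip(FEATURE_NAMES, values):
        record[name] = raw if name in SYMBOLIC else float(raw)
    record["label"] = label.rstrip(".")  # labels end with a dot in the raw data
    return record


# The first sample record from this post.
sample = ("2, tcp, smtp, SF, 1684, 363, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, "
          "0, 0, 1, 1, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 104, 66, 0.63, "
          "0.03, 0.01, 0.00, 0.00, 0.00, 0.00, 0.00, normal.")
record = parse_record(sample)
print(record["service"], record["src_bytes"], record["label"])
```

Keeping the numeric values as floats is only a convenience; the count-like features could equally be parsed as integers.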


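1. Basic features of TCP connections (9 features)


(1) duration. Duration of the connection in seconds; continuous; range [0, 58329]. It is measured from the moment the TCP connection is established by the three-way handshake until the connection is closed by FIN/ACK. For UDP, each UDP packet is treated as one connection. The many records with duration = 0 in the dataset are connections that lasted less than 1 second.

(2) protocol_type. Protocol type; discrete; 3 values: TCP, UDP, ICMP.

(3) service. Network service type on the destination host; discrete; 70 values: 'aol', 'auth', 'bgp', 'courier', 'csnet_ns', 'ctf', 'daytime', 'discard', 'domain', 'domain_u', 'echo', 'eco_i', 'ecr_i', 'efs', 'exec', 'finger', 'ftp', 'ftp_data', 'gopher', 'harvest', 'hostnames', 'http', 'http_2784', 'http_443', 'http_8001', 'imap4', 'IRC', 'iso_tsap', 'klogin', 'kshell', 'ldap', 'link', 'login', 'mtp', 'name', 'netbios_dgm', 'netbios_ns', 'netbios_ssn', 'netstat', 'nnsp', 'nntp', 'ntp_u', 'other', 'pm_dump', 'pop_2', 'pop_3', 'printer', 'private', 'red_i', 'remote_job', 'rje', 'shell', 'smtp', 'sql_net', 'ssh', 'sunrpc', 'supdup', 'systat', 'telnet', 'tftp_u', 'tim_i', 'time', 'urh_i', 'urp_i', 'uucp', 'uucp_path', 'vmnet', 'whois', 'X11', 'Z39_50'.

(4) flag. Status of the connection, normal or error; discrete; 11 values: 'OTH', 'REJ', 'RSTO', 'RSTOS0', 'RSTR', 'S0', 'S1', 'S2', 'S3', 'SF', 'SH'. It indicates whether the connection started and finished as the protocol requires. For example, SF means the connection was established and terminated normally; S0 means only the SYN request packet was seen, with no SYN/ACK following it. SF is the normal state; the other 10 are error states.

(5) src_bytes. Number of data bytes sent from the source host to the destination host; continuous; range [0, 1379963888].

(6) dst_bytes. Number of data bytes sent from the destination host to the source host; continuous; range [0, 1309937401].

(7) land. 1 if the connection is from/to the same host/port, otherwise 0; discrete, 0 or 1.

(8) wrong_fragment. Number of wrong fragments; continuous; range [0, 3].

(9) urgent. Number of urgent packets; continuous; range [0, 14].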

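2. Content features of TCP connections (13 features)

Unlike DoS attacks, U2R and R2L attacks do not show frequent sequential patterns in the connection records. They are usually embedded in the data payload of the packets, and a single packet looks no different from a normal connection. To detect such attacks, Wenke Lee et al. extracted from the payload a set of content features that may indicate intrusive behavior, such as the number of failed logins.

(10) hot. Number of accesses to sensitive system files and directories; continuous; range [0, 101]. Examples: accessing system directories, creating or executing programs.

(11) num_failed_logins. Number of failed login attempts; continuous; [0, 5].

(12) logged_in. 1 if the login succeeded, otherwise 0; discrete, 0 or 1.

(13) num_compromised. Number of times a compromised condition (**) occurs; continuous; [0, 7479].

(14) root_shell. 1 if a root shell is obtained, otherwise 0; discrete, 0 or 1. A root shell means superuser privileges were obtained.

(15) su_attempted. 1 if the "su root" command appears, otherwise 0; discrete, 0 or 1.

(16) num_root. Number of accesses as the root user; continuous; [0, 7468].

(17) num_file_creations. Number of file creation operations; continuous; [0, 100].

(18) num_shells. Number of times shell commands are used; continuous; [0, 5].

(19) num_access_files. Number of accesses to access control files; continuous; [0, 9]. Examples: access to /etc/passwd or .rhosts.

(20) num_outbound_cmds. Number of outbound connections in an FTP session; continuous; 0. This feature is always 0 in the dataset.

(21) is_host_login. 1 if the login belongs to the "hot" list (for example a superuser or administrator login), otherwise 0; discrete, 0 or 1.

(22) is_guest_login. 1 if the login is a guest login, otherwise 0; discrete, 0 or 1.

For the remaining features, see: KDD CUP 99 Dataset: Feature Descriptions (Part 2)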



(**) I take "compromised condition" to mean that the target system shows an abnormal state, such as a file or path "not found" error, or the use of a "jump to" instruction.



A data mining framework for constructing features and models for intrusion detection – by Wenke Lee

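3. Time-based network traffic statistics (9 features, 23~31)

Network attacks are strongly correlated in time, so statistics that relate the current connection record to the records of the preceding period give a better picture of how connections relate to each other. These features fall into two sets. The "same host" features look only at connections in the past two seconds that have the same destination host as the current connection, for example how many such connections there are and how many of them use the same service as the current connection. The "same service" features look only at connections in the past two seconds that have the same service as the current connection, for example how many such connections there are and how many of them show SYN or REJ errors.

(23) count. Number of connections in the past two seconds to the same destination host as the current connection; continuous; [0, 511].

(24) srv_count. Number of connections in the past two seconds to the same service as the current connection; continuous; [0, 511].

(25) serror_rate. Among the connections in the past two seconds to the same destination host as the current connection, the fraction that show "SYN" errors; continuous; [0.00, 1.00].

(26) srv_serror_rate. Among the connections in the past two seconds to the same service as the current connection, the fraction that show "SYN" errors; continuous; [0.00, 1.00].

(27) rerror_rate. Among the connections in the past two seconds to the same destination host as the current connection, the fraction that show "REJ" errors; continuous; [0.00, 1.00].

(28) srv_rerror_rate. Among the connections in the past two seconds to the same service as the current connection, the fraction that show "REJ" errors; continuous; [0.00, 1.00].

(29) same_srv_rate. Among the connections in the past two seconds to the same destination host as the current connection, the fraction that use the same service as the current connection; continuous; [0.00, 1.00].

(30) diff_srv_rate. Among the connections in the past two seconds to the same destination host as the current connection, the fraction that use a different service from the current connection; continuous; [0.00, 1.00].

(31) srv_diff_host_rate. Among the connections in the past two seconds to the same service as the current connection, the fraction that go to a different destination host from the current connection; continuous; [0.00, 1.00].

Note: in this group, features 23, 25, 27, 29 and 30 are "same host" features, all computed over connections with the same destination host as the current connection; features 24, 26, 28 and 31 are "same service" features, all computed over connections with the same service as the current connection.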
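
The two-second window can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used to build the dataset: the `Conn` record layout, the function name, the decision to count the current connection itself, and the choice of flags S0 to S3 as "SYN" errors are all assumptions made here.

```python
from collections import namedtuple

# Hypothetical record layout: timestamp in seconds, destination host, service, flag.
Conn = namedtuple("Conn", "time dst_host service flag")

WINDOW_SECONDS = 2.0
SYN_ERROR_FLAGS = {"S0", "S1", "S2", "S3"}  # assumption: these flags count as "SYN" errors


def time_based_features(history, current):
    """Compute a few of features 23~31 for `current`, given earlier connections."""
    # Connections from the past two seconds, plus the current one (an assumption).
    recent = [c for c in history if 0 <= current.time - c.time <= WINDOW_SECONDS]
    recent.append(current)

    same_host = [c for c in recent if c.dst_host == current.dst_host]
    same_srv = [c for c in recent if c.service == current.service]

    def rate(conns, predicate):
        return sum(1 for c in conns if predicate(c)) / len(conns)

    return {
        "count": len(same_host),
        "srv_count": len(same_srv),
        "serror_rate": rate(same_host, lambda c: c.flag in SYN_ERROR_FLAGS),
        "same_srv_rate": rate(same_host, lambda c: c.service == current.service),
        "srv_diff_host_rate": rate(same_srv, lambda c: c.dst_host != current.dst_host),
    }


history = [
    Conn(0.0, "A", "http", "SF"),   # older than two seconds, ignored
    Conn(8.5, "A", "http", "S0"),
    Conn(9.0, "A", "smtp", "SF"),
    Conn(9.5, "B", "http", "SF"),
]
features = time_based_features(history, Conn(10.0, "A", "http", "SF"))
print(features)
```

The "same host" features are computed over `same_host` and the "same service" features over `same_srv`, which mirrors the split described in the note above.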

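4. Host-based network traffic statistics (10 features, 32~41)

The time-based statistics only relate the current connection to connections from the past two seconds. In real intrusions, however, some probing attacks scan hosts or ports slowly; when the interval between scans is longer than 2 seconds, the time-based statistics cannot find any correlation in the data. Wenke Lee et al. therefore grouped connections by destination host and used a window of 100 connections: for each connection, they compute statistics over the 100 connection records preceding it that have the same destination host as the current connection.

(32) dst_host_count. Among the previous 100 connections, the number with the same destination host as the current connection; continuous; [0, 255].

(33) dst_host_srv_count. Among the previous 100 connections, the number with the same destination host and the same service as the current connection; continuous; [0, 255].

(34) dst_host_same_srv_rate. Among the previous 100 connections, the fraction with the same destination host and the same service as the current connection; continuous; [0.00, 1.00].

(35) dst_host_diff_srv_rate. Among the previous 100 connections, the fraction with the same destination host but a different service from the current connection; continuous; [0.00, 1.00].

(36) dst_host_same_src_port_rate. Among the previous 100 connections, the fraction with the same destination host and the same source port as the current connection; continuous; [0.00, 1.00].

(37) dst_host_srv_diff_host_rate. Among the previous 100 connections with the same destination host and the same service as the current connection, the fraction that come from a different source host; continuous; [0.00, 1.00].

(38) dst_host_serror_rate. Among the previous 100 connections with the same destination host as the current connection, the fraction that show SYN errors; continuous; [0.00, 1.00].

(39) dst_host_srv_serror_rate. Among the previous 100 connections with the same destination host and the same service as the current connection, the fraction that show SYN errors; continuous; [0.00, 1.00].

(40) dst_host_rerror_rate. Among the previous 100 connections with the same destination host as the current connection, the fraction that show REJ errors; continuous; [0.00, 1.00].

(41) dst_host_srv_rerror_rate. Among the previous 100 connections with the same destination host and the same service as the current connection, the fraction that show REJ errors; continuous; [0.00, 1.00].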
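
The 100-connection window can be sketched the same way. Again this is an illustration, not the dataset's actual procedure: the `HostConn` record layout and the function name are hypothetical, and using the number of same-host connections in the window as the denominator of the rates is an assumption, since the descriptions above leave the denominator open.

```python
from collections import namedtuple

# Hypothetical record layout for this sketch.
HostConn = namedtuple("HostConn", "dst_host service src_port")


def host_based_features(previous, current, window=100):
    """Compute a few of features 32~41 over the last `window` connections."""
    last = previous[-window:]
    same_host = [c for c in last if c.dst_host == current.dst_host]
    same_host_srv = [c for c in same_host if c.service == current.service]
    n = len(same_host)

    def fraction(k):
        # Assumption: rates are taken relative to the same-host connections.
        return k / n if n else 0.0

    return {
        "dst_host_count": n,
        "dst_host_srv_count": len(same_host_srv),
        "dst_host_same_srv_rate": fraction(len(same_host_srv)),
        "dst_host_diff_srv_rate": fraction(n - len(same_host_srv)),
        "dst_host_same_src_port_rate": fraction(
            sum(1 for c in same_host if c.src_port == current.src_port)
        ),
    }


# 150 earlier connections; only the last 100 fall inside the window.
previous = (
    [HostConn("A", "http", 1000 + i) for i in range(60)]
    + [HostConn("B", "smtp", 2000) for _ in range(50)]
    + [HostConn("A", "smtp", 3000) for _ in range(40)]
)
host_features = host_based_features(previous, HostConn("A", "http", 1055))
print(host_features)
```

Because the window is defined by a number of connections rather than by time, a slow scan that spreads its probes minutes apart still shows up in these counts.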


Friday, September 05, 2014

Forking vs Threading

So, after a long time, I have finally figured out the difference between forking and threading :)
While surfing around, I saw a lot of threads and questions about forking and threading, and about which one should be used in an application. So I wrote this post to clarify the difference between the two, so that you can decide which one to use in your applications and scripts.

What is Fork/Forking:

A fork is simply a new process that looks exactly like the old (parent) process, but it is still a different process with a different process ID and its own memory. The parent process creates a separate address space for the child. Both parent and child have the same code segment, but they execute independently of each other.
The simplest example of forking is running a command in a shell on Unix/Linux. Each time a user issues a command, the shell forks a child process and the child carries out the task.
When a fork system call is issued, conceptually a copy of all the pages of the parent process is created and loaded into a separate memory location by the OS for the child process. In certain cases this is not needed. When the child immediately calls one of the 'exec' functions, there is no need to copy the parent's pages, because execv replaces the address space of the calling process; this is why modern kernels copy pages lazily (copy-on-write).
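
A minimal sketch of the separate address spaces, written in Python for brevity (it needs a POSIX system, since `os.fork` is not available on Windows). The function name `fork_demo` and the exit code 7 are arbitrary choices for illustration.

```python
import os


def fork_demo():
    """Fork a child, let it change a variable, and show the parent is unaffected."""
    value = "parent data"
    pid = os.fork()
    if pid == 0:
        # Child process: it works on its own copy of the parent's memory.
        value = "changed in child"
        os._exit(7)  # leave immediately, skipping the parent's cleanup handlers
    # Parent process: wait for the child so that it is reaped properly.
    _, status = os.waitpid(pid, 0)
    return pid, os.WEXITSTATUS(status), value


child_pid, exit_code, parent_value = fork_demo()
print(child_pid != os.getpid(), exit_code, parent_value)
```

The assignment in the child never reaches the parent: after the fork the two processes no longer share the variable, only a copy of it.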

Few things to note about forking are:

  • The child process has its own unique process ID.
  • The child process has its own copy of the parent's file descriptors.
  • File locks set by the parent process are not inherited by the child process.
  • Any semaphores that are open in the parent process are also open in the child process.
  • The child process has its own copy of the parent's message queue descriptors.
  • The child has its own address space and memory.

Forking is more widely accepted than threading for the following reasons:

  • Development is much easier with fork-based implementations.
  • Fork-based code is more maintainable.
  • Forking is much safer and more secure because each forked process runs in its own virtual address space. If one process crashes or has a buffer overrun, it does not affect any other process at all.
  • Threaded code is much harder to debug than forked code.
  • Forking is more portable than threading.
  • Forking can be faster than threading on a single CPU, as independent processes have no locking overhead.
Some of the applications in which forking is used are: telnetd (FreeBSD), vsftpd, proftpd, Apache 1.3, Apache 2, thttpd, PostgreSQL.

Pitfalls in Fork:

  • With fork, every new process has its own memory/address space, so startup and shutdown take longer.
  • If you fork, you have two independent processes that need to talk to each other in some way. This inter-process communication is costly.
  • When the parent exits before the forked child, you are left with an orphaned ("ghost") process. This is all much easier with threads: you can end, suspend and resume threads from the parent easily, and when the process exits, its threads end with it.
  • Insufficient storage space can cause the fork system call to fail.
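
To make the inter-process communication point concrete, here is a small Python sketch (POSIX only) in which a forked child sends its result back to the parent through a pipe. The helper name `ask_child` and the upper-casing "work" are made up for illustration.

```python
import os


def ask_child(message):
    """Fork a child that upper-cases `message` and sends it back through a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: it cannot simply assign to a parent variable, so it writes bytes.
        os.close(read_fd)
        os.write(write_fd, message.upper().encode())
        os.close(write_fd)
        os._exit(0)
    # Parent: close the unused write end, then read until the child closes its end.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child so it does not linger
    return b"".join(chunks).decode()


reply = ask_child("hello from child")
print(reply)
```

Everything that crosses the process boundary has to be serialized, copied through the kernel and parsed again, which is where the extra cost comes from.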

What are Threads/Threading:

Threads are Light Weight Processes (LWPs). Traditionally, a thread is just a CPU state (plus some other minimal state), with the process containing the rest (data, stack, I/O, signals). Threads require less overhead than forking or spawning a new process because the system does not initialize a new virtual memory space and environment for the process. They are most effective on a multiprocessor system, where the process flow can be scheduled to run on another processor, gaining speed through parallel or distributed processing. Gains are also found on uniprocessor systems, which exploit latency in I/O and other system functions that may halt process execution.

Threads in the same process share:

  • Process instructions
  • Most data
  • Open files (descriptors)
  • Signals and signal handlers
  • Current working directory
  • User and group ID

Each thread has a unique:

  • Thread ID
  • Set of registers, stack pointer
  • Stack for local variables, return addresses
  • Signal mask
  • Priority
  • Return value: errno

Few things to note about threading are:

  • Threads are most effective on multi-processor or multi-core systems.
  • For threads, only one process/thread table and one scheduler are needed.
  • All threads within a process share the same address space.
  • A thread does not maintain a list of created threads, nor does it know the thread that created it.
  • Threads reduce overhead by sharing fundamental parts.
  • Threads are more efficient in memory use because they use the same memory block as the parent instead of creating a new one.
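
A short Python sketch of the shared address space: three threads write their results straight into one dictionary that the main thread then reads. The names `results` and `worker` are illustrative only.

```python
import threading

results = {}  # one object, visible to every thread in the process


def worker(name, n):
    # Each thread writes to its own key of the shared dictionary.
    results[name] = sum(range(n))


threads = [
    threading.Thread(target=worker, args=("t%d" % i, 10 * (i + 1)))
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every thread has finished

print(results)
```

No pipe or serialization is needed here, in contrast to forked processes. Each thread writes to a different key, so this particular example needs no lock.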

Pitfalls in threads:

  • Race conditions: The big loss with threads is that there is no natural protection against multiple threads working on the same data at the same time without knowing that others are modifying it. This is called a race condition. While the code may appear on the screen in the order you wish it to execute, threads are scheduled by the operating system and run in an order you cannot predict. It cannot be assumed that threads are executed in the order they are created. They may also execute at different speeds. When threads are executing (racing to complete) they may give unexpected results (race condition). Mutexes and joins must be used to achieve a predictable execution order and outcome.
  • Thread safe code: The threaded routines must call functions which are "thread safe". This means that there are no static or global variables which other threads may clobber or read assuming single-threaded operation. If static or global variables are used, then mutexes must be applied or the functions must be rewritten to avoid the use of these variables. In C, local variables are allocated on the stack. Therefore, any function that does not use static data or other shared resources is thread-safe. Thread-unsafe functions may be used by only one thread at a time in a program, and the uniqueness of the thread must be ensured. Many non-reentrant functions return a pointer to static data. This can be avoided by returning dynamically allocated data or using caller-provided storage. An example of a non-thread-safe function is strtok, which is also not re-entrant. The thread-safe version is the re-entrant strtok_r.
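
As a sketch of the mutex idea, the Python below protects a shared counter with a lock, so the read-modify-write in `increment` cannot interleave between threads. The class and function names are made up for this example.

```python
import threading


class Counter:
    """A shared counter whose update is guarded by a mutex."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may execute the read-modify-write below.
        with self._lock:
            self.value += 1


def run(num_threads=4, per_thread=10000):
    counter = Counter()

    def work():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the join makes sure all updates are done before reading
    return counter.value


total = run()
print(total)
```

Without the lock, two threads could read the same old value and both write back the same new one, losing an update; with it, the final count always equals the number of increments.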

Advantages in threads:

  • Threads share the same memory space, so sharing data between them is fast; inter-thread communication is much faster than inter-process communication (IPC).
  • If properly designed and implemented, threads give you more speed because there is no process-level context switching in a multi-threaded application.
  • Threads are really fast to start and terminate.
Some of the applications in which threading is used are: MySQL, Firebird, Apache 2, MySQL 3.23.


1. Which should I use in my application?
Ans: That depends on a lot of factors. Forking is more heavyweight than threading, and has a higher startup and shutdown cost. Inter-process communication (IPC) is also harder and slower than inter-thread communication; threads really win the race when it comes to communication. Threads share the same address space with the parent process and need only a reduced context switch, which makes switching more efficient. On the other hand, if a thread crashes, it takes down all of the other threads in the process, and if a thread has a buffer overrun, it opens up a security hole in all of the threads.
2. Which one is better, threading or forking?
Ans: That depends entirely on what you are looking for. Still, to answer: in a contemporary Linux (2.6.x) there is not much difference in performance between a context switch of a process and of a thread (only the MMU work is additional for the process). There is the issue of the shared address space, which means that a faulty pointer in a thread can corrupt the memory of the parent process or of another thread within the same address space.
3. What kinds of things should be threaded or multitasked?
Ans: If you are a programmer and would like to take advantage of multithreading, the natural question is which parts of the program should or should not be threaded. Here are a few rules of thumb (if you say "yes" to these, have fun!):
  • Are there groups of lengthy operations that don't necessarily depend on other processing (like painting a window, printing a document, responding to a mouse-click, calculating a spreadsheet column, signal handling, etc.)?
  • Will there be few locks on data (the amount of shared data is identifiable and "small")?
  • Are you prepared to worry about locking (mutually excluding data regions from other threads), deadlocks (a condition where two COEs, or contexts of execution, have each locked data that the other is trying to get) and race conditions (a nasty, intractable problem where data is not locked properly and gets corrupted through threaded reads and writes)?
  • Could the task be broken into various "responsibilities"? E.g. could one thread handle the signals, another handle GUI stuff, etc.?


  1. Whether to use threading or forking depends entirely on the requirements of your application.
  2. Threads are more powerful than events, but power is not always needed.
  3. Threads are much harder to program than forking, so they are best left to experts.
  4. Use threads mostly for performance-critical applications.


  1. http://en.wikipedia.org/wiki/Fork_(operating_system)
  2. http://tldp.org/FAQ/Threads-FAQ/Comparison.html
  3. http://www.yolinux.com/TUTORIALS/LinuxTutorialPosixThreads.html
  4. http://linas.org/linux/threads-faq.html

Thursday, July 31, 2014









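  • 諱: placed before the name of a deceased ruler or elder as a mark of respect. 諱 is a phono-semantic character, with 言 as the semantic component and 韋 as the phonetic. 韋 was simplified to 韦 after its cursive form, so 諱 is simplified by analogy to 讳.
  • 禮: a phono-semantic character, with 示 as the semantic component and 豊 as the phonetic. 礼 is its ancient form, now used as the simplified character.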

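2. 琅琊: also written 琅邪, pronounced láng yá. There are several mountains named Langya in China, but here the name refers to Langya Commandery, established under the Qin. In Qin times, Langya Commandery was set up at the old town of Langya. Under the Eastern Han, Langya Commandery became the Langya Kingdom, with its seat at Kaiyang (today's old city of Linyi). The term 琅琊臨沂 (short for Linyi County of the Langya Kingdom) dates from this time. Many prominent clans in history took Langya as their ancestral seat (郡望), such as the Wang family of the Eastern Jin "Wang and Xie" (王謝). Zhuge Liang was also a native of Langya in Shandong, and moved to Jingzhou with his uncle to escape the wars. According to the records, the founding ancestor of the Yan clan was Yan Hui, foremost of the seventy-two worthies among Confucius's disciples. Yan Hui's descendants lived for generations at Qufu in Lu, until Yan Sheng, the heir of the twenty-fourth generation, moved to Xiaoti Ward (孝悌里) in Linyi, Langya. Later members of the Yan clan all called themselves "natives of Linyi, Langya".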

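  • 見: a compound ideograph, from 目 and 儿 (人); now simplified to 见 after its cursive form.
  • 遠: a phono-semantic character, from 辵 (chuò, commonly called the "walking" radical, i.e. 辶) with 袁 as the phonetic. The simplified character replaces the phonetic 袁 with 元, giving 远, a new phono-semantic character.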


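  • 齊: simplified to 齐 following the strokes of its cursive form.
  • 慟: original meaning "to wail"; the Shuowen (《說文》) says: "大哭也" (to cry loudly). Now simplified by analogy to 恸.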


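  • 恊: a popular form of 協, pronounced xié. Sun Yi of the Song dynasty, in 《履齋示兒編》, quotes 《字譜總論訛字》: "博、協皆從十,俗皆從忄" (博 and 協 are both written with 十; the popular forms are both written with 忄). The Yan Family Temple Stele (《顏氏家廟碑》) writes 協. It is now simplified to 协, with two dots standing in for two of the 力 components.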


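  • 東: a compound ideograph: the sun rises and hangs among the trees, and that direction is the east. Now simplified to 东 after its cursive form.
  • 記: a phono-semantic character, with 言 as the semantic component and 己 as the phonetic; simplified by analogy to 记.
  • 㕘: the Kangxi Dictionary (《康熙字典》), citing the Guangyun (《廣韻》), says that the popular form of 參 is 叅. Yan Zhenqing's handwritten form is 㕘 (with the 心 at the bottom written as 小). The standard form is 參, a compound ideograph: three stars above a person, with the semantic component 彡 (shān) added below to show the starlight twinkling. It is the name of an asterism, one of the twenty-eight lunar mansions. Now simplified to 参 following the strokes of its cursive form.
  • 軍: a compound ideograph, from 車 and 勹 (bāo, to wrap); in ancient chariot warfare, chariots were drawn into a ring to form a camp. Now simplified by analogy to 军.
  • 傳: both phono-semantic and a compound ideograph, from 人 with 專 as the phonetic; 專 also carries the sense of turning, and the character means a relay station. In the Yan Qinli Stele (《顏勤禮碑》), whether as a standalone character or as a component, it is always written in an abbreviated form. 傳 is now simplified by analogy to 传.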



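  • 門: a pictograph. Now simplified to 门 after its running-script form, with the two door pivots reduced to one dot and one turning stroke.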


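  • 學: the original form is 壆; later the 土 was dropped and the semantic component 子 added, giving 學. The simplified character takes its cursive form, 学. The top of the simplified 学 must not be confused with the top of 尚 (as in 賞, 裳, 黨, 當). Characters with 學 as the phonetic include 黌 and 覺.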



  • 爲: variant form 為; now simplified to 为 after its cursive form.
  • 長: a pictograph; now simplified to 长 after its cursive form.













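  • 隋: Yang Jian inherited his father's title as Duke of Sui (隨國公) and, on founding the dynasty, named it 隨. Because 隨 carries the sense of walking away, the 辶 was removed to give 隋. Even so, the Sui held the throne for a mere thirty-eight years.
  • 經: the original form is 巠; a semantic component was later added to give 經. Variants include 𦀇 and 経; the simplified character is 经. 經 originally meant the lengthwise (warp) threads in weaving, as opposed to 緯 (weft). By extension it came to mean enduring, unchanging principles, and then works fit to serve as standards of thought, morality and conduct.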


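  • 寧: the original form is 寍, from 宀 (mián, house), 心 and 皿, conveying that the heart is at ease when one has shelter and food. Adding 丂 gives 寧. There is also 甯, used in 甯願 (the same as 寧願) and as a surname. Both are now simplified to 宁, after a popular substitute character. But the standard traditional characters already had a 宁, read zhù, which is the original form of 貯; with the newcomer taking over its nest, the shapes of 貯 and 佇 were forced to change to 贮 and 伫.
  • 讀: 賣 was simplified to 卖 following the strokes of its cursive form, so 讀 is simplified by analogy to 读.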


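  • 與: from 与 and 舁 (yú). The Shuowen (《說文》) treats 與 and 与 as two different characters; modern scholars have concluded that 与 is the ancient form. Yan Zhenqing (顏魯公) uses the two interchangeably in several of his steles and model calligraphies; in the Yan Qinli Stele (《顏勤禮碑》) one instance is written 与 (see below). The simplified character adopts the ancient form 与.
  • 國: from 囗 (wéi, an enclosing boundary) and 或. 或 and 域 are the same character, the original form of 國. One variant is 囻; the Taiping Heavenly Kingdom coined 囯; it is now simplified to 国.
  • 劉: a compound ideograph, from 卯 (to cut apart), 金 and 刀; original meaning "to hack and kill". Also a surname. The 1935 Republican-era Table of Simplified Characters (《簡體字表》) includes 刘; its origin is unclear.
  • 辯: a phono-semantic character, from 言 with 辡 (biǎn) as the phonetic; original meaning "to argue". Compare 辨, from 刀 with 辡 as the phonetic, original meaning "to distinguish, to tell apart".
  • 論: a phono-semantic character, from 言 with 侖 as the phonetic. Simplified to 论 after its cursive form.
  • 義: a compound ideograph. The Shuowen says: "己之威儀也,从我羊" (one's own dignified bearing; from 我 and 羊). The 1935 Republican-era Table of Simplified Characters includes 义, possibly a popular handwritten form.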

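  • 屢: a phono-semantic character, from 尸 (a pictograph of a house, not of a body) with 婁 as the phonetic. 屢 originally meant a multi-storey building; from storeys stacked one on another it came to mean "repeatedly", and a new phono-semantic character, 樓, was created for the building sense. 婁 was simplified to 娄 after its cursive form, so 屢 is simplified by analogy to 屡.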



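  • 將: both a compound ideograph and a phono-semantic character. Gu Yankui's 《漢字源流詞典》 says: "從肉,爿(qiáng)聲……另加義符又(寸)……本義當爲奉獻祭享" (from 肉, with 爿 (qiáng) as the phonetic ... with the semantic component 又 (寸) added ... the original meaning should be "to present sacrificial offerings"). Yan Zhenqing writes the right side of this character as 寽. The simplified character is 将: the phonetic 爿 is simplified, and one of the two dots in the slanted 月 (i.e. 肉) is removed, presumably because today's generals have too many "general's bellies" and urgently need to slim down.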










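  • 云: a pictograph, like swirling clouds; original meaning "山川氣也" (the vapor of mountains and rivers). After the character developed the meaning "to say", the original meaning was given to the newly created phono-semantic character 雲, while 云 was reserved for "to say", a clear division of labor. Simplified characters merge the two meanings back into 云. In traditional and simplified characters alike, writing "人雲亦雲" is wrong.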



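  • 祕: a phono-semantic character, with 示 as the semantic component and 必 as the phonetic; the semantic component 示 relates to sacrifices to the spirits. The original form 祕 has now been abandoned, while its popular form 秘 is in general use. The Kangxi Dictionary (《康熙字典》) says: "《正字通》从示从必。俗从禾作秘,譌。" (The Zhengzitong: from 示 and 必. The popular form written with 禾, 秘, is an error.)
  • 閣: a phono-semantic character, from 門 with 各 as the phonetic; now simplified by analogy to 阁.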

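  • 時: a phono-semantic character, from 日 with 寺 as the phonetic. Now simplified to 时 after its cursive form.
  • 選: a phono-semantic character, from 辵 with 巽 as the phonetic. The 1935 Republican-era Table of Simplified Characters (《簡體字表》) has 选, a newly created phono-semantic character, now used as the simplified character.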

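  • 業: a compound ideograph, from 丵 (zhuó) and 巾. Its upper part, 业, is now taken as the simplified character. In fact the ancient character 㐀 is an ancient form of 丘. Yan Zhenqing's Yuan Cishan Stele (《元次山碑》) has the line "前是泌南戰士積骨者,君悉收瘞,刻石立表,命之曰哀丘", in which 哀丘 is written 哀业. See: http://blog.sina.com.cn/s/blog_4b013c240100fswv.html
  • 優: a phono-semantic character, from 人 with 憂 as the phonetic. The simplified 优 is a newly created phono-semantic character.
  • 職: a phono-semantic character, from 耳 with 戠 (zhí) as the phonetic; original meaning "to hear and remember", hence 耳 as the semantic component. A variant is 軄, which uses 身 as the semantic component instead. The Kangxi Dictionary (《康熙字典》) says: "軄,《玉篇》俗職字。" (軄: according to the Yupian, a popular form of 職.) The simplified 职 is a newly created phono-semantic character.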






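  • 識: a phono-semantic character, from 言 with 戠 (zhí) as the phonetic. The simplified 识 uses 只 as the phonetic and is a newly created phono-semantic character.
  • 弘: a phono-semantic character, from 弓 with 厶 (gōng) as the phonetic; the loud twang of a bow, meaning "vast". Now mostly written 宏.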






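  • 從: the ancient form is the pictograph 从; the semantic component 辵 (chuò) was added later, and the seal-script form was regularized as 從. The simplified character restores the ancient form, 从.
  • 議: a phono-semantic character, from 言 with 義 as the phonetic. Simplified by analogy to 议.
  • 勳: a phono-semantic character. The bronze-inscription form is 勛, from 力 with 員 as the phonetic. The seal-script form is 勳, with the phonetic changed to 熏. After regularization, the two forms 勛 and 勳 both existed. It is now simplified by analogy, following the ancient form, to 勋.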



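  • 輕車都尉: an honorific merit title of the junior fourth rank (從四品), not a substantive post. The 直 in "兼直秘書省" is the original form of 值, meaning to be on duty.
  • 參軍事, also called 參軍: the term means taking part in planning military affairs, roughly equivalent to today's staff officer.
  • 著作佐郎: rank junior sixth, upper class (從六品上), several grades higher than 校書郎!
  • 詹事主簿: the 詹事府 was the highest administrative body of the crown prince's Eastern Palace. The 詹事主簿 was responsible for receiving, dispatching, reviewing and sealing official correspondence, with rank junior seventh, upper class (從七品上).
  • 太子內直監: senior sixth rank (正六品). The crown prince at this time was Li Chengqian, eldest son of Emperor Taizong. The duties of this post are not yet clear.
  • 崇賢館學士: the Chongxian Academy (崇賢館) was established in the 13th year of Zhenguan (639), under the direct control of the Eastern Palace. In the 2nd year of Shangyuan (675) it was renamed the Chongwen Academy (崇文館) to avoid the name of Crown Prince Li Xian. Academicians were appointed to take charge of the classics and books and to teach students, all of whom were members of the imperial clan, imperial relatives, or sons of high-ranking capital officials.
  • The crown prince of the Eastern Palace was the emperor's successor. The posts of 太師, 太傅, 太保 and 少師, 少傅, 少保 were established specifically for his instruction. In addition, his household had its own civil and military staff, modeled on the central government: the 門下坊 was modeled on the 門下省 and oversaw six bureaus (司經, 宮門, 內直, 典膳, 藥藏, 齋帥). The 典書坊 was modeled on the 內史省; the post of 太子通事舍人 mentioned below belonged to the 典書坊. The 家令寺, 率更令寺 and 僕寺 were modeled on the courts and directorates of the central government. Ouyang Xun once served as 太子率更令, which is why he is known as 歐陽率更.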

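  • 館: a phono-semantic character, from 食 with 官 as the phonetic; original meaning a lodging house that provides food and accommodation. The variant 舘 uses 舍 as the semantic component, emphasizing that it is a building. The simplified character, by analogy of the component, is 馆.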

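  • 廢: a phono-semantic character, from 广 (yǎn) with 發 as the phonetic; original meaning "a house collapsing". Now simplified by analogy to 废.
  • 補: a phono-semantic character, from 衣 with 甫 as the phonetic. The simplified 补 uses 卜 as the phonetic and is a newly created phono-semantic character.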



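  • 藝: the original form is 埶; 艸 was later added to give 蓺, and then 云 to give 藝. Original meaning "to plant". The simplified 艺 is a newly created phono-semantic character.
  • 獎: a phono-semantic character, from 犬 with 將 as the phonetic; original meaning "to urge a dog on by calling out". Later the semantic component 犬 was corrupted to 大, and the simplified character further drops the 寸, giving 奖.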


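  • 遷: a phono-semantic character, from 辵 with 䙴 as the phonetic; original meaning "to move", by extension promotion in officialdom. The simplified character is 迁, with 千 as a simplified phonetic, a newly created phono-semantic character.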


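  • 無: the Shuowen Jiezi (《说文解字》) takes 无 as its ancient form. 無 is a pictograph of a person dancing with implements in hand, i.e. the original form of 舞. In seal script, 舞 was used for "dance" and 無 for "not have". The simplified 无 takes the ancient form.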



Thursday, March 20, 2014

The 17 Equations That Changed The Course Of History

Mathematics is all around us, and it has shaped our understanding of the world in countless ways.
In 2013, mathematician and science author Ian Stewart published the book 17 Equations That Changed The World. We recently came across a convenient table, made by mathematics tutor and blogger Larry Phillips and shared on Dr. Paul Coxon's Twitter account, that summarizes the equations.
Here is a little bit more about these wonderful equations that have shaped mathematics and human history:
1) The Pythagorean Theorem: This theorem is foundational to our understanding of geometry. It describes the relationship between the sides of a right triangle on a flat plane: square the lengths of the short sides, a and b, add those together, and you get the square of the length of the long side, c.
This relationship, in some ways, actually distinguishes our normal, flat, Euclidean geometry from curved, non-Euclidean geometry. For example, a right triangle drawn on the surface of a sphere need not follow the Pythagorean theorem.
2) Logarithms: Logarithms are the inverses, or opposites, of exponential functions. A logarithm for a particular base tells you what power you need to raise that base to in order to get a number. For example, the base 10 logarithm of 1 is log(1) = 0, since 1 = 10^0; log(10) = 1, since 10 = 10^1; and log(100) = 2, since 100 = 10^2.
The equation in the graphic, log(ab) = log(a) + log(b), shows one of the most useful applications of logarithms: they turn multiplication into addition.
Until the development of the digital computer, this was the most common way to quickly multiply together large numbers, greatly speeding up calculations in physics, astronomy, and engineering. 
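As a quick numerical check of log(ab) = log(a) + log(b), here is a small Python sketch; the two numbers are arbitrary and chosen only for illustration.

```python
import math

a, b = 123456.0, 789.0

# Add the logarithms instead of multiplying the numbers...
log_sum = math.log10(a) + math.log10(b)

# ...then undo the logarithm to recover the product.
product_via_logs = 10 ** log_sum

print(product_via_logs, a * b)
```

The two printed values agree up to floating-point rounding. A slide rule or a table of logarithms does the same thing by hand: look up two logs, add them, and look up the antilog.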
3) Calculus: The formula given here is the definition of the derivative in calculus. The derivative measures the rate at which a quantity is changing. For example, we can think of velocity, or speed, as being the derivative of position — if you are walking at 3 miles per hour, then every hour, you have changed your position by 3 miles.
Naturally, much of science is interested in understanding how things change, and the derivative and the integral — the other foundation of calculus — sit at the heart of how mathematicians and scientists understand change.
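The definition of the derivative can be tried out numerically by taking the difference quotient (f(t + h) - f(t)) / h with a small step h. In the Python sketch below the function names and the step size are illustrative choices.

```python
def difference_quotient(f, t, h):
    """Average rate of change of f between t and t + h."""
    return (f(t + h) - f(t)) / h


def position(t):
    # Walking at a steady 3 miles per hour: position in miles after t hours.
    return 3.0 * t


def square(t):
    return t * t


speed = difference_quotient(position, 1.0, 1e-6)  # close to 3
slope = difference_quotient(square, 2.0, 1e-6)    # close to 4, the derivative of t^2 at t = 2
print(speed, slope)
```

For the walker the quotient is 3 whatever the step size, because the speed is constant. For t^2 it approaches 4 only as h shrinks, which is exactly what the limit in the definition expresses.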
4) Law of Gravity: Newton's law of gravitation describes the force of gravity between two objects, F, in terms of a universal constant, G, the masses of the two objects, m1 and m2, and the distance between the objects, r. Newton's law is a remarkable piece of scientific history — it explains, almost perfectly, why the planets move in the way they do. Also remarkable is its universal nature — this is not just how gravity works on Earth, or in our solar system, but anywhere in the universe.
Newton's gravity held up very well for two hundred years, and it was not until Einstein's theory of general relativity that it would be replaced.
5) The square root of -1: Mathematicians have always been expanding the idea of what numbers actually are, going from natural numbers, to negative numbers, to fractions, to the real numbers. The square root of -1, usually written i, completes this process, giving rise to the complex numbers.
Mathematically, the complex numbers are supremely elegant. Algebra works perfectly the way we want it to: any polynomial equation has a complex number solution, which is not true for the real numbers. For example, x^2 + 4 = 0 has no real number solution, but it does have a complex solution: the square root of -4, or 2i. Calculus can be extended to the complex numbers, and by doing so, we find some amazing symmetries and properties of these numbers. Those properties make the complex numbers essential in electronics and signal processing.
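The x^2 + 4 = 0 example can be checked directly with Python's built-in complex numbers (Python writes i as j).

```python
import cmath

root = cmath.sqrt(-4)        # the complex square root of -4
residual = root * root + 4   # plug the root back into x^2 + 4

other_root = -root           # -2i solves the equation as well
other_residual = other_root * other_root + 4

print(root, residual, other_root, other_residual)
```

Both residuals come out as zero, so 2i and -2i are the two solutions that the real numbers cannot provide.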
6) Euler's Polyhedra Formula: Polyhedra are the three-dimensional versions of polygons, like a cube. The corners of a polyhedron are called its vertices, the lines connecting the vertices are its edges, and the polygons covering it are its faces.
A cube has 8 vertices, 12 edges, and 6 faces. If I add the vertices and faces together, and subtract the edges, I get 8 + 6 - 12 = 2.
Euler's formula states that, as long as your polyhedron is somewhat well behaved, if you add the vertices and faces together, and subtract the edges, you will always get 2. This will be true whether your polyhedron has 4, 8, 12, 20, or any number of faces.
Euler's observation was one of the first examples of what is now called a topological invariant: a number or property shared by a class of shapes that are similar to each other. The entire class of "well-behaved" polyhedra will have V + F - E = 2. This observation, along with Euler's solution to the Bridges of Königsberg problem, paved the way to the development of topology, a branch of math essential to modern physics.
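Here is a small Python check of V + F - E = 2 on the five Platonic solids. The cube's counts are the ones given above; the counts for the other four solids are standard values added here for illustration.

```python
# name: (vertices, edges, faces)
POLYHEDRA = {
    "tetrahedron": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron": (12, 30, 20),
}


def euler_characteristic(vertices, edges, faces):
    return vertices + faces - edges


characteristics = {
    name: euler_characteristic(*counts) for name, counts in POLYHEDRA.items()
}
print(characteristics)
```

Every entry comes out as 2, whether the solid has 4, 6, 8, 12 or 20 faces.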
7) Normal distribution: The normal probability distribution, which has the familiar bell curve graph, is ubiquitous in statistics.
The normal curve is used in physics, biology, and the social sciences to model various properties. One of the reasons the normal curve shows up so often is that it describes the behavior of large groups of independent processes.
8) Wave Equation: This is a differential equation, or an equation that describes how a property is changing through time in terms of that property's derivative, as above. The wave equation describes the behavior of waves — a vibrating guitar string, ripples in a pond after a stone is thrown, or light coming out of an incandescent bulb. The wave equation was an early differential equation, and the techniques developed to solve the equation opened the door to understanding other differential equations as well.
9) Fourier Transform: The Fourier transform is essential to understanding more complex wave structures, like human speech. Given a complicated, messy wave function like a recording of a person talking, the Fourier transform allows us to break the messy function into a combination of a number of simple waves, greatly simplifying analysis.
 The Fourier transform is at the heart of modern signal processing and analysis, and data compression. 
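To make the idea of breaking a messy wave into simple ones concrete, the Python sketch below uses the discrete Fourier transform, the sampled counterpart of the transform in the table. The signal is a made-up mix of two sine waves (3 and 7 cycles per window), and the transform is written out naively rather than with a fast algorithm.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform: one complex coefficient per frequency."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


N = 64
# A "messy" signal: a strong wave at 3 cycles plus a weaker one at 7 cycles.
signal = [
    math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.sin(2 * math.pi * 7 * t / N)
    for t in range(N)
]

spectrum = dft(signal)
# Scale the coefficients so they read as the amplitude of each simple wave.
amplitudes = [2 * abs(c) / N for c in spectrum[: N // 2]]
dominant = sorted(range(N // 2), key=lambda k: amplitudes[k], reverse=True)[:2]
print(sorted(dominant), amplitudes[3], amplitudes[7])
```

The transform recovers the two hidden frequencies and their amplitudes (about 1 and 0.5), while every other frequency is essentially zero.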
10) Navier-Stokes Equations: Like the wave equation, this is a differential equation. The Navier-Stokes equations describe the behavior of flowing fluids: water moving through a pipe, air flow over an airplane wing, or smoke rising from a cigarette. While we have approximate solutions of the Navier-Stokes equations that allow computers to simulate fluid motion fairly well, it is still an open question (with a million dollar prize) whether it is possible to construct mathematically exact solutions to the equations.
11) Maxwell's Equations: This set of four differential equations describes the behavior of and relationship between electricity (E) and magnetism (H).
Maxwell's equations are to classical electromagnetism as Newton's laws of motion and law of universal gravitation are to classical mechanics — they are the foundation of our explanation of how electromagnetism works on a day to day scale. As we will see, however, modern physics relies on a quantum mechanical explanation of electromagnetism, and it is now clear that these elegant equations are just an approximation that works well on human scales.
12) Second Law of Thermodynamics: This states that, in a closed system, entropy (S) is always steady or increasing. Thermodynamic entropy is, roughly speaking, a measure of how disordered a system is. A system that starts out in an ordered, uneven state — say, a hot region next to a cold region — will always tend to even out, with heat flowing from the hot area to the cold area until evenly distributed.
The second law of thermodynamics is one of the few cases in physics where time matters in this way. Most physical processes are reversible — we can run the equations backwards without messing things up. The second law, however, only runs in this direction. If we put an ice cube in a cup of hot coffee, we always see the ice cube melt, and never see the coffee freeze.
13) Relativity: Einstein radically altered the course of physics with his theories of special and general relativity. The classic equation E = mc^2 states that matter and energy are equivalent to each other. Special relativity brought in ideas like the speed of light being a universal speed limit and the passage of time being different for people moving at different speeds.
General relativity describes gravity as a curving and folding of space and time themselves, and was the first major change to our understanding of gravity since Newton's law. General relativity is essential to our understanding of the origins, structure, and ultimate fate of the universe.
14) Schrödinger's Equation: This is the main equation in quantum mechanics. As general relativity explains our universe at its largest scales, this equation governs the behavior of atoms and subatomic particles.
Modern quantum mechanics and general relativity are the two most successful scientific theories in history — all of the experimental observations we have made to date are entirely consistent with their predictions. Quantum mechanics is also necessary for most modern technology — nuclear power, semiconductor-based computers, and lasers are all built around quantum phenomena.
15) Information Theory: The equation given here is for Shannon information entropy. As with the thermodynamic entropy given above, this is a measure of disorder. In this case, it measures the information content of a message — a book, a JPEG picture sent on the internet, or anything that can be represented symbolically. The Shannon entropy of a message represents a lower bound on how much that message can be compressed without losing some of its content.
Shannon's entropy measure launched the mathematical study of information, and his results are central to how we communicate over networks today.
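As an illustration, the Python sketch below computes the Shannon entropy of a short message in bits per symbol. It estimates the symbol probabilities from the message's own character frequencies and treats the symbols as independent, which is a simplifying assumption made for this example.

```python
import math
from collections import Counter


def shannon_entropy(message):
    """Entropy in bits per symbol, using the message's own symbol frequencies."""
    counts = Counter(message)
    total = len(message)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(shannon_entropy("aaaa"))  # one symbol only: nothing to learn from each character
print(shannon_entropy("abab"))  # two equally likely symbols
print(shannon_entropy("abcd"))  # four equally likely symbols
```

The more evenly the symbols are spread, the higher the entropy, and the more bits per symbol any lossless compressor needs on average under this model.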
16) Chaos Theory: This equation is May's logistic map. It describes a process evolving through time: x_(t+1), the level of some quantity x in the next time period, is given by the formula on the right, and it depends on x_t, the level of x right now. k is a chosen constant. For certain values of k, the map shows chaotic behavior: if we start at some particular initial value of x, the process will evolve one way, but if we start at another initial value, even one very close to the first value, the process will evolve in a completely different way.
We see chaotic behavior, meaning behavior that is sensitive to initial conditions, in many areas. Weather is a classic example: a small change in atmospheric conditions on one day can lead to completely different weather systems a few days later, an idea most commonly captured in the image of a butterfly flapping its wings on one continent causing a hurricane on another continent.
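The sensitivity to initial conditions is easy to see numerically. The Python sketch below iterates the logistic map x_(t+1) = k * x_t * (1 - x_t); the values k = 4 (chaotic) and k = 2.5 (settles down), the starting point 0.2 and the tiny offset are illustrative choices.

```python
def logistic_orbit(k, x0, steps):
    """Iterate x_(t+1) = k * x_t * (1 - x_t), returning all values from x0 on."""
    xs = [x0]
    for _ in range(steps):
        xs.append(k * xs[-1] * (1 - xs[-1]))
    return xs


# Two runs with k = 4 whose starting points differ by one part in a billion.
first = logistic_orbit(4.0, 0.2, 100)
second = logistic_orbit(4.0, 0.2 + 1e-9, 100)
gaps = [abs(a - b) for a, b in zip(first, second)]

# For comparison, k = 2.5 is not chaotic: the orbit settles at 1 - 1/k = 0.6.
calm = logistic_orbit(2.5, 0.2, 100)

print(gaps[1], max(gaps), calm[-1])
```

After one step the two chaotic runs are still almost identical, but the gap roughly doubles with each step and soon becomes as large as the values themselves. The k = 2.5 run forgets its starting point instead and converges to 0.6.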
17) Black-Scholes Equation: Another differential equation, Black-Scholes describes how finance experts and traders find prices for derivatives. Derivatives — financial products based on some underlying asset, like a stock — are a major part of the modern financial system.
The Black-Scholes equation allows financial professionals to calculate the value of these financial products, based on the properties of the derivative and the underlying asset.

Read more: http://www.businessinsider.com/17-equations-that-changed-the-world-2014-3#ixzz2wVdPUf4h