
first commit

Pat Beirne 3 years ago
commit
e1c57a284e
3 changed files with 575 additions and 0 deletions
  1. tf.md (297 additions, 0 deletions)
  2. tf.py (222 additions, 0 deletions)
  3. tf_test.py (56 additions, 0 deletions)

+ 297 - 0
tf.md

@@ -0,0 +1,297 @@
+# TF
+
+A module for manipulating files in the *MicroPython* environment.  
+
+## Overview
+
+I discovered *MicroPython* when working on the ESP8266 processor. Everything seemed very nice, except it was awkward moving files around. All the methods I could find required a back-and-forth with the programmer's desktop.
+
+This **TF** module includes functions for creating, searching, editing and making backups of local files, using only the embedded processor. The module itself is small (about 7k) and can be downloaded into the target machine. Once there, the user can invoke it by either calling functions, or using the builtin command line. 
+
+For example, to make a backup, you can call  
+
+```
+    tf.cp('log.txt','log.2021-03-20.bak')
+```
+
+or you can use the builtin command line and
+
+```
+    cp log.txt log.2021-03-20.bak
+```
+
+The first half of the **TF** module holds the functions. These may come in handy for parsing files, making backups or searching through files. 
+
+The second half contains the simple command shell. This may come in handy for testing or experimenting with the functions, or if you, like me, like to play around with a live system. If you don't need the shell, just delete everything from `def _help():` downward.
+
+## Functions
+
+These methods all belong to the **tf** module, so you would typically invoke them as members of tf:
+
+```
+import tf
+tf.cp('log.txt','log.bak')
+```
+
+#### cp()
+
+```
+   cp(src-filename, dest-filename)
+   in: src-filename    file to read
+       dest-filename   file to write
+   returns: None
+```
+
+Simply copies a source file to a destination file. Filenames may include folders or `.` or `..` prefixes. The destination is overwritten if it exists. This function reads and writes one line at a time, so it can handle megabyte files. On an ESP8266, the benchmark in the Performance section measured about 16kB/s.
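A minimal sketch of the line-at-a-time copy described above, in plain Python. The name `copy_by_line` is hypothetical and used only for illustration; it is not the module's own `cp()`. Only the current line is held in memory, which is what would keep the RAM footprint small on a device like the ESP8266.

```python
def copy_by_line(src_name, dest_name):
    """Copy a text file one line at a time; return the number of lines copied."""
    count = 0
    with open(src_name) as src, open(dest_name, 'w') as dest:
        for line in src:       # the file object yields one line per iteration
            dest.write(line)   # only the current line is held in memory
            count += 1
    return count
```

Because the destination is opened with `'w'`, an existing file of that name is truncated first, matching the overwrite behaviour described above.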
+
+#### cat()
+
+```
+    cat(filename, first=1, last=1000000, numbers=False, title=True)
+    in: filename    file to read and display
+        first       the first line to display
+        last        the last line to display
+        numbers     whether to prepend each line with line-number + space
+        title       whether to prepend the listing with the filename
+    return: None
+```
+
+Displays the source file on the screen.  You can specify a line range, and whether line numbers are displayed, and whether to put a *title line* on the output display.  
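The line-range and line-number options can be pictured with the following sketch. It is an illustration in plain Python, not the module's code; `show_lines` and its `out` parameter are hypothetical names introduced here.

```python
import sys

def show_lines(filename, first=1, last=1000000, numbers=False, out=sys.stdout):
    """Write lines first..last (1-based, inclusive) of filename to out."""
    with open(filename) as f:
        for n, line in enumerate(f, 1):
            if n < first:
                continue           # not yet inside the requested range
            if n > last:
                break              # past the range, nothing more to show
            if numbers:
                out.write(str(n) + ' ')   # line number, then one space
            out.write(line)
```

Passing a file-like object as `out` makes the same routine usable for writing to another file instead of the screen.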
+
+#### _dir()
+
+```
+    _dir(directory-name='')
+    in:     directory-name     defaults to current directory
+    return: None
+```
+
+Displays the contents of the given directory, or of the current working directory if none is given. Files and folders are marked; ownership is assumed to be `all` and all are assumed to be `rwx` (read+write+execute). The file size is also shown, and a disk size summary is shown at the bottom.
+
+File dates are not displayed, as they all depend on the time from last reboot, and don't mean much in this environment.
+
+NOTE: the name is `_dir()` because `dir()` is a python builtin.
+
+#### grep()
+
+```
+    grep(filename, pattern, numbers=False)
+    in:  filename         the file to scan
+         pattern          a python regex to match
+         numbers          whether to prepend a line-number + space 
+    return: None
+```
+
+You can search a file for a pattern, and any matching lines are displayed. 
+
+Searches using ^ (start of line) work fine, but searches with $ (end-of-line) aren't currently working.
+
+###### Examples
+
+```
+tf.grep('log.txt', '2021-03-\d\d')
+tf.grep('config.txt', 'user.\s=')
+tf.grep('config.ini', '\[\w*\]', numbers = True)
+```
+
+#### sed()
+
+The *sed* function is an in-place file editor, based on `sed` from the Unix world. When invoked, it first renames the source file to have a `.bak` extension. That backup is then read one line at a time, and each line is tested against the line range and the regex pattern. Lines are changed, deleted, inserted or passed through unchanged as the command requires, and the result is streamed to a new file with the same name as the original. To the user, it appears that the file is edited in place, with a `.bak` file created.
+
+This version of `sed` has 6 commands:
+
+* a appends a line
+* i inserts a line
+* d deletes a line or lines
+* s does a search and replace
+* x does a grep and only saves lines that match
+* X does a grep and only saves lines that do not match
+
+If the single-letter command is preceded by a number or number-range, then the edit operation only applies to that line(s). A number range may be separated by `-` hyphen or `,` comma.
+
+##### Examples
+
+```
+12aMAX_FILE_NAME=255
+12iDEFAULT_DIR = "/"
+43-45d
+1,20s/^#\s*//
+```
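The number and number-range prefixes shown above could be parsed along the following lines. This is a sketch under assumptions: `parse_address` and `LAST_LINE` are hypothetical names, and the regular expression is written for standard Python `re`, so it may need adjusting for a particular MicroPython port.

```python
import re

LAST_LINE = 1000000   # stands in for "$", meaning the end of the file

# Optional start line, optional "-end" or ",end" (end may be "$"), then the command.
_ADDR = re.compile(r'^(\d*)(?:[,-](\d+|\$))?\s*([sdaixX].*)$')

def parse_address(sed_cmd):
    """Return (start, end, command), or None if sed_cmd is not well formed."""
    m = _ADDR.match(sed_cmd)
    if not m:
        return None
    start, end = 1, LAST_LINE             # no address: apply to every line
    if m.group(1):
        start = end = int(m.group(1))     # a single line number
    if m.group(2):
        end = LAST_LINE if m.group(2) == '$' else int(m.group(2))
    return start, end, m.group(3)
```

For instance, `parse_address('43-45d')` yields `(43, 45, 'd')`, and a command with no prefix applies to the whole file.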
+
+The x/X patterns are wrapped in a pair of delimiter characters, typically /, although any other character is allowed (except space). Valid x/X commands are:
+
+```
+x/abcd/
+10-20X/\w*\s*\d\d/
+x!ratio x/y!
+```
+
+Similarly, the s patterns are wrapped in a triplet of delimiter characters, typically / also. Valid 's' commands are:
+
+```
+s/toronto/Toronto/
+s/thier/their/
+10-120s/while\s(True|False)/while 1/
+s@ratio\s*=\s*num/denom@ratio = num/denom if denom else 0@
+```
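To illustrate how the character after `s` acts as the delimiter, here is a small sketch that splits an `s` command into its search and replace parts. `split_substitute` is a hypothetical helper, and it is stricter than the description requires: it rejects any text after the third delimiter.

```python
def split_substitute(cmd):
    """Split 's<d>search<d>replace<d>' into (search, replace); None if malformed."""
    if len(cmd) < 4 or cmd[0] != 's':
        return None
    delim = cmd[1]                  # whatever follows 's' is the delimiter
    parts = cmd[2:].split(delim)
    # Exactly three delimiters, with nothing after the last one.
    if len(parts) != 3 or parts[2] != '':
        return None
    return parts[0], parts[1]
```

Choosing `@` as the delimiter, as in the last example above, lets `/` appear freely in both the search and the replace text.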
+
+**Note**: you will need some free space on your disk, the same size as the source file, as a backup file is *always* made. To edit an 800k file, you should have 800k of free space.
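A check like the following could be run before editing a large file. It is a sketch only: `enough_room_for_backup` is a hypothetical helper, and it computes free space from `os.statvfs()` as block size times free blocks, the same two fields that `_dir()` in `tf.py` uses for its disk summary.

```python
import os

def enough_room_for_backup(filename, fs_root='/'):
    """True if free space on fs_root is at least the size of filename."""
    size = os.stat(filename)[6]     # index 6 is the file size in bytes
    vfs = os.statvfs(fs_root)
    free = vfs[0] * vfs[3]          # block size * number of free blocks
    return free >= size
```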
+
+**Note**: The functions for
+
+* file delete (`rm, del`)
+* file move (`mv, move, rename`)
+* change/make/delete directory/folder (`chdir, mkdir, rmdir`)
+
+are not included in this list, because the `os` module already has functions that implement these directly: `os.remove(), os.rename(), os.chdir(), os.mkdir(), os.rmdir()`
+
+## Simple Command Line
+
+By invoking `tf.main()`, you will be presented with a command prompt, similar to Linux, where the prompt shows the directory/folder you are currently in, followed by a '$'.
+
+From there, you can enter one of these commands:
+
+```
+cat   [-n] [-l<n>-<m>] <filename>
+cp    <src-file> <dest-file>  
+dir   [<dir name>]
+grep  <pattern> <filename>
+sed   <pattern> <filename>
+mv    <src-file> <dest-file>
+rm    <filename>
+cd    [<dest dir>]
+mkdir <dirname>
+rmdir <dirname>
+help
+```
+
+You can also use `copy`, `move`, `del`, `list` and `ls` as synonyms for `cp`, `mv`, `rm`, `cat` and `dir`. The `mv` command can also rename directories.
+
+For the `cat/list` command, you can enable line numbers with `-n` and you can limit the display range with `-l n-m` where `n` and `m` are decimal numbers (and n should be less than m). These are all valid uses of `cat`
+
+```
+cat -n log.txt             # whole file
+cat -n -l223 log.txt       # one line  
+cat -l 223-239 log.txt     # 17 lines
+cat -l244-$ log.txt        # from 244 to the end
+```
+
+For `grep` and `sed`, the patterns are *MicroPython* regular expressions, from the `re` module. If a pattern has a space character in it, then the pattern must be wrapped in single-quote `'` characters; patterns without an embedded space can simply be typed. [The line parser is basically a `str.split()` unless a leading `'` is detected.] To include a single quote in a quoted pattern, you can escape it with `\`.
+
+Here are some valid uses of `sed` and `grep`
+
+```text
+grep #define main.c
+grep '^\s*#define\s+[A-Z]' main.c
+sed 1,100s/recieve/receive/ doc.txt
+sed '33-$s/it is/it\'s/' doc.txt
+sed '45i   a new line of indented text' doc.txt
+```    
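The quoting rule can be sketched as below. This is an illustration of the rule as described, not the shell's own parser: `first_argument` is a hypothetical name, and this version removes the backslash from `\'`, which the real parser may or may not do.

```python
def first_argument(rest):
    """Return the first argument from the text that follows the command word."""
    rest = rest.lstrip()
    if not rest.startswith("'"):
        # Unquoted: the argument ends at the first whitespace.
        return rest.split()[0] if rest else ''
    out = []
    i = 1                                   # skip the opening quote
    while i < len(rest):
        ch = rest[i]
        if ch == '\\' and i + 1 < len(rest) and rest[i + 1] == "'":
            out.append("'")                 # \' becomes a literal single quote
            i += 2
            continue
        if ch == "'":
            return ''.join(out)             # closing quote ends the argument
        out.append(ch)                      # other backslashes are kept as-is
        i += 1
    return ''                               # no closing quote was found
```

Other backslash sequences such as `\s` pass through untouched, so they still reach the regex engine intact.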
+
+The **REPL** typing-history is functional, so you can use the up-arrow to recall the last 4-5 commands. Use the left-arrow and backspace to correct lines that have errors.
+
+Commands with invalid syntax print a one-line message and are ignored. Unrecognized commands are reported as not implemented and are otherwise ignored.
+
+## Limitations
+
+In its present form, the module has these limitations:  
+
+* filenames are limited to 255 chars
+* search patterns involving \ escapes may or may not work properly
+* the esp8266 implementation does not allow \1,\2 type pattern substitution
+* in the simple shell
+  * filenames must not have spaces
+  * patterns with spaces ***must*** be quoted
+  * the target of `cp` and `mv` *cannot* be a simple directory name as in Linux; write the whole filename relative to the current directory
+* the complexity of pattern matching is limited. 
+  * try to format the grep patterns so they avoid deep stack recursion. For example, '([^#]|\\#)\*' has a very generous search term as the first half, and can cause deep-stack recursion. The equivalent '(\\#|[^#]\*)' is more likely to succeed.
+
+## Examples
+
+Make a simple change to a source file, perhaps modify a constant.
+
+```
+[function]  
+   tf.sed('main.py','10-30s/CITY_NAME = \'Toronto\'/CITY_NAME     = \'Ottawa\'/')  
+[command line]  
+   sed '10-30s/CITY_NAME = \'Toronto\'/CITY_NAME = \'Ottawa\'/' main.py  
+   sed 10-30s/Toronto/Ottawa/ main.py
+```
+
+Remove some comments from a source file.
+
+```
+[function]
+   tf.sed('main.py','X/^#\s*TODO:/')
+[command line]
+   sed X/^#\s*TODO:/ main.py
+```
+
+Search a log file for an incident
+
+```
+[function]
+   tf.grep('log.txt','^2021-02-12 16:\d\d',numbers=True)
+[command line]
+   grep [Ee]rror log.txt
+   grep '2021-02-12 16:\d\d' log.txt
+   [search and keep a record ]
+   cp log.txt log.details
+   sed 'x/2021-02-12 16:\d\d/' log.details
+```
+
+## Installation
+
+Move the `tf.py` file over to the target. You can use the `webrepl` [command line program](https://github.com/micropython/webrepl) or the **WEBREPL** [web page](https://micropython.org/webrepl/).
+
+Once the module is present in the file system of the target, you can use the **REPL** command line interface to invoke it
+
+```
+ >>> import tf
+tf module loaded; members cp(), cat(), cd(), _dir(), grep() and sed()
+simple shell: cp/copy mv/move rm/del cat/list cd dir/ls mkdir rmdir grep sed help
+/$ 
+```
+
+This is the *simple command line*. You can type `dir` to get an idea of what's already in your flash file system, and `cat` to see the contents. [You'll probably find the files `boot.py` and `webrepl_cfg.py` are already installed]
+
+```
+  /$ dir
+ -rwx all       230 boot.py
+ -rwx all      2886 mail.log
+ -rwx all      2401 main.py
+ -rwx all      2259 main_test.py
+ -rwx all     99182 mqtt.log
+ -rwx all        98 test.py
+ drwx all         2 test_dir
+ -rwx all      6903 tf.py
+ -rwx all        15 webrepl_cfg.py
+ disk size:     392 KB   disk free: 212 KB
+ /$ cd test_dir
+ /test_dir$ dir
+ -rwx all        98 test.py
+ disk size:     392 KB   disk free: 212 KB
+ /test_dir$ 
+```
+
+If you don't need the *simple command line*, you can still use the methods listed above. Feel free to cut the `tf.py` module in half by deleting everything below the line
+
+```
+    def _help():
+```
+
+## Performance
+
+Typical performance on an ESP8266 @ 80MHz with a 90kB log file of 1200 lines, using a serial-connected terminal @ 115200 baud:
+
+| operation            | time     | throughput     |
+| --------------------:| --------:| --------------:|
+| copy                 | 5.7s     | 16kB/s         |
+| cat                  | 8.3s     | 10.8kB/s       |
+| grep                 | 7.5s     | 12kB/s         |
+| sed-append           | 6.4s     | 14kB/s         |
+| sed-search/replace   | 8.0-8.2s | 11.0-11.25kB/s |
+| sed-extract 30 lines | 2.5s     | 36kB/s         |
+
+**Note**: The copy time is indicative of the flash-write speed. The grep and cat speeds are indicative of the serial rate, as all the characters must be sent through the UART at 115200 baud, or roughly 11.5kB/s. The sed-extract is faster because it only writes 30 lines of text to the flash. The sed-append is constrained by having to write the entire file.
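The throughput figures follow from simple arithmetic, sketched below. The function names are hypothetical, and the UART estimate assumes 8N1 framing (10 bits on the wire per byte), which the text does not state explicitly.

```python
def throughput_kb_per_s(size_kb, seconds):
    """File size divided by elapsed time."""
    return size_kb / seconds

def uart_bytes_per_s(baud, bits_per_byte=10):
    # Assumed 8N1 framing: 8 data bits + 1 start bit + 1 stop bit per byte.
    return baud // bits_per_byte

print(round(throughput_kb_per_s(90, 5.7), 1))   # copy: prints 15.8
print(uart_bytes_per_s(115200))                 # serial ceiling: prints 11520
```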

+ 222 - 0
tf.py

@@ -0,0 +1,222 @@
+# tf  Text File manipulations
+#   for micropython and other tiny environments 
+
+#NOTE: the ESP8266 port cannot do \1,\2 type replacements in the s/search/replace/ operator
+import re,os,sys,gc
+
+def _file_scan(src,dest,start=1,end=0xFFFFFFFF,numbers=False,grep_func=None):
+  #src is a filename, dest is an open handle
+  i=0
+  try:
+    with open(src) as f:
+      for line in f:
+        i=i+1
+        if i<start or i>end:
+          continue
+        if grep_func and not grep_func(line):
+          continue
+        if numbers:
+          dest.write(str(i)+' ')
+        dest.write(line)
+  except:
+    print("could not open file {}".format(src))
+
+def cp(src_f, dst_f):
+  try:
+    with open(dst_f,'w') as g:
+      _file_scan(src_f,g)
+  except:
+    print("could not write to file {}".format(dst_f))
+
+def grep(filename, pattern, numbers=False):
+  try:
+    m=re.compile(pattern)   # an invalid pattern raises rather than returning None
+  except Exception:
+    print("grep() called with invalid pattern")
+    return None
+  _file_scan(filename,sys.stdout,numbers=numbers,grep_func=(lambda x:m.search(x)))
+  print()
+
+def cat(filename, first=1, last=1000000, numbers=False, title=True):
+  if title:
+    print("===={}=====".format(filename))
+  _file_scan(filename,sys.stdout,first,last,numbers=numbers)
+  print()
+
+def sed(filename, sed_cmd, bak_ext="bak"):
+  #print("sed() called with sed_cmd=<{}>".format(sed_cmd))
+  # parse the sed_cmd
+  # group 1,3 are the n-start, n-end    group 4 is command
+  g=re.search("^(\d*)([,-](\d+|\$))?\s*([sdaixX].*)",sed_cmd)
+  if not g:
+    print("sed() failed; 2nd argument must be a number followed by one of sdaixX; no changes applied")
+    return 0,0
+  cmd=g.group(4)
+  #print("sed() cmd parsed into <{}>,<{}> and <{}>".format(g.group(1),g.group(3),g.group(4)))
+
+  start,end=(1,1000000)
+  if g.group(1):
+    start=end=int(g.group(1))
+  if g.group(3):
+    end=1000000 if g.group(3)=='$' else int(g.group(3))
+
+  op=cmd[0]
+  if op not in "sdiaxX":
+    print("sed requires an operation, one of 's,d,i,a,x or X'")
+    return 0,0
+  #print("sed command parser of <{}> returned {} {} {} {}".format(cmd,sr,de,ins,add))
+  if op in "sxX" and len(cmd)<2: 
+    print("invalid sed argument")
+    return (0,0)
+  if op=='s':
+    dl=cmd[1]
+    gs=re.search("s"+dl+"([^"+dl+"]*)"+dl+"([^"+dl+"]*)"+dl,cmd)
+    if not gs:
+      print("invalid sed search-and-replace pattern")
+      return (0,0)
+    s,r = gs.group(1),gs.group(2)
+    #print("search <{}> and replace <{}>".format(s,r))  
+    sp=re.compile(s) 
+  if op=='X' or op=='x':
+    dl=cmd[1]
+    gs=re.search("[xX]"+dl+"([^"+dl+"]*)"+dl,cmd)
+    if not gs:
+      print("invalid sed search pattern")
+      return (0,0)
+    sp=re.compile(gs.group(1)) 
+
+  extra=g.group(4)[1:] + '\n' 
+
+  try:
+    os.rename(filename,filename+'.'+bak_ext)
+  except:
+    print("problem with filename; backup failed; no changes made")
+    return (0,0)
+
+  i=h=0
+  try: 
+    with open(filename+'.'+bak_ext) as d:
+      with open(filename,'w') as f:
+        for lin in d:
+          i=i+1
+          m=(i>=start and i<=end)
+          if op=='s' and m:
+            if sp.search(lin): h+=1
+            lin=sp.sub(r,lin)
+          if op=='d' and m:
+            h+=1
+            continue   # delete line
+          if op=='i' and m:
+            #print("insert a line before {} <{}>".format(i,extra))
+            f.write(extra)
+            h+=1
+          if op in "aids":
+            f.write(lin)
+          # keep the line only if it is inside the range and passes the x/X test
+          elif m and ((op=='x' and sp.search(lin)) or (op=='X' and not sp.search(lin))):
+            f.write(lin)
+            h+=1
+          if op=='a' and m:
+            #print("append a line after {} <{}>".format(i,extra))       
+            f.write(extra)
+            h+=1
+        #f.write("--file modifed by sed()--\n")
+  except OSError:
+    print("problem opening file {}".format(filename))
+  except RuntimeError:
+    print("problem with the regex; try a different pattern")
+  return (i, h)
+
+def _dir(d=''):
+  try:  
+    for g in os.listdir(d):
+      s=os.stat((d+'/'+g) if d else g)   # with no directory given, stat relative to the cwd
+      print("{}rwx all {:9d} {}".format('d' if (s[0] & 0x4000) else '-',s[6],g))
+  except:
+    print("not a valid directory")
+  s=os.statvfs('/')
+  print("disk size:{:8d} KB   disk free: {} KB".format(s[0]*s[2]//1024,s[0]*s[3]//1024))
+
+
+'''-----cut here if you only need the above functions-----'''
+def _help():
+  print("simple shell v1.0")
+  print("  cp/copy <src-file> <dest-file>")
+  print("  mv/move <src-file> <dest-file>           rm/del <file>")
+  print("  cd [<folder>]       mkdir <folder>       rmdir <folder>")
+  print("  dir/ls [<folder>]")
+  print("  cat/list [-n] [-l <n>,<m>] <file>")
+  print("  grep <pattern> <file>")
+  print("  sed <pattern> <file>")
+  print("          where <pattern> is '[<n>,<m>] s/search/replace/' or '<n>[,<m>]d' or '<n>i<text>' or '<n>a<text>' ")
+  print("file names must NOT have embedded spaces               options must be early on the command line")
+  print("search patterns with spaces require single-quotes      sed implements s/d/i/a/x/X")
+  print("sed does not work across line boundaries               sed s-patterns: non-/ delimiters are allowed")
+
+def parseQuotedArgs(st):
+  if st[0]=="'":
+    p=re.search("'((\'|[^'])*)'",st)
+    if not p:
+      print("quoted pattern error")
+      return ""
+    return p.group(1)
+  else:
+    return st.split()[0]
+
+def main():
+  print("simple shell: cp/copy mv/move rm/del cat/list cd dir/ls mkdir rmdir grep sed help")
+  while 1:
+    numbers=False
+    r=input(os.getcwd()+"$ ")
+    rp=r.split()
+    if not len(rp): continue
+    op=rp[0]
+    if op=='dir' or op=='ls':
+      _dir(rp[1] if len(rp)>1 else '')
+    elif op=='cat' or op=='list':
+      n=(" -n " in r) #print line-nums
+      s,e=(1,1000000) #start/end
+      g=re.search("\s+(-l\s*(\d+)([-,](\d+|\$)?)?)\s+",r[3:])
+      if g:
+        s=e=int(g.group(2))
+        if g.group(3):
+          e=int(g.group(4)) if g.group(4) and g.group(4).isdigit() else 1000000
+      cat(rp[-1],s,e,numbers=n)
+    elif op=='grep':
+      if len(rp)<3:
+        print("grep pattern filename") 
+        continue
+      grep(rp[-1],parseQuotedArgs(r[5:]),numbers=True)
+    elif op=='sed':
+      if len(rp)<3:
+        print("sed pattern filename")
+        continue
+      lines, hits = sed(rp[-1],parseQuotedArgs(r[4:]))
+      print("Lines processed: {}  Lines modified: {}".format(lines, hits))
+    elif op=='cd':
+      os.chdir(rp[1] if len(rp)>1 else '/')
+    elif op=='help':
+      _help()
+    else:
+      try:
+        if op=='cp' or op=='copy':
+          cp(rp[1],rp[2])
+        elif op=='mkdir':
+          os.mkdir(rp[1])
+        elif op=='rmdir':
+          os.rmdir(rp[1])
+        elif op=='mv' or op=='move':
+          os.rename(rp[1],rp[2])
+        elif op=='rm' or op=='del':
+          os.remove(rp[1])
+        else:
+          print("command not implemented")
+      except IndexError:
+        print("not enough arguments; check syntax")
+      except OSError:
+        print("file not found")
+    gc.collect()
+  
+if __name__=="tf":
+  print("tf module loaded; members cp(), cat(), cd(), _dir(), grep() and sed()")
+  main()
+
+

+ 56 - 0
tf_test.py

@@ -0,0 +1,56 @@
+# tf_test  Unit test for Text File manipulations
+import os
+import tf
+from tf import cp, cat, grep, sed   # bench() calls these without the tf. prefix
+
+# you should have a medium sized file called 'a' in '/'
+# and free space equivalent to 2x the size of 'a'
+
+def bench():
+  import time
+  a=time.ticks_us()
+  cp('a','b')
+  b=time.ticks_us()
+  print("time to copy={}".format(time.ticks_diff(b,a)/1e6))
+  input("next")  
+
+  a=time.ticks_us()
+  grep('a','kernel')
+  b=time.ticks_us()
+  print("time to grep={}".format(time.ticks_diff(b,a)/1e6))
+  input("next")  
+
+  a=time.ticks_us()
+  sed('a','s/kernel\s*/KERNEL /')
+  b=time.ticks_us()
+  print("time to sed-replace={}".format(time.ticks_diff(b,a)/1e6))
+  input("next")  
+
+  os.remove('a.bak')
+  a=time.ticks_us()
+  cp('b','a')
+  b=time.ticks_us()
+  print("time to copy={}".format(time.ticks_diff(b,a)/1e6))
+  input("next")
+
+  a=time.ticks_us()
+  sed('a','100-130x/(PM|AGP):/')
+  b=time.ticks_us()
+  print("time to sed-extract= {}".format(time.ticks_diff(b,a)/1e6))
+  input("next")
+
+  a=time.ticks_us()
+  cat('b', numbers=True)
+  b=time.ticks_us()
+  print("time to cat= {}".format(time.ticks_diff(b,a)/1e6))
+  input("next")
+
+  os.remove('a.bak')
+  cp('b','a')
+
+  a=time.ticks_us()
+  sed('a', '100a!! a line of text!!')
+  b=time.ticks_us()
+  print("time to sed-append= {}".format(time.ticks_diff(b,a)/1e6))
+
+  os.remove('a.bak')
+
+